Re: DTD's


I'm a newbie, but I have another view on the DTD issue --

I keep my work journal (the thing I fill out my daily and monthly reports
from) with a text editor. I have been using some XML marking in obvious
spots, as a kind of thought exercise. This sort of thing obviously isn't
ready for a DTD, as it isn't really even well-formed yet.

When I take this to the next step (my boss says I must, but he isn't
allocating me time yet), I will use a RAD tool (Delphi, probably) to set up
a quick form that wraps the fields I know I'll need in XML tags and appends
them to a text file that will be well-formed when it is generated. At
this point, I don't really need a DTD, but I probably want to start thinking
about it, making scratch notes in the comments in the program code.
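
To make the "wrap the fields and append" step concrete, here is a minimal
sketch in Python rather than Delphi, using only the standard library. The
element names (entry, project, hours, notes, journal) and the function names
are my own assumptions for illustration, not a settled design.

```python
import xml.etree.ElementTree as ET

def make_entry(date, project, hours, notes):
    """Wrap the form fields in XML tags (hypothetical element names)."""
    entry = ET.Element("entry", {"date": date})
    ET.SubElement(entry, "project").text = project
    ET.SubElement(entry, "hours").text = str(hours)
    ET.SubElement(entry, "notes").text = notes
    # tostring escapes &, < and > in the text, so each entry is well-formed.
    return ET.tostring(entry, encoding="unicode")

def append_entry(path, xml_text):
    """Append one serialized entry to the journal text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(xml_text + "\n")

def load_journal(path):
    """Read the file back, supplying the single root element it lacks."""
    with open(path, encoding="utf-8") as f:
        return ET.fromstring("<journal>" + f.read() + "</journal>")
```

One design point: a file of appended entries has no single root element, so
each entry is well-formed on its own but the file as a whole is not a
complete XML document. The sketch supplies a root when reading; a parse error
at that point would mean something not well-formed got into the file.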

Until I build the first prototype, I won't know, but I think I eventually
want the ability to insert arbitrary tags in several of the fields,
particularly the notes field. (Yes, probably mixed content! Do these
arbitrary tags have anything to do with the concepts of the Semantic Web?)
If I've already built a DTD by this time, I may find that the DTD didn't
allow my arbitrary tags. But that would be no big deal, unless I've
published the DTD to the whole company, in which case changing it to allow
the arbitrary tags might cost the company money. So I probably want to
experiment with DTDs after the first prototype, but I don't want to jump the
gun and publish any corporate standards just yet.

Why do I want to build a DTD at this point? Mostly to help me understand the
document class that I am implicitly designing.

Now, some time down the road, I have an application that can print my daily
and monthly reports for me, straight out of my work journal. If it works
well, the boss will want me to make it available to the whole company. At
this point, I want a DTD.

The reason I want a DTD is that it allows the journal app to publish its
interface to other applications (payroll, for instance) to directly extract
data from the journal.
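
As a rough sketch of what "directly extract data" could look like for a
payroll-style consumer, assuming the same hypothetical entry, project and
hours elements (none of these names come from a real DTD):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def hours_by_project(journal_xml):
    """Total the hours recorded per project in a journal document."""
    root = ET.fromstring(journal_xml)
    totals = defaultdict(float)
    for entry in root.iter("entry"):
        # Without a DTD, nothing guarantees these children exist,
        # so the reader has to guess at defaults.
        project = entry.findtext("project", default="(none)")
        totals[project] += float(entry.findtext("hours", default="0"))
    return dict(totals)
```

The two default values are the interesting part. They are guesses the
consumer has to make because no published DTD says whether project and hours
are required.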

Just as important, it also provides a reference for other programmers, who
will want to extend the journal application. With the DTD reference, we
can extend the journal application in an orderly manner, and the DTD can
warn us if anybody's version has become incompatible. Mostly, it helps us
keep the extensions from raising incompatibilities. In the case of
incompatibilities, it provides the programmer a target when writing the
(XSLT? DOM? SAX?) filters to transform the old documents to the new
format(s). Likewise, an employee who has his own journal keeping program can
use the DTD as a target for the program that transforms his journal into the
public form.
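
A filter of the kind mentioned above might be as small as this DOM-style
sketch. It assumes, purely for illustration, that the old format called the
notes field memo and that the new format calls it notes and carries a version
attribute; a real filter would take its target names from the published DTD.

```python
import xml.etree.ElementTree as ET

def upgrade_entry(old_xml):
    """Transform a hypothetical old-format entry into the new format."""
    entry = ET.fromstring(old_xml)
    # Rename every <memo> to <notes>; any nested tags are left untouched.
    for memo in entry.iter("memo"):
        memo.tag = "notes"
    entry.set("version", "2")
    return ET.tostring(entry, encoding="unicode")
```

The same job could be done with XSLT or SAX. The tree-based approach is only
the shortest one to show here.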

Summarizing the above, I think you should not use a DTD for any class of
documents whose use has not yet stabilized. Once the general use of a
document has stabilized enough to make an application for it, a DTD comes in
very handy for the programmers.

I think there are large classes of documents which simply will not be
expected to stabilize into a form that can benefit from a DTD, even though
they would benefit from XML.

Corporate policies may be one of these classes. (If there is anything
certain about rules and regulations, it's exceptions. True?) Having a DTD
may help, but any DTD for such applications must be subject to version
control. Version control technology is not quite ready for your average
secretary to operate. If that's not bad enough, keeping the DTD current
with the present form of the document may incur significant cost to smaller
companies. So some companies might well prefer not to bother with DTDs for
unstable classes of documents.

There is information lost if they so choose, and the value of the
information lost needs to be considered. If the documents become
inaccessible without the DTD, then it is probably best to have the DTD. If
they are readable without the DTD, it may well not be worth having the DTD.

An aside -- Japanese documents have a problem not so familiar to most
American programmers. No whitespace. It's sometimes impossible even for
humans to parse word boundaries. This makes normal search functions (the
kind of stuff you do with Perl and regular expressions) less effective for
Japanese documents. In English, we can easily search for any word that
starts with "fidu". With Japanese, it is not easy at all to search for any
word that starts with a character containing "kane" as a radical part. But
if we preprocess a document and tag the interesting parts, we might be able
to get around this problem.
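
Here is a small sketch of that preprocess-and-tag idea, with loud caveats.
The lookup table is a tiny hand-made stand-in for a real character
dictionary, the c element and its radical attribute are names I made up, and
it tags single characters rather than words, since finding word boundaries
would need a separate segmenter.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-made lookup: character -> radical it is filed under.
RADICALS = {"銀": "金", "鉄": "金", "鏡": "金"}

def tag_radicals(text, table=RADICALS):
    """Wrap each known character in <c radical="..."> inside a <note>."""
    root = ET.Element("note")
    root.text = ""
    last = None  # most recently added <c>; untagged text goes in its tail
    for ch in text:
        radical = table.get(ch)
        if radical is None:
            if last is None:
                root.text += ch
            else:
                last.tail = (last.tail or "") + ch
        else:
            last = ET.SubElement(root, "c", {"radical": radical})
            last.text = ch
    return root

def chars_with_radical(note, radical):
    """Return the tagged characters carrying the given radical."""
    return [c.text for c in note.iter("c") if c.get("radical") == radical]
```

The result is mixed content: plain text with tags buried in it. Stripping the
tags gives back the original sentence, so nothing is lost by the
preprocessing, and the search runs over the tags instead of the raw
characters.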

(I have an idea for another way to get around the problem, but it would
require inventing a completely new encoding method for the Han characters.
Definitely not going to be implemented this year.)

Okay, bringing Japanese up was just a ruse to motivate the use of XML for
otherwise free-form documents. Free-form documents are like the post-cards
you send while you're traveling. These are not, in my opinion, amenable to
DTDs. Part of the meaning of the document simply can't be encoded, at least
not in Unicode. But one might still want to bury XML tags in a post-card
style e-mail message. (Japanese brings this sort of document more into focus
for me.)

Any comments on this point of view?

----- Original Message -----
From: "Sandra Carney" <scarney@endocardial.com>
To: "XML Development" <xml-dev@lists.xml.org>
Sent: Friday, May 18, 2001 1:42 AM
Subject: DTD's

> Hi,
>   We have a question about the necessity of DTD.  There are folks
>   among our developers who postulate that so long as the document
>   is well-formed, we don't need DTD's.  So far, so true.  However,
>   might this pose a quality problem later on especially if you want
>   to limit what are considered legitimate tags in the document?
> Regards,
> Sandra Carney
> ------------------------------------------------------------------
> The xml-dev list is sponsored by XML.org, an initiative of OASIS
> <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: xml-dev-request@lists.xml.org