xml-dev - RE: Weak DTDs

RE: Weak DTDs
From: akirkpatrick@ims-global.com
To: xml-dev@ic.ac.uk,peter@ursus.demon.co.uk
Date: Fri, 17 Oct 1997 11:04:25 +0000
The strength of the DTD is in giving a limited set of possibilities for
a processing engine to work with. There are obviously other ways
to do this (see below) but for a lot of applications, the DTD provides
sufficient constraints for authors of the information. A common
example is a title element. Often a title is required to provide feedback
in a UI, to act as link text in a hypertext link, etc. If your DTD says:

<!ELEMENT anything (title, anything.else+)>

then you know for a fact that you can pick out the title, given a
valid document. Also, the parser will tell you if the document is valid
or not and you can then decide whether to attempt processing it.
In our application, the RTF processing engine will still attempt to
process a document but says "hey, you might not get what you
expect". In other situations, an application just says "go away
and come back with something valid".
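The "validate first, then decide" pattern above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's JAXP and DOM classes rather than any parser mentioned in this thread, and the class name TitlePicker, the Result holder and the two sample documents are invented for the example. A lenient application would look at validityErrors and carry on with a warning; a strict one would refuse as soon as the list is non-empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class TitlePicker {
    // Hypothetical internal DTD subset built around the content model quoted above.
    static final String DTD =
        "<!DOCTYPE anything [\n"
        + "<!ELEMENT anything (title, anything.else+)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT anything.else (#PCDATA)>\n"
        + "]>\n";
    public static final String VALID = DTD
        + "<anything><title>Weak DTDs</title><anything.else>body</anything.else></anything>";
    public static final String INVALID = DTD
        + "<anything><anything.else>body</anything.else></anything>";

    /** Outcome of one parse: the title (null if not found) plus any validity errors. */
    public static final class Result {
        public final String title;
        public final List<String> validityErrors;
        Result(String title, List<String> validityErrors) {
            this.title = title;
            this.validityErrors = validityErrors;
        }
    }

    public static Result parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the document
            DocumentBuilder builder = factory.newDocumentBuilder();
            final List<String> errors = new ArrayList<>();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return new Result(firstChildTitle(doc.getDocumentElement()), errors);
        } catch (Exception e) {
            // Not well-formed, or the parser could not be configured.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the text of the first element child if it is a title, else null. */
    private static String firstChildTitle(Element root) {
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return n.getNodeName().equals("title") ? n.getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String xml : new String[] { VALID, INVALID }) {
            Result r = parse(xml);
            if (r.validityErrors.isEmpty()) {
                System.out.println("valid, title = " + r.title);
            } else {
                System.out.println("hey, you might not get what you expect: " + r.validityErrors);
            }
        }
    }
}
```

For the valid sample the first element child is guaranteed to be the title, which is exactly what the content model buys you.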

It sounds like in your situation, you aren't worried about the vast
majority of elements but just want to pick up on key things like
<atom>, <bond>, etc. The "Eliot" way to do this would be with an
architecture DTD which defines attributes to identify important
elements. Your derived DTD can then use any content model (or
even element names) you want.

For example:

<!element atom - - (bond+)>
<!attlist atom
 CMLNAME NAME  #FIXED atom>

<!element bond - - EMPTY>
<!attlist bond
 CMLNAME NAME  #FIXED bond>

Your derived DTD might then go something like:

<!ELEMENT myatom - - (title, mybond+, otherstuff)>
<!ATTLIST myatom
 CMLNAME NAME  #FIXED atom>
<!ELEMENT mybond - - (title, description)>
<!ATTLIST mybond
 CMLNAME NAME  #FIXED bond>

(I'm still new to AFs, but this is the basic idea)

Now your processing engine can identify items by their fixed
attributes and process accordingly, ignoring all other elements.
Other people can happily derive from your architecture DTD to
add their application specific elements.

If you are using XML without a DTD, things are exactly the
same except that you need to explicitly set the attribute on
the relevant elements (as I understand it). It should be trivial
to write a normaliser which would generate XML from an SGML
instance (SGMLNORM would probably do it).
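As a rough sketch of what such a processing engine might do with a DTD-less XML instance, the Java below walks a document and groups elements by their CMLNAME attribute, whatever the element names happen to be. The assumptions are mine: it uses the JDK's JAXP/DOM classes, the class name ArchitecturalScan and the sample instance are made up, and the attribute is written out explicitly on each element as described above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ArchitecturalScan {
    // Hypothetical instance of the derived DTD, with CMLNAME set explicitly.
    public static final String SAMPLE =
        "<myatom CMLNAME=\"atom\"><title>C1</title>"
        + "<mybond CMLNAME=\"bond\"><title>b1</title><description>single</description></mybond>"
        + "<mybond CMLNAME=\"bond\"><title>b2</title><description>double</description></mybond>"
        + "<otherstuff/></myatom>";

    /** Counts elements per CMLNAME value; elements without the attribute are ignored. */
    public static Map<String, Integer> countByCmlName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, Integer> counts = new TreeMap<>();
            visit(doc.getDocumentElement(), counts);
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void visit(Element e, Map<String, Integer> counts) {
        // The element name (myatom, mybond, ...) is irrelevant; only CMLNAME matters.
        if (e.hasAttribute("CMLNAME")) {
            counts.merge(e.getAttribute("CMLNAME"), 1, Integer::sum);
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) {
                visit((Element) kid, counts);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countByCmlName(SAMPLE));
    }
}
```

The title, description and otherstuff elements pass through untouched, which is the point: the engine only reacts to what the architecture names.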

I think one of the major problems with the Web today is the
plethora of badly formed HTML pages which have been allowed
to grow and flourish by browsers which don't check for validity
in any way at all. There is a danger that lack of DTDs in XML
documents will lead to even greater "tag soup".

 ----------
From:  peter@ursus.demon.co.uk
Sent:  17 October 1997 08:21
To:  xml-dev@ic.ac.uk
Subject:  Weak DTDs

 --------------------------------------------------------------------------
I am in the throes of revising CML (Chemical Markup Language - an XML-based
application) and trying to work out what the value of conventional DTDs
is. The previous version has a traditional SGML-like DTD - lots of
parameter entities and other clever stuff. I am finding this too
restrictive for several reasons, mainly because:
 (a) XML-* is moving so rapidly (e.g. LINK, STYLE, etc.) This is a Good
Thing, but CML has to react to it.
 (b) RDF, DC, MathML etc will be involved in CML and I can't say exactly
how at present.
 (c) My ideas on CML itself keep changing as I gain experience of new
problems.

I'd like *constructive* views on the value of DTDs in XML. [I know that the
community has strongly held ones, so please avoid too much passion :-).
There was a very interesting discussion a few weeks back on the aesthetics
of DTDs - a good DTD is a thing of beauty.] I can see the following reasons
for DTDs.
 (a) the author has to conform to a pre-defined spectrum of ideas (e.g. a
tax-return). [This is not required for CML, and any conformance is outside
what a DTD can deliver - e.g. value verification.]
 (b) the document may get corrupted in transmission or elsewhere. I suspect
this is not a very important reason these days.
 (c) it *may* make it easier to develop authoring tools
 (d) it *may* give guidance to implementers of applications.
 (e) it should (but doesn't always) act as an incentive to develop
human-readable documentation of the semantics.
 (f) it shows that the author has defined the language at some point in time.

I'd be grateful for other reasons. For CML I expect that (c-e) have some
limited value. (f) may impress some people and horrify others.

In creating CML documents I find myself:
 (a) wanting to introduce foreign names (e.g. <DC:author>, or <MathML:EQN>).
These could reasonably come at many places in the document
 (b) forgetting my own 'rules', e.g. order of elements within a content
model. So I can't expect others to follow them :-)
 (c) adding new components to content models - for good reasons. There is
no reason why a <MOLECULE> cannot contain a <FIGURE>, but I didn't think
of that earlier. I don't want to have to think of all combinations and ask
'is that reasonable?'.
 
However the power of structured documents means that I can often use very
fuzzily constructed documents. Thus:
 'if a MOLECULE contains ATOMS and BONDS, the software can draw a picture'
 'if any parent contains a FIGURE, allow that to be displayed by the reader'.
 'if a VARiable has attribute BUILTIN=FOO, inform the software that it
could process this with special FOO-specific code'
and so on.
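To make the three conditions concrete, here is a minimal Java sketch that applies them to an arbitrary well-formed document and simply ignores everything it does not recognise. It is not CML software: it assumes the JDK's JAXP/DOM classes, and the class name FuzzyRules, the action strings and the sample document (including the NOTE and UNKNOWN elements) are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FuzzyRules {
    // Hypothetical, loosely structured document: no DTD, unknown elements allowed.
    public static final String SAMPLE =
        "<MOLECULE><ATOMS/><BONDS/><FIGURE/>"
        + "<NOTE><VAR BUILTIN=\"FOO\">1</VAR><UNKNOWN/></NOTE></MOLECULE>";

    /** Returns the actions triggered by the three rules, in document order. */
    public static List<String> actionsFor(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> actions = new ArrayList<>();
            visit(doc.getDocumentElement(), actions);
            return actions;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void visit(Element e, List<String> actions) {
        String name = e.getTagName();
        // Rule 1: a MOLECULE with both ATOMS and BONDS can be drawn.
        if (name.equals("MOLECULE") && hasChild(e, "ATOMS") && hasChild(e, "BONDS")) {
            actions.add("draw " + name);
        }
        // Rule 2: any parent containing a FIGURE can have it displayed.
        if (hasChild(e, "FIGURE")) {
            actions.add("display FIGURE in " + name);
        }
        // Rule 3: a VAR with BUILTIN=FOO is offered to FOO-specific code.
        if (name.equals("VAR") && "FOO".equals(e.getAttribute("BUILTIN"))) {
            actions.add("FOO-specific code for " + name);
        }
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                visit((Element) n, actions);
            }
        }
    }

    /** True if e has a direct element child with the given name. */
    private static boolean hasChild(Element e, String childName) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(childName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(actionsFor(SAMPLE));
    }
}
```

None of the rules cares about element order or about what else the parent contains, which is why they are hard to state as conventional content models.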

These are powerful conditions, but if we try to express them in DTDs,
validation will fail. What I'd like to have is a wildcard #ANY (this has
already been suggested) which can be used for content models something like
the (currently illegal) XML:

<!ELEMENT MOL (#ANY,ATOMS,BONDS)*>

This says that MOL can contain anything, but that ATOMS and BONDS have a
special role. The authoring tool might present a menu with the items ATOMS,
BONDS, Other. The software for MOL.java could contain routines to identify
children:
        int natom = 0;
        int nbond = 0;
        for (int i = 0; i < this.getChildCount(); i++) {
            Node n = this.getNode(i);
            if (n instanceof ATOMS) {
                /* atom-specific stuff */
                natom++;
            } else if (n instanceof BONDS) {
                /* bond-specific stuff */
                nbond++;
            }
        }
        // Draw only when both special children are present.
        if (natom > 0 && nbond > 0) {
            displayMol();
        }

Obviously this can't be written automatically, but the 'DTD' helps the author.

In some cases there will be stricter rules such as:

<!ELEMENT VAR (#PCDATA)>
<!ATTLIST VAR
    BUILTIN CDATA #IMPLIED
    TYPE (INTEGER|FLOAT|STRING) "STRING" ...>

which clearly help both authoring tool authors and applications authors.

At present I would like to keep a simple DTD but most of the content models
will be 'ANY' and most of the attribute values will be CDATA. It would be
nice to have attribute values which could take a list of values *and* CDATA
:-) - like:
<!ATTLIST VAR TYPE (INTEGER,FLOAT,STRING,#ANY)>
which would inform the software that it should cater for three specific
values, but that the user can add FOO if they really want.
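On the software side, such an "open" list might be handled along the lines of the Java sketch below: the three listed values get specific treatment and anything else falls through to a generic path. This is only an assumption about how an application could behave; the class name VarType and its methods are hypothetical, and treating a missing TYPE as STRING simply mirrors the default in the declaration further up.

```java
public class VarType {
    /** True for the three values the software caters for specifically. */
    public static boolean isKnownType(String type) {
        return "INTEGER".equals(type) || "FLOAT".equals(type) || "STRING".equals(type);
    }

    /**
     * Converts the text content of a VAR according to its TYPE attribute.
     * Unlisted values such as FOO are accepted and passed through as plain text.
     */
    public static Object convert(String type, String text) {
        if (type == null) {
            type = "STRING"; // assumed default, as in the stricter declaration
        }
        switch (type) {
            case "INTEGER":
                return Integer.valueOf(text.trim());
            case "FLOAT":
                return Double.valueOf(text.trim());
            case "STRING":
                return text;
            default:
                // User-defined type: keep the raw text so that other code can interpret it.
                return text;
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("INTEGER", "42"));
        System.out.println(convert("FLOAT", "1.5"));
        System.out.println(convert("FOO", "anything") + " (known: " + isKnownType("FOO") + ")");
    }
}
```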

Any sympathisers out there :-)?

 P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)