xml-dev - Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9
From: Norman Walsh <ndw@nwalsh.com>
Date: Wed, 12 Jun 2002 07:00:20 -0400
In-reply-to: <200206111731.NAA15897@mail.reutershealth.com> (John Cowan's message of "Tue, 11 Jun 2002 13:31:53 -0400 (EDT)")
References: <200206111731.NAA15897@mail.reutershealth.com>
Sender: Norman Walsh <ndw@mercury>
User-agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.1 (i686-pc-linux-gnu)

/ John Cowan <jcowan@reutershealth.com> was heard to say:
| I am assuming that the context for extending DTDs is not redefining
| XML, but rather creating an enhanced XML DTD format which can be used by an
| external validator.

Hmm. I imagined this work would redefine XML DTDs. If it's all for
post-parsing validation, why bother? We've got RNG, XSD, and
Schematron (and others).

(By the same token, if the result won't be XML, why bother, but ...)

| Example:
|
| <!NS foo SYSTEM "http://www.example.com/foo">

I've always imagined this as:

 <!NAMESPACE foo "http://www.example.com/foo">

There's no precedent for abbreviation in the decl name.

| Issue: Is it an error to mention a prefix that is not declared?  My
| answer: no; if this is done, name matching falls back to string identity.

If the DTD contains an <!NAMESPACE decl, yes, otherwise no.
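A minimal Python sketch of the matching rule in that answer, assuming DTD names are compared as expanded names in ElementTree's `{uri}local` (Clark) notation. `resolve_dtd_name` and `UndeclaredPrefixError` are hypothetical names, and treating unprefixed DTD names as being in no namespace is an assumption the message does not settle.

```python
class UndeclaredPrefixError(ValueError):
    """A DTD name uses a prefix that no <!NAMESPACE> declaration binds."""


def resolve_dtd_name(name: str, ns_decls: dict[str, str]) -> str:
    """Map a DTD name like 'foo:test' to Clark notation '{uri}test'.

    ns_decls maps prefixes to URIs, collected from the proposed
    <!NAMESPACE prefix "uri"> declarations (hypothetical syntax).
    """
    if not ns_decls:
        # No NAMESPACE declarations at all: fall back to plain string identity.
        return name
    prefix, sep, local = name.partition(":")
    if not sep:
        # Assumption: unprefixed DTD names are in no namespace.
        return name
    if prefix not in ns_decls:
        # The DTD declares namespaces, so an unknown prefix is an error.
        raise UndeclaredPrefixError(f"undeclared prefix {prefix!r} in {name!r}")
    return "{" + ns_decls[prefix] + "}" + local
```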

| Issue: is the keyword SYSTEM useful?

I don't think so. I'm not entirely sure I think it's a good idea to
use external identifiers for XML namespaces. (I also don't think
there's a precedent for SYSTEM w/o allowing PUBLIC and, much as I
support public identifiers, I'm not sure they work sensibly with XML
namespaces (more's the pity).)

| Issue: this does not help when prefixes are not used consistently
| throughout an instance.  Do we care?  My answer: no.

Sure it does. That's the whole point of the declaration!

I expect the parser to understand:

<!DOCTYPE foo:test [
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
]>
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>

If it's going to be done post-XML parsing as you proposed, then move
that internal subset to an external file, and imagine you have two
documents:

example.xml:
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>

example.edtd:
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>

I expect a validator to be able to test the validity of example.xml
with example.edtd.

Or maybe I'm missing something.
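One way such a post-parse validator might work, sketched in Python: it resolves the prefixes in example.edtd through the NAMESPACE declaration and compares expanded names, so the instance's `bar:` prefix and its default namespace declaration don't matter. The `<!NAMESPACE>` syntax is only the proposal under discussion, and the regex-based reader below is a hypothetical toy that handles element-only content models and text-only `(#PCDATA)` models, nothing more.

```python
import re
import xml.etree.ElementTree as ET

# example.edtd and example.xml from the message
EDTD = """
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
"""

INSTANCE = """<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>"""

NS_DECL = re.compile(r'<!NAMESPACE\s+(\S+)\s+"([^"]*)"\s*>')
ELEM_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.+?)>")
TOKEN = re.compile(r"#PCDATA|[(),|*+?]|[^\s(),|*+?]+")


def clark(name, ns):
    """'foo:para' -> '{uri}para', the form ElementTree uses for tags."""
    prefix, sep, local = name.partition(":")
    return "{%s}%s" % (ns[prefix], local) if sep else name


def model_to_regex(model, ns):
    """Translate an element-only content model into a regex over child tags."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue  # a sequence is plain concatenation
        elif tok == "(":
            parts.append("(?:")
        elif tok in {")", "|", "*", "+", "?"}:
            parts.append(tok)
        else:
            # Group each name so a following occurrence indicator applies to all of it
            parts.append("(?:" + re.escape("<" + clark(tok, ns) + ">") + ")")
    return re.compile("".join(parts))


def validate(edtd, xml_text):
    ns = dict(NS_DECL.findall(edtd))
    decls = {clark(n, ns): m.strip() for n, m in ELEM_DECL.findall(edtd)}
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        model = decls.get(elem.tag)  # match on expanded name, not prefix
        if model is None:
            errors.append(f"undeclared element {elem.tag}")
            continue
        children = "".join("<" + child.tag + ">" for child in elem)
        if "#PCDATA" in model:
            ok = children == ""  # sketch: text-only models only
        else:
            ok = model_to_regex(model, ns).fullmatch(children) is not None
        if not ok:
            errors.append(f"{elem.tag}: content does not match {model}")
    return errors


print(validate(EDTD, INSTANCE))  # prints []
```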

| 2) Attribute data types.  The names that can appear in an ATTLIST
| declaration directly after an attribute name are extended to include
| the datatype names of part 5 (i.e. XSD simple types).
|
| Example:
|
| <!ATTLIST baz
| 	foo integer #IMPLIED
| 	baz integer #REQUIRED>
|
| Issue: do we need to make the datatype list extensible?  If so, we could
| use QNames and a DATATYPE declaration, rather like the compact syntax
| of RELAX NG.

Mumble. I guess I'd go with qnames and let it be extensible. Off the
top of my head, anyway.
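A rough sketch of item 2 with the QName-keyed, extensible approach: attribute values are checked against XSD simple types looked up in a registry. The registry, `check_attributes`, and the triple format used for the ATTLIST are hypothetical; only the `xsd:integer` and `xsd:nonNegativeInteger` lexical rules are modeled.

```python
import re
from typing import Callable

XSD = "http://www.w3.org/2001/XMLSchema"
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer(value: str) -> bool:
    # xsd:integer lexical space; its whiteSpace facet is "collapse".
    return INTEGER.fullmatch(value.strip()) is not None


def is_non_negative_integer(value: str) -> bool:
    return is_integer(value) and int(value.strip()) >= 0


# Hypothetical extensible registry keyed by expanded QName; a DATATYPE
# declaration would add entries here.
DATATYPES: dict[str, Callable[[str], bool]] = {
    f"{{{XSD}}}integer": is_integer,
    f"{{{XSD}}}nonNegativeInteger": is_non_negative_integer,
}


def check_attributes(attlist, attrs):
    """attlist: (name, datatype QName, '#IMPLIED' or '#REQUIRED') triples."""
    errors = []
    for name, dtype, default in attlist:
        if name not in attrs:
            if default == "#REQUIRED":
                errors.append(f"missing required attribute {name}")
            continue
        if not DATATYPES[dtype](attrs[name]):
            errors.append(f"{name}={attrs[name]!r} is not a valid {dtype}")
    return errors


# <!ATTLIST baz foo integer #IMPLIED baz integer #REQUIRED>
BAZ = [
    ("foo", f"{{{XSD}}}integer", "#IMPLIED"),
    ("baz", f"{{{XSD}}}integer", "#REQUIRED"),
]
```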

| 3) Element simple datatypes.  Likewise, unparenthesized content models
| in ELEMENT declarations are extended from just ANY and EMPTY to include
| these same datatypes.
|
| Example: <!ELEMENT foo nonNegativeInteger>
|
| 4) Datatype lists.  In either #2 or #3 context, a simple datatype name
| can be replaced by "LIST(name)" to indicate a whitespace-separated
| list of strings matching the datatype.  IDREFS is equal to LIST(IDREF),
| and ENTITIES is equal to LIST(ENTITY).

There's fairly limited utility in extending DTDs, and I think this one
(item 4) is starting to make it too expensive. I'm not sure I don't feel
the same way about 3 and, if pressed, even 2.
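For items 3 and 4, a sketch of how `LIST(name)` could be built from an item checker, with `IDREFS` defined as `LIST(IDREF)`, plus a check for an element whose content model is a datatype. The Name pattern is a simplified ASCII approximation of the XML `Name` production, and all function names are hypothetical. The list requires at least one item, matching the DTD `IDREFS` type; an XSD list type without a `minLength` facet would also accept an empty list.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

Checker = Callable[[str], bool]

# Simplified ASCII approximation of the XML Name production.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_idref(value: str) -> bool:
    return NAME.fullmatch(value) is not None


def is_non_negative_integer(value: str) -> bool:
    s = value.strip()
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None and int(s) >= 0


def list_of(item: Checker) -> Checker:
    """LIST(item): one or more whitespace-separated items, like IDREFS."""
    def check(value: str) -> bool:
        items = value.split()  # splits on runs of any whitespace
        return bool(items) and all(item(i) for i in items)
    return check


IDREFS = list_of(is_idref)  # LIST(IDREF)


def check_simple_content(elem: ET.Element, check: Checker) -> bool:
    """<!ELEMENT foo nonNegativeInteger>: text only, no child elements."""
    return len(elem) == 0 and check(elem.text or "")
```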

| 5) Datatype choice.  In either #2 or #3 context, a simple or LIST-wrapped
| datatype name can be replaced by |-separated names, to indicate a choice
| (derivation by union in WXS terms).
|
| Example: <!ELEMENT bar integer|name>
|
| Issue: what do we do about XSD facets?  They are important but don't
| easily fit into the rigid DTD syntax.

Too much complexity.
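Item 5's choice amounts to a union of member types: a value is valid if any member accepts it. A sketch in Python, assuming a hypothetical registry in which the XSD type is spelled `Name` (the example above writes `name`); facets are deliberately not modeled.

```python
import re
from typing import Callable

Checker = Callable[[str], bool]

BUILTINS: dict[str, Checker] = {
    "integer": lambda v: re.fullmatch(r"[+-]?[0-9]+", v.strip()) is not None,
    # ASCII approximation of xsd:Name
    "Name": lambda v: re.fullmatch(r"[A-Za-z_:][A-Za-z0-9._:\-]*", v.strip()) is not None,
}


def list_of(item: Checker) -> Checker:
    # LIST(x): one or more whitespace-separated items
    return lambda v: bool(v.split()) and all(item(i) for i in v.split())


def parse_datatype(expr: str) -> Checker:
    """'integer|Name' or 'LIST(integer)|Name' -> union checker."""
    members = []
    for part in expr.split("|"):
        part = part.strip()
        m = re.fullmatch(r"LIST\((\w+)\)", part)
        members.append(list_of(BUILTINS[m.group(1)]) if m else BUILTINS[part])
    # Union: a value is valid if any member type accepts it.
    return lambda value: any(check(value) for check in members)
```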

| 6) Restore & connector.  Bring back the & connector, either with the
| SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
| semantics.  The difference is that, given the content model "A & B+",
| the element sequences A, B, B, B and B, B, B, A will match in either case,
| but B, A, B, B will only match using interleave semantics.
|
| Issue: SGML or interleave?  My answer: interleave

My answer, don't do it.
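The difference between the two readings of `A & B+` in item 6 can be checked directly. This sketch treats SGML `&` as "the operands in some order, one after another" and interleave as "each operand matches the subsequence of its own element names", which is valid only when the operands use disjoint names, as RELAX NG requires of interleave. Element names are single characters so Python's `re` can do the matching; the function names are hypothetical.

```python
import re
from itertools import permutations

# Each operand: (regex over single-character element names, names it uses)
Operand = tuple[str, set[str]]


def sgml_and(operands: list[Operand], seq) -> bool:
    """SGML '&': all operands, one after another, in any order."""
    text = "".join(seq)
    for order in permutations(operands):
        pattern = "".join(f"(?:{p})" for p, _ in order)
        if re.fullmatch(pattern, text):
            return True
    return False


def interleave(operands: list[Operand], seq) -> bool:
    """RELAX NG interleave, assuming disjoint names across operands:
    project the sequence onto each operand's names and match each part."""
    covered = set().union(*(names for _, names in operands))
    if any(sym not in covered for sym in seq):
        return False
    return all(
        re.fullmatch(pattern, "".join(s for s in seq if s in names)) is not None
        for pattern, names in operands
    )


A_AND_B_PLUS = [("A", {"A"}), ("B+", {"B"})]
```

With this model, `["B", "A", "B", "B"]` is rejected by `sgml_and` but accepted by `interleave`, as item 6 describes.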

| 7) Abandon SGML 1-ambiguity rules.  Instead, allow complete flexibility of
| content models.  See James Clark's discussion in "The Design of RELAX NG".

Nope.
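For reference, the 1-ambiguity rule being defended here can be checked with the follow-set construction that XML 1.0's non-normative Appendix E describes: a content model is in error if its first set, or the follow set of any position, contains two positions labeled with the same element name. A Python sketch over a hypothetical tuple AST (`&` is not handled):

```python
from collections import defaultdict


def sym(n): return ("sym", n)
def seq(*xs): return ("seq", *xs)
def alt(*xs): return ("alt", *xs)
def star(x): return ("star", x)
def plus(x): return ("plus", x)
def opt(x): return ("opt", x)


def analyze(node, positions, follow):
    """Return (nullable, first, last); number positions and fill follow sets."""
    kind = node[0]
    if kind == "sym":
        p = len(positions)
        positions.append(node[1])
        return False, {p}, {p}
    if kind in ("star", "plus", "opt"):
        nullable, first, last = analyze(node[1], positions, follow)
        if kind != "opt":
            for p in last:
                follow[p] |= first  # repetition: last may be followed by first
        return nullable or kind != "plus", first, last
    if kind == "alt":
        nullable, first, last = False, set(), set()
        for child in node[1:]:
            n, f, l = analyze(child, positions, follow)
            nullable, first, last = nullable or n, first | f, last | l
        return nullable, first, last
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for child in node[1:]:
            n, f, l = analyze(child, positions, follow)
            for p in last:
                follow[p] |= f
            if nullable:
                first = first | f
            last = l | last if n else l
            nullable = nullable and n
        return nullable, first, last
    raise ValueError(f"unknown node {kind!r}")


def is_deterministic(model) -> bool:
    positions, follow = [], defaultdict(set)
    _, first, _ = analyze(model, positions, follow)

    def clash(candidates):
        names = [positions[p] for p in candidates]
        return len(names) != len(set(names))

    return not clash(first) and not any(clash(f) for f in follow.values())
```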

| 8) Restore multiple element and attribute names separated by |s.
| This makes for conciseness and easy authoring.  These constructs were
| dumped in XML DTDs because they imposed extra cost on validating parsers,
| but in this model validation is something done outside parsing, so higher
| cost is worthwhile.

Nah, I think this is a bit of syntactic sugar I can live without.
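Since the name groups in item 8 are syntactic sugar, they could be expanded before ordinary DTD processing. A hypothetical preprocessing sketch; `expand_name_groups` is an illustrative name, and it assumes no `>` appears inside quoted attribute defaults.

```python
import re

# '<!ELEMENT (a|b) ...>' or '<!ATTLIST (a|b) ...>'
GROUP_DECL = re.compile(r"<!(ELEMENT|ATTLIST)\s+\(([^)]*)\)(\s.*?)>", re.S)


def expand_name_groups(dtd: str) -> str:
    """Rewrite each name-group declaration as one declaration per name."""
    def expand(m):
        keyword, names, rest = m.groups()
        return "\n".join(f"<!{keyword} {n.strip()}{rest}>" for n in names.split("|"))
    return GROUP_DECL.sub(expand, dtd)
```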

| 9) Fixed element content.  Allow ELEMENT declarations to specify "#FIXED
| 'value'" after a datatype.
|
| Example: <!ELEMENT foo integer #FIXED "5">
|
| This means that the content of any foo element must be equivalent to 5
| according to the "integer" datatype's equivalence relation: therefore,
| 05, 005, +5, etc. will pass validation.

Nope.
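Item 9's equivalence rule means comparing values in the datatype's value space rather than as strings. A sketch for `integer` only; `integer_value` and `matches_fixed` are hypothetical names, and content that isn't a lexically valid integer simply fails the check.

```python
import re


def integer_value(lexical: str) -> int:
    """Map an xsd:integer lexical form to its value (whitespace collapsed)."""
    s = lexical.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        raise ValueError(f"{lexical!r} is not an xsd:integer")
    return int(s)


def matches_fixed(content: str, fixed: str, value_of=integer_value) -> bool:
    # Compare in the value space, not as strings, so '05' equals '5'.
    try:
        return value_of(content) == value_of(fixed)
    except ValueError:
        return False
```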

| General issue:  Should there be some way to indicate candidate roots?
| In existing DTDs, any element can be a root.

Nope.

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM   | One should always be a little
XML Standards Engineer | improbable.--Oscar Wilde
XML Technology Center  | 
Sun Microsystems, Inc. |

Follow-Ups:
- Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9
  - From: Ronald Bourret <rpbourret@rpbourret.com>

References:
- Come On, DTD, Come On! Thoughts on DSDL Part 9
  - From: John Cowan <jcowan@reutershealth.com>