   Re: [xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9


/ John Cowan <jcowan@reutershealth.com> was heard to say:
| I am assuming that the context for extending DTDs is not redefining
| XML, but rather creating an enhanced XML DTD format which can be used by an
| external validator.

Hmm. I imagined this work would redefine XML DTDs. If it's all for
post-parsing validation, why bother? We've got RNG, XSD, and
Schematron (and others).

(By the same token, if the result won't be XML, why bother, but ...)

| Example:
| <!NS foo SYSTEM "http://www.example.com/foo">

I've always imagined this as:

 <!NAMESPACE foo "http://www.example.com/foo">

There's no precedent for abbreviation in the decl name.

| Issue: Is it an error to mention a prefix that is not declared?  My
| answer: no; if this is done, name matching falls back to string identity.

If the DTD contains an <!NAMESPACE decl, yes, otherwise no.

| Issue: is the keyword SYSTEM useful?

I don't think so. I'm not entirely sure I think it's a good idea to
use external identifiers for XML namespaces. (I also don't think
there's a precedent for SYSTEM w/o allowing PUBLIC and, much as I
support public identifiers, I'm not sure they work sensibly with XML
namespaces (more's the pity).)

| Issue: this does not help when prefixes are not used consistently
| throughout an instance.  Do we care?  My answer: no.

Sure it does. That's the whole point of the declaration!

I expect the parser to understand:

<!DOCTYPE foo:test [
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
]>
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>

If it's going to be done post-XML parsing as you proposed, then move
that internal subset to an external file, and imagine you have two
files, example.xml:

<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>

and example.edtd:

<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>

I expect a validator to be able to test the validity of example.xml
with example.edtd.

Or maybe I'm missing something.
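
The name matching described above can be sketched in Python. This is
only an illustration under stated assumptions: the <!NAMESPACE ...>
syntax is the one proposed in this thread and belongs to no
specification, the regex-based reader and both function names are
hypothetical, and only element names are checked, not content models.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extended-DTD text using the <!NAMESPACE ...> declaration
# proposed in this thread (not part of any XML specification).
EDTD = '''
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
'''

INSTANCE = '''
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>
'''

def declared_names(edtd):
    """Return declared element names, expanded to '{uri}local' form."""
    prefixes = dict(re.findall(r'<!NAMESPACE\s+(\S+)\s+"([^"]*)"\s*>', edtd))
    names = set()
    for qname in re.findall(r'<!ELEMENT\s+(\S+)\s', edtd):
        prefix, sep, local = qname.partition(':')
        if sep and prefix in prefixes:
            names.add('{%s}%s' % (prefixes[prefix], local))
        else:
            # No declared prefix: compare by plain string identity,
            # the fallback suggested in the quoted proposal.
            names.add(qname)
    return names

def undeclared_elements(edtd, instance):
    """List instance elements whose expanded name the DTD does not declare."""
    declared = declared_names(edtd)
    root = ET.fromstring(instance.strip())
    return [el.tag for el in root.iter() if el.tag not in declared]

print(undeclared_elements(EDTD, INSTANCE))
```

Because both sides are compared as expanded names, the instance's
"bar" prefix and its default namespace match the DTD's "foo" prefix,
and the list comes back empty.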

| 2) Attribute data types.  The names that can appear in an ATTLIST
| declaration directly after an attribute name are extended to include
| the datatype names of part 5 (i.e. XSD simple types).
| Example:
| <!ATTLIST baz
| 	foo integer #IMPLIED
| 	baz integer #REQUIRED>
| Issue: do we need to make the datatype list extensible?  If so, we could
| use QNames and a DATATYPE declaration, rather like the compact syntax
| of RELAX NG.

Mumble. I guess I'd go with qnames and let it be extensible. Off the
top of my head, anyway.

| 3) Element simple datatypes.  Likewise, unparenthesized content models
| in ELEMENT declarations are extended from just ANY and EMPTY to include
| these same datatypes.
| Example: <!ELEMENT foo nonNegativeInteger>
| 4) Datatype lists.  In either #2 or #3 context, a simple datatype name
| can be replaced by "LIST(name)" to indicate a whitespace-separated
| list of strings matching the datatype.	IDREFS is equal to LIST(IDREF),
| and ENTITIES is equal to LIST(ENTITY).

There's fairly limited utility in extending DTDs. I think this is
starting to make it too expensive. I'm not sure I don't feel the same
way about 3 and if pressed, even 2.
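
For reference, the LIST(name) check in item 4 amounts to splitting on
whitespace and testing each token. A minimal Python sketch follows. The
LEXICAL table and the matches function are hypothetical names, and the
regex patterns are simplified stand-ins for the XSD lexical spaces, not
the real definitions.

```python
import re

# Simplified, hypothetical lexical patterns standing in for XSD simple types.
LEXICAL = {
    'integer': re.compile(r'[+-]?[0-9]+'),
    'nonNegativeInteger': re.compile(r'\+?[0-9]+'),
}

def matches(type_expr, value):
    """Check value against a datatype name or a LIST(name) wrapper."""
    m = re.fullmatch(r'LIST\((\w+)\)', type_expr)
    if m:
        pattern = LEXICAL[m.group(1)]
        # Whitespace-separated list: every token must match the item type.
        # An empty list is accepted here; a real validator might not.
        return all(pattern.fullmatch(tok) for tok in value.split())
    return LEXICAL[type_expr].fullmatch(value.strip()) is not None

print(matches('LIST(integer)', '1 2  -3'))   # True
print(matches('LIST(integer)', '1 two'))     # False
```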

| 5) Datatype choice.  In either #2 or #3 context, a simple or LIST-wrapped
| datatype name can be replaced by |-separated names, to indicate a choice
| (derivation by union in WXS terms).
| Example: <!ELEMENT bar integer|name>
| Issue: what do we do about XSD facets?	They are important but don't
| easily fit into the rigid DTD syntax.

Too much complexity.

| 6) Restore & connector.  Bring back the & connector, either with the
| SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
| semantics.  The difference is that, given the content model "A & B+",
| the element sequences A, B, B, B and B, B, B, A will match in either case,
| but B, A, B, B will only match using interleave semantics.
| Issue: SGML or interleave?  My answer: interleave

My answer: don't do it.
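
The difference between the two readings of "A & B+" in item 6 can be
checked with a few lines of Python. This sketch hard-codes that one
content model and is not a general matcher. Both function names are
hypothetical.

```python
import re

def matches_sgml_and(seq):
    """SGML reading of "A & B+": (A, B+) | (B+, A)."""
    return re.fullmatch(r'AB+|B+A', ''.join(seq)) is not None

def matches_interleave(seq):
    """Interleave reading: exactly one A and one or more Bs, in any order."""
    return (set(seq) <= {'A', 'B'}
            and seq.count('A') == 1
            and seq.count('B') >= 1)

for seq in (['A', 'B', 'B', 'B'], ['B', 'B', 'B', 'A'], ['B', 'A', 'B', 'B']):
    print(seq, matches_sgml_and(seq), matches_interleave(seq))
```

Only the last sequence, B, A, B, B, separates the two: it fails the
SGML reading and passes interleave.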

| 7) Abandon SGML 1-ambiguity rules.  Instead, allow complete flexibility of
| content models.  See James Clark's discussion in "The Design of RELAX NG".


| 8) Restore multiple element and attribute names separated by |s.
| This makes for conciseness and easy authoring.	These constructs were
| dumped in XML DTDs because they imposed extra cost on validating parsers,
| but in this model validation is something done outside parsing, so higher
| cost is worthwhile.

Nah, I think this is a bit of syntactic sugar I can live without.

| 9) Fixed element content.  Allow ELEMENT declarations to specify "#FIXED
| 'value'" after a datatype.
| Example: <!ELEMENT foo integer #FIXED "5">
| This means that the content of any foo element must be equivalent to 5
| according to the "integer" datatype's equivalence relation: therefore,
| 05, 005, +5, etc. will pass validation.
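
The equivalence in item 9 is a comparison in the value space rather
than of the literal strings. A minimal Python sketch, assuming a
simplified integer pattern and a hypothetical function name
(equals_fixed_integer):

```python
import re

# Simplified lexical form of an integer: optional sign, then ASCII digits.
_INTEGER = re.compile(r'[+-]?[0-9]+')

def equals_fixed_integer(content, fixed="5"):
    """True if content denotes the same integer as the #FIXED value."""
    text = content.strip()
    if not _INTEGER.fullmatch(text):
        # Not a lexically valid integer, so it cannot equal the fixed value.
        return False
    # Compare as numbers, not as strings, so "05" and "+5" equal "5".
    return int(text) == int(fixed)

for s in ("5", "05", "005", "+5", "6", "5.0"):
    print(repr(s), equals_fixed_integer(s))
```

The regex check comes first because Python's int() alone accepts forms,
such as digit separators, that this simplified integer pattern does not
allow.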


| General issue:	Should there be some way to indicate candidate roots?
| In existing DTDs, any element can be a root.


                                        Be seeing you,

Norman.Walsh@Sun.COM   | One should always be a little
XML Standards Engineer | improbable.--Oscar Wilde
XML Technology Center  | 
Sun Microsystems, Inc. | 



Copyright 2001 XML.org. This site is hosted by OASIS