
   Come On, DTD, Come On! Thoughts on DSDL Part 9


ISO/IEC 19757-9 is currently an empty hole titled "Datatype- and
namespace-aware DTDs".  This is a ragbag of ideas to fill that hole.

I am assuming that the context for extending DTDs is not redefining
XML, but rather creating an enhanced XML DTD format which can be used by an
external validator.  Examples of existing external validators are Jing
for RELAX NG, XSV for W3C XML Schema, and Sun MSV for many different
schema formats (including DTDs).  Enhanced DTDs would not be acceptable
to XML validating parsers.

I think the following enhancements to standard XML DTDs are worth
considering.  They are directed to making DTD authoring easier and
more flexible.  Nothing is introduced that is beyond the current schema
language state of the art.

1) The NS declaration.  Declarations of the form <!NS name SYSTEM "uri">
are allowed to define the namespaces associated with CNames in ELEMENT
and ATTLIST declarations.  As is the case for other schema languages, in
the presence of a known prefix, name matching is done on the universal
name (URI + local-part) rather than the CName.  The default namespace
is declared using #DEFAULT in place of the name.


<!NS foo SYSTEM "http://www.example.com/foo">
<!NS bar SYSTEM "http://www.example.com/bar">
<!ELEMENT foo:a (foo:b)>

Issue: Is it an error to mention a prefix that is not declared?  My
answer: no; if this is done, name matching falls back to string identity.

Issue: is the keyword SYSTEM useful?

Issue: this does not help when prefixes are not used consistently
throughout an instance.  Do we care?  My answer: no.
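To make the matching rule concrete, here is a minimal Python sketch of how an external validator might compare a DTD name against an instance name. It assumes the NS declarations have already been parsed into a dict (prefix to URI, with the key None standing for #DEFAULT). It follows the Namespaces in XML rule that unprefixed attributes are in no namespace, and it implements the fallback to string identity for undeclared prefixes. The function and type names are hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical representation: a universal name is (namespace URI, local part);
# "" as the URI means "no namespace". A plain str marks an unresolvable CName.
UName = Union[Tuple[str, str], str]
NSDecls = Dict[Optional[str], str]  # prefix -> URI; the key None stands for #DEFAULT


def resolve(cname: str, ns: NSDecls, is_attribute: bool = False) -> UName:
    """Map a colonized name to a universal name using the given declarations."""
    if ":" in cname:
        prefix, local = cname.split(":", 1)
        if prefix in ns:
            return (ns[prefix], local)
        return cname  # undeclared prefix: keep the raw string
    if not is_attribute and None in ns:
        return (ns[None], cname)  # the default namespace applies to element names only
    return ("", cname)


def names_match(dtd_name: str, dtd_ns: NSDecls,
                inst_name: str, inst_ns: NSDecls,
                is_attribute: bool = False) -> bool:
    """Compare a DTD name with an instance name under the proposed NS rules."""
    expected = resolve(dtd_name, dtd_ns, is_attribute)
    if isinstance(expected, str):
        return dtd_name == inst_name  # fall back to string identity
    return expected == resolve(inst_name, inst_ns, is_attribute)
```

So with foo declared as http://www.example.com/foo, the DTD name foo:a matches an instance element f:a whenever the instance binds f to that same URI.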

2) Attribute data types.  The names that can appear in an ATTLIST
declaration directly after an attribute name are extended to include
the datatype names of part 5 (i.e. XSD simple types).


	foo integer #IMPLIED
	baz integer #REQUIRED>

Issue: do we need to make the datatype list extensible?  If so, we could
use QNames and a DATATYPE declaration, rather like the RELAX NG compact
syntax.
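A minimal sketch of how an external validator might apply such declarations, assuming the ATTLIST has already been parsed into (name, datatype, default) triples. The datatype table holds only two XSD types (integer and boolean), and the function and table names are hypothetical. Undeclared attributes are not checked here.

```python
import re
from typing import Callable, Dict, List, Tuple

XML_WS = " \t\r\n"  # the four XML whitespace characters


def is_integer(value: str) -> bool:
    # xsd:integer lexical space, after whiteSpace="collapse" trimming
    return re.fullmatch(r"[+-]?[0-9]+", value.strip(XML_WS)) is not None


def is_boolean(value: str) -> bool:
    return value.strip(XML_WS) in ("true", "false", "1", "0")


# Hypothetical datatype table; a real validator would cover all the part 5 types.
DATATYPES: Dict[str, Callable[[str], bool]] = {
    "integer": is_integer,
    "boolean": is_boolean,
}


def check_attributes(decls: List[Tuple[str, str, str]],
                     attrs: Dict[str, str]) -> List[str]:
    """Return error messages for one element's attributes."""
    errors = []
    for name, datatype, default in decls:
        if name not in attrs:
            if default == "#REQUIRED":
                errors.append(f"missing required attribute {name}")
            continue
        if not DATATYPES[datatype](attrs[name]):
            errors.append(f"{name}={attrs[name]!r} is not a valid {datatype}")
    return errors
```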

3) Element simple datatypes.  Likewise, unparenthesized content models
in ELEMENT declarations are extended from just ANY and EMPTY to include
these same datatypes.

Example: <!ELEMENT foo nonNegativeInteger>
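For a datatype content model, a validator has to check two things: that the element has no child elements, and that its text is in the type's lexical space. A minimal Python sketch using the standard library's ElementTree parser, covering only the nonNegativeInteger example; the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

XML_WS = " \t\r\n"


def is_non_negative_integer(value: str) -> bool:
    s = value.strip(XML_WS)
    # Optional sign plus digits, with a value >= 0 (so "-0" is allowed).
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None and int(s) >= 0


def check_simple_content(elem: ET.Element, check: Callable[[str], bool]) -> bool:
    """<!ELEMENT foo nonNegativeInteger>: text only, and the text must be valid."""
    if len(elem) > 0:
        return False  # a datatype content model admits no child elements
    return check(elem.text or "")
```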

4) Datatype lists.  In either #2 or #3 context, a simple datatype name
can be replaced by "LIST(name)" to indicate a whitespace-separated
list of strings matching the datatype.  IDREFS is equal to LIST(IDREF),
and ENTITIES is equal to LIST(ENTITY).
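A LIST(...) checker can be built from any item checker: split the value on runs of XML whitespace (space, tab, CR, LF) and require every token to pass. This is a sketch with hypothetical names. It checks only the lexical form of IDREF (approximated as an ASCII-only NCName), not that each reference points at an ID, and like the DTD IDREFS and ENTITIES types it rejects an empty list.

```python
import re
from typing import Callable

Check = Callable[[str], bool]
XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def is_idref(value: str) -> bool:
    # ASCII-only approximation of NCName, the lexical space of IDREF.
    return re.fullmatch(r"[A-Za-z_][-A-Za-z0-9._]*", value) is not None


def list_of(item: Check) -> Check:
    """Build a LIST(item) checker: at least one whitespace-separated token."""
    def check(value: str) -> bool:
        tokens = [t for t in XML_WS_RUN.split(value) if t]
        return bool(tokens) and all(item(t) for t in tokens)
    return check


IDREFS = list_of(is_idref)  # IDREFS as LIST(IDREF), lexical check only
```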

5) Datatype choice.  In either #2 or #3 context, a simple or LIST-wrapped
datatype name can be replaced by |-separated names, to indicate a choice
(derivation by union in WXS terms).

Example: <!ELEMENT bar integer|Name>

Issue: what do we do about XSD facets?  They are important but don't
easily fit into the rigid DTD syntax.
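Checking a datatype choice is just a disjunction of the member checks. A minimal sketch covering only the two member types of the bar example, integer and XSD's Name (the latter approximated by an ASCII-only regex); the helper names are hypothetical.

```python
import re
from typing import Callable

Check = Callable[[str], bool]
XML_WS = " \t\r\n"


def is_integer(value: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", value.strip(XML_WS)) is not None


def is_name(value: str) -> bool:
    # ASCII-only approximation of xsd:Name (the real production allows more).
    return re.fullmatch(r"[A-Za-z_:][-A-Za-z0-9._:]*", value.strip(XML_WS)) is not None


def union(*members: Check) -> Check:
    """integer|Name: a value is valid if any member type accepts it."""
    def check(value: str) -> bool:
        return any(member(value) for member in members)
    return check
```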

6) Restore & connector.  Bring back the & connector, either with the
SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
semantics.  The difference is that, given the content model "A & B+",
the element sequences A, B, B, B and B, B, B, A will match in either case,
but B, A, B, B will only match using interleave semantics.

Issue: SGML or interleave?  My answer: interleave
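The difference between the two semantics can be checked mechanically. The sketch below uses Brzozowski-style derivatives, the technique James Clark describes for RELAX NG validation, over a deliberately tiny pattern language (element names, sequence, choice, one-or-more, interleave). The tagged-tuple representation and function names are hypothetical, not any library's API. SGML & is modelled for two operands as (A,B)|(B,A).

```python
from typing import List

# Hypothetical pattern representation: tagged tuples.
Pat = tuple
EMPTY: Pat = ("empty",)
NOT_ALLOWED: Pat = ("notAllowed",)

def elem(name: str) -> Pat:
    return ("elem", name)

def choice(p: Pat, q: Pat) -> Pat:
    if p == NOT_ALLOWED:
        return q
    if q == NOT_ALLOWED:
        return p
    return ("choice", p, q)

def seq(p: Pat, q: Pat) -> Pat:
    if NOT_ALLOWED in (p, q):
        return NOT_ALLOWED
    if p == EMPTY:
        return q
    if q == EMPTY:
        return p
    return ("seq", p, q)

def interleave(p: Pat, q: Pat) -> Pat:
    if NOT_ALLOWED in (p, q):
        return NOT_ALLOWED
    if p == EMPTY:
        return q
    if q == EMPTY:
        return p
    return ("interleave", p, q)

def one_or_more(p: Pat) -> Pat:
    return NOT_ALLOWED if p == NOT_ALLOWED else ("oneOrMore", p)

def nullable(p: Pat) -> bool:
    """Can p match the empty sequence?"""
    tag = p[0]
    if tag == "empty":
        return True
    if tag == "choice":
        return nullable(p[1]) or nullable(p[2])
    if tag in ("seq", "interleave"):
        return nullable(p[1]) and nullable(p[2])
    if tag == "oneOrMore":
        return nullable(p[1])
    return False  # elem, notAllowed

def deriv(p: Pat, name: str) -> Pat:
    """What remains of p after consuming one child element called name."""
    tag = p[0]
    if tag == "elem":
        return EMPTY if p[1] == name else NOT_ALLOWED
    if tag == "choice":
        return choice(deriv(p[1], name), deriv(p[2], name))
    if tag == "seq":
        rest = seq(deriv(p[1], name), p[2])
        return choice(rest, deriv(p[2], name)) if nullable(p[1]) else rest
    if tag == "interleave":
        # the element may belong to either side
        return choice(interleave(deriv(p[1], name), p[2]),
                      interleave(p[1], deriv(p[2], name)))
    if tag == "oneOrMore":
        return seq(deriv(p[1], name), choice(p, EMPTY))
    return NOT_ALLOWED  # empty, notAllowed

def matches(p: Pat, children: List[str]) -> bool:
    for name in children:
        p = deriv(p, name)
        if p == NOT_ALLOWED:
            return False
    return nullable(p)

A, B_PLUS = elem("A"), one_or_more(elem("B"))
SGML_AND = choice(seq(A, B_PLUS), seq(B_PLUS, A))  # (A,B+)|(B+,A)
INTERLEAVE = interleave(A, B_PLUS)
```

With these definitions both patterns accept A B B B and B B B A, while B A B B is accepted only by INTERLEAVE.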

7) Abandon SGML 1-ambiguity rules.  Instead, allow complete flexibility of
content models.  See James Clark's discussion in "The Design of RELAX NG".
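XML 1.0 requires content models to be deterministic, so a model such as ((a,b)|(a,c)) is not allowed: on seeing an a, the parser cannot tell which branch it is in without looking ahead. An external validator need not care. As a rough sketch, an element-only content model can be translated into a Python regular expression over space-terminated child names; Python's re is a backtracking engine and handles the ambiguous form directly. Only names, ',', '|', parentheses and ?*+ are handled (no & and no #PCDATA), names are approximated by an ASCII-only regex, and the function names are hypothetical.

```python
import re
from typing import List

NAME = r"[A-Za-z_:][-A-Za-z0-9._:]*"  # ASCII-only approximation of XML Name
TOKEN = re.compile(NAME + r"|[(),|?*+]")


def model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate an element-only content model into a Python regex.

    Each child element name becomes the name followed by one space, so the
    children of an element are matched as a space-terminated string.
    """
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def valid(model: str, children: List[str]) -> bool:
    # Backtracking needs no determinism (1-ambiguity) restriction.
    return model_to_regex(model).fullmatch("".join(n + " " for n in children)) is not None
```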

8) Restore multiple element and attribute names separated by |s.
This makes for conciseness and easy authoring.  These constructs were
dropped from XML DTDs because they imposed extra cost on validating
parsers, but in this model validation is done outside parsing, so the
higher cost is acceptable.
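Because validation happens outside the parser, name groups can be handled by a simple preprocessing step that expands each grouped declaration into one plain XML declaration per name. A rough sketch, assuming declarations contain no '>' inside quoted default values (a real implementation would tokenize the DTD properly); the function name is hypothetical.

```python
import re

# <!ELEMENT (a|b) ...> or <!ATTLIST (a|b) ...>; assumes no '>' inside the declaration body.
GROUPED_DECL = re.compile(r"<!(ELEMENT|ATTLIST)\s+\(([^)]*)\)\s+(.*?)>", re.DOTALL)


def expand_name_groups(dtd: str) -> str:
    """Rewrite each grouped declaration as one XML declaration per name."""
    def expand(m: "re.Match[str]") -> str:
        keyword, group, rest = m.groups()
        names = [n.strip() for n in group.split("|")]
        return "\n".join(f"<!{keyword} {name} {rest}>" for name in names)
    return GROUPED_DECL.sub(expand, dtd)
```

Declarations that name a single element, such as <!ELEMENT c (d|e)>, pass through unchanged.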

9) Fixed element content.  Allow ELEMENT declarations to specify "#FIXED
'value'" after a datatype.

Example: <!ELEMENT foo integer #FIXED "5">

This means that the content of any foo element must be equivalent to 5
according to the "integer" datatype's equivalence relation: therefore,
05, 005, +5, etc. will pass validation.
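Checking #FIXED by value rather than by string means mapping both sides into the datatype's value space before comparing. A minimal sketch for the integer case only; the helper names are hypothetical.

```python
import re

XML_WS = " \t\r\n"


def integer_value(lexical: str) -> int:
    """Map an xsd:integer lexical form to its value, e.g. "+05" -> 5."""
    s = lexical.strip(XML_WS)
    if re.fullmatch(r"[+-]?[0-9]+", s) is None:
        raise ValueError(f"{lexical!r} is not a valid integer")
    return int(s)


def check_fixed_integer(content: str, fixed: str) -> bool:
    """<!ELEMENT foo integer #FIXED "5">: compare values, not strings."""
    expected = integer_value(fixed)  # a bad #FIXED value is a DTD error: let it raise
    try:
        return integer_value(content) == expected
    except ValueError:
        return False  # the content is not an integer at all
```

Here "5.0" is rejected outright, since it is not in the lexical space of integer.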

General issue:  Should there be some way to indicate candidate roots?
In existing DTDs, any element can be a root.

General issue: We need to figure out what to do if the instance contains
an internal DTD (by which I mean an internal subset, a reference to an
external subset, or both).  Should internal validation be required,
permitted, or forbidden when doing external validation?  (I take it
for granted that if it is to be done, it will be done in the parser,
i.e. first.)  What is the effect of attribute defaulting specified by
the internal DTD on the external validation process?  Should internal
validation be done before external validation, or turned off?

John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_

