/ John Cowan <jcowan@reutershealth.com> was heard to say:
| I am assuming that the context for extending DTDs is not redefining
| XML, but rather creating an enhanced XML DTD format which can be used by an
| external validator.
Hmm. I imagined this work would redefine XML DTDs. If it's all for
post-parsing validation, why bother? We've got RNG, XSD, and
Schematron (and others).
(By the same token, if the result won't be XML, why bother, but ...)
| Example:
|
| <!NS foo SYSTEM "http://www.example.com/foo">
I've always imagined this as:
<!NAMESPACE foo "http://www.example.com/foo">
There's no precedent for abbreviation in the decl name.
| Issue: Is it an error to mention a prefix that is not declared? My
| answer: no; if this is done, name matching falls back to string identity.
If the DTD contains a <!NAMESPACE decl, yes; otherwise, no.
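A minimal Python sketch of that rule, assuming the validator has already collected the DTD's `<!NAMESPACE` declarations into a prefix-to-URI dict. It also assumes that unprefixed DTD names are in no namespace. The `{uri}local` notation is the one Python's `xml.etree.ElementTree` uses for tags; `resolve_dtd_name` and `UndeclaredPrefixError` are hypothetical names for illustration.

```python
class UndeclaredPrefixError(ValueError):
    """A DTD name uses a prefix that no <!NAMESPACE decl binds."""


def resolve_dtd_name(name: str, ns_decls: dict[str, str]) -> str:
    """Map a DTD name to an expanded '{uri}local' name, or leave it as-is.

    ns_decls holds the DTD's (hypothetical) <!NAMESPACE prefix "uri">
    declarations, keyed by prefix.
    """
    prefix, colon, local = name.partition(":")
    if not colon:
        # Assumption: unprefixed names are treated as being in no namespace.
        return name
    if prefix in ns_decls:
        return "{%s}%s" % (ns_decls[prefix], local)
    if ns_decls:
        # The DTD uses namespace declarations, so an unbound prefix is an error.
        raise UndeclaredPrefixError(f"undeclared prefix {prefix!r} in {name!r}")
    # No <!NAMESPACE decls at all: fall back to plain string identity.
    return name


decls = {"foo": "http://www.example.com/foo"}
print(resolve_dtd_name("foo:para", decls))  # {http://www.example.com/foo}para
print(resolve_dtd_name("bar:para", {}))     # bar:para
```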
| Issue: is the keyword SYSTEM useful?
I don't think so. I'm not entirely sure I think it's a good idea to
use external identifiers for XML namespaces. (I also don't think
there's a precedent for SYSTEM without also allowing PUBLIC, and,
much as I support public identifiers, I'm not sure they work sensibly
with XML namespaces, more's the pity.)
| Issue: this does not help when prefixes are not used consistently
| throughout an instance. Do we care? My answer: no.
Sure it does. That's the whole point of the declaration!
I expect the parser to understand:
<!DOCTYPE foo:test [
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
]>
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>
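The matching happens on expanded names (namespace URI plus local name), not on prefixes. A minimal Python sketch with the standard library's `xml.etree.ElementTree`, which reports tags in `{uri}local` form, shows the instance's `bar:test` and default-namespaced `para` matching the DTD's `foo:` names. It parses only the instance, since a conforming XML parser would reject the proposed `<!NAMESPACE` declaration; the `expand` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace bindings as the DTD would state them (<!NAMESPACE foo "...">).
DTD_NS = {"foo": "http://www.example.com/foo"}

INSTANCE = """<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>"""


def expand(dtd_name: str) -> str:
    """Turn a prefixed DTD name into ElementTree's '{uri}local' form."""
    prefix, _, local = dtd_name.partition(":")
    return "{%s}%s" % (DTD_NS[prefix], local)


root = ET.fromstring(INSTANCE)
child = root[0]  # the para element; whitespace text is not a child

# Different prefix (bar) and a default namespace in the instance, yet
# both elements match the DTD's foo: names once expanded.
print(root.tag == expand("foo:test"))   # True
print(child.tag == expand("foo:para"))  # True
```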
If it's going to be done after XML parsing, as you proposed, then move
that internal subset to an external file and imagine you have two
documents:
example.xml:
<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>
example.edtd:
<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>
I expect a validator to be able to test the validity of example.xml
with example.edtd.
Or maybe I'm missing something.
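Such a post-parse validator can be sketched in a few dozen lines of Python. This is a toy, not a DTD parser: the two files are inlined as strings, only `<!NAMESPACE` and `<!ELEMENT` declarations with element or mixed content are read, and attributes and stray text in element content are ignored. Element-content models are compiled into regular expressions over the expanded child names; all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Contents of the two files, inlined for a self-contained demo.
EXAMPLE_EDTD = """<!NAMESPACE foo "http://www.example.com/foo">
<!ELEMENT foo:test (foo:para+)>
<!ELEMENT foo:para (#PCDATA)*>"""

EXAMPLE_XML = """<bar:test xmlns:bar="http://www.example.com/foo">
<para xmlns="http://www.example.com/foo"/>
</bar:test>"""

# Names and operators in a content model; commas and spaces are dropped
# because sequence is plain concatenation in a regex.
TOKEN = re.compile(r"[^\s,|()*+?]+|[|()*+?]")
OPERATORS = {"|", "(", ")", "*", "+", "?"}


def expand(name, ns):
    """'foo:para' -> '{uri}para' using the edtd's NAMESPACE decls."""
    prefix, colon, local = name.partition(":")
    return "{%s}%s" % (ns[prefix], local) if colon else name


def load_edtd(text):
    """Read NAMESPACE and ELEMENT decls (element or mixed content only)."""
    ns = dict(re.findall(r'<!NAMESPACE\s+(\S+)\s+"([^"]*)"\s*>', text))
    models = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\S+)\s+([^>]*)>", text):
        tokens = TOKEN.findall(model)
        if "#PCDATA" in tokens:
            # Mixed content: record the set of allowed child elements.
            models[expand(name, ns)] = {
                expand(t, ns) for t in tokens
                if t not in OPERATORS and t != "#PCDATA"
            }
        else:
            # Element content: a regex over child names, each followed by a space.
            regex = "".join(
                t if t in OPERATORS else "(?:%s )" % re.escape(expand(t, ns))
                for t in tokens
            )
            models[expand(name, ns)] = re.compile(regex)
    return models


def validate(elem, models):
    """Return a list of error messages for elem and its descendants."""
    model = models.get(elem.tag)
    if model is None:
        return [f"undeclared element {elem.tag}"]
    children = [child.tag for child in elem]
    errors = []
    if isinstance(model, set):
        errors += [f"{c} not allowed in {elem.tag}" for c in children if c not in model]
    elif not model.fullmatch("".join(c + " " for c in children)):
        errors.append(f"content of {elem.tag} does not match its model")
    for child in elem:
        errors += validate(child, models)
    return errors


models = load_edtd(EXAMPLE_EDTD)
print(validate(ET.fromstring(EXAMPLE_XML), models))  # []
```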
| 2) Attribute data types. The names that can appear in an ATTLIST
| declaration directly after an attribute name are extended to include
| the datatype names of part 5 (i.e. XSD simple types).
|
| Example:
|
| <!ATTLIST baz
| foo integer #IMPLIED
| baz integer #REQUIRED>
|
| Issue: do we need to make the datatype list extensible? If so, we could
| use QNames and a DATATYPE declaration, rather like the compact syntax
| of RELAX NG.
Mumble. I guess I'd go with QNames and let it be extensible. Off the
top of my head, anyway.
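An extensible datatype list could amount to a registry of lexical checks keyed by (Q)name, which a DATATYPE-style declaration would add to. A minimal Python sketch under that assumption: the registry, `register_datatype`, the `ex:evenInteger` type and the tuple form of the ATTLIST are all hypothetical, and the checks are simplified (no whitespace processing, no facets).

```python
import re
from typing import Callable

INTEGER = re.compile(r"[+-]?[0-9]+")

# Registry of datatype checks keyed by name (a simplified, hypothetical subset).
DATATYPES: dict[str, Callable[[str], bool]] = {
    "integer": lambda s: INTEGER.fullmatch(s) is not None,
    "nonNegativeInteger": lambda s: INTEGER.fullmatch(s) is not None and int(s) >= 0,
}


def register_datatype(qname: str, check: Callable[[str], bool]) -> None:
    """Extension hook, standing in for a hypothetical DATATYPE declaration."""
    DATATYPES[qname] = check


# In-memory form of: <!ATTLIST baz foo integer #IMPLIED baz integer #REQUIRED>
ATTLIST_BAZ = [("foo", "integer", "#IMPLIED"), ("baz", "integer", "#REQUIRED")]


def check_attributes(attrs: dict[str, str], attlist) -> list[str]:
    """Check present attributes against their datatypes and #REQUIRED."""
    errors = []
    for name, datatype, default in attlist:
        if name not in attrs:
            if default == "#REQUIRED":
                errors.append(f"missing required attribute {name}")
            continue
        if not DATATYPES[datatype](attrs[name]):
            errors.append(f"{name}={attrs[name]!r} is not a valid {datatype}")
    return errors


# A hypothetical user-defined type registered under a QName.
register_datatype("ex:evenInteger",
                  lambda s: INTEGER.fullmatch(s) is not None and int(s) % 2 == 0)

print(check_attributes({"baz": "+5"}, ATTLIST_BAZ))  # []
print(check_attributes({"foo": "five"}, ATTLIST_BAZ))
```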
| 3) Element simple datatypes. Likewise, unparenthesized content models
| in ELEMENT declarations are extended from just ANY and EMPTY to include
| these same datatypes.
|
| Example: <!ELEMENT foo nonNegativeInteger>
|
| 4) Datatype lists. In either #2 or #3 context, a simple datatype name
| can be replaced by "LIST(name)" to indicate a whitespace-separated
| list of strings matching the datatype. IDREFS is equal to LIST(IDREF),
| and ENTITIES is equal to LIST(ENTITY).
There's fairly limited utility in extending DTDs, and I think item 4
is starting to make it too expensive. I'm not sure I don't feel the
same way about 3 and, if pressed, even 2.
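For reference, items 3 and 4 would mean checking element or attribute text against either a single datatype or `LIST(name)`. A minimal Python sketch, assuming simplified lexical checks: the `IDREF` pattern is an ASCII-only approximation of XML's NCName production, matching IDREFs to IDs is not shown, and `check_simple_content` is a hypothetical name. Lists require at least one item, as IDREFS and ENTITIES do.

```python
import re

# Simplified lexical checks (a hypothetical subset of the XSD types).
DATATYPES = {
    "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "nonNegativeInteger": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None and int(s) >= 0,
    "IDREF": lambda s: re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.\-]*", s) is not None,
}

LIST = re.compile(r"LIST\((\w+)\)")


def check_simple_content(type_expr: str, text: str) -> bool:
    """Check text against a datatype name or 'LIST(name)'."""
    m = LIST.fullmatch(type_expr)
    if m:
        items = text.split()  # whitespace-separated items
        # At least one item, as IDREFS and ENTITIES require.
        return bool(items) and all(DATATYPES[m.group(1)](i) for i in items)
    return DATATYPES[type_expr](text.strip())


# <!ELEMENT foo nonNegativeInteger> applied to the text of a foo element:
print(check_simple_content("nonNegativeInteger", " 42 "))  # True
print(check_simple_content("nonNegativeInteger", "-3"))    # False
print(check_simple_content("LIST(IDREF)", "sec1 sec2"))    # True
```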
| 5) Datatype choice. In either #2 or #3 context, a simple or LIST-wrapped
| datatype name can be replaced by |-separated names, to indicate a choice
| (derivation by union in WXS terms).
|
| Example: <!ELEMENT bar integer|Name>
|
| Issue: what do we do about XSD facets? They are important but don't
| easily fit into the rigid DTD syntax.
Too much complexity.
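Without facets, the union part of item 5 is small: a value is valid if it is valid for any member type. A minimal Python sketch; the checks are simplified (the `Name` pattern is an ASCII-only approximation of XML's Name production), XSD's built-in type is spelled `Name`, so that is the key used here, and `check_union` is a hypothetical name.

```python
import re

# Simplified lexical checks (hypothetical subset of the XSD types).
DATATYPES = {
    "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "Name": lambda s: re.fullmatch(r"[A-Za-z_:][A-Za-z0-9_:.\-]*", s) is not None,
}


def check_union(type_expr: str, text: str) -> bool:
    """A value is valid for 'a|b|...' if it is valid for any member type."""
    value = text.strip()
    return any(DATATYPES[member.strip()](value) for member in type_expr.split("|"))


print(check_union("integer|Name", "42"))    # True
print(check_union("integer|Name", "para"))  # True
print(check_union("integer|Name", "4x"))    # False
```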
| 6) Restore & connector. Bring back the & connector, either with the
| SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
| semantics. The difference is that, given the content model "A & B+",
| the element sequences A, B, B, B and B, B, B, A will match in either case,
| but B, A, B, B will only match using interleave semantics.
|
| Issue: SGML or interleave? My answer: interleave
My answer: don't do it.
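The difference between the two semantics in item 6 can be checked directly. A minimal Python sketch, assuming each element type is a single letter (so "BABB" is the sequence B, A, B, B) and each operand is a regular expression; the interleave test brute-forces every order-preserving split, which is exponential and only meant to state the definition, not to suggest how a real validator would do it. Function names are hypothetical.

```python
import re
from itertools import product


def sgml_and(p1: str, p2: str, seq: str) -> bool:
    """SGML '&': both operands, each contiguous, in either order."""
    return any(
        re.fullmatch("(?:%s)(?:%s)" % (a, b), seq) is not None
        for a, b in ((p1, p2), (p2, p1))
    )


def interleave(p1: str, p2: str, seq: str) -> bool:
    """RELAX NG interleave: seq splits into two order-preserving
    subsequences matching p1 and p2 (brute force over 2**len(seq) splits)."""
    for mask in product((0, 1), repeat=len(seq)):
        left = "".join(c for c, m in zip(seq, mask) if m == 0)
        right = "".join(c for c, m in zip(seq, mask) if m == 1)
        if re.fullmatch(p1, left) and re.fullmatch(p2, right):
            return True
    return False


# The content model "A & B+" against the three sequences from item 6.
for seq in ("ABBB", "BBBA", "BABB"):
    print(seq, sgml_and("A", "B+", seq), interleave("A", "B+", seq))
# ABBB True True
# BBBA True True
# BABB False True
```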
| 7) Abandon SGML 1-ambiguity rules. Instead, allow complete flexibility of
| content models. See James Clark's discussion in "The Design of RELAX NG".
Nope.
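For context on item 7: XML 1.0 (Appendix E) rejects a model such as ((b, c) | (b, d)) as non-deterministic, although the equivalent (b, (c | d)) is allowed. A validator that is not bound by that rule can match such models directly. A minimal Python sketch using Brzozowski derivatives, the general technique James Clark has described for RELAX NG validation; the tuple representation and all function names are assumptions for illustration.

```python
# Content-model patterns as tuples (a hypothetical representation).
EMPTY = ("empty",)  # matches the empty sequence
NONE = ("none",)    # matches nothing


def elem(name):
    return ("elem", name)


def seq(p, q):
    if NONE in (p, q):
        return NONE
    if p == EMPTY:
        return q
    if q == EMPTY:
        return p
    return ("seq", p, q)


def choice(p, q):
    if p == NONE:
        return q
    if q == NONE:
        return p
    return ("choice", p, q)


def star(p):
    return ("star", p)


def nullable(p):
    """Does pattern p match the empty sequence?"""
    kind = p[0]
    if kind in ("empty", "star"):
        return True
    if kind == "seq":
        return nullable(p[1]) and nullable(p[2])
    if kind == "choice":
        return nullable(p[1]) or nullable(p[2])
    return False  # "none" and "elem"


def deriv(p, name):
    """Pattern for what may follow once 'name' is consumed at the start of p."""
    kind = p[0]
    if kind == "elem":
        return EMPTY if p[1] == name else NONE
    if kind == "seq":
        d = seq(deriv(p[1], name), p[2])
        return choice(d, deriv(p[2], name)) if nullable(p[1]) else d
    if kind == "choice":
        return choice(deriv(p[1], name), deriv(p[2], name))
    if kind == "star":
        return seq(deriv(p[1], name), p)
    return NONE  # "empty" and "none"


def matches(p, children):
    for name in children:
        p = deriv(p, name)
    return nullable(p)


# ((b, c) | (b, d)): no lookahead needed, both branches are tracked at once.
model = choice(seq(elem("b"), elem("c")), seq(elem("b"), elem("d")))
print(matches(model, ["b", "d"]))  # True
print(matches(model, ["b", "c"]))  # True
print(matches(model, ["b"]))       # False
```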
| 8) Restore multiple element and attribute names separated by |s.
| This makes for conciseness and easy authoring. These constructs were
| dumped in XML DTDs because they imposed extra cost on validating parsers,
| but in this model validation is something done outside parsing, so higher
| cost is worthwhile.
Nah, I think this is a bit of syntactic sugar I can live without.
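Because item 8 is pure syntactic sugar, a preprocessor could expand each grouped declaration into ordinary XML declarations before anything else sees the DTD. A minimal Python sketch, assuming the SGML-style name group appears only in the element-name position of ELEMENT and ATTLIST declarations; `expand_name_groups` is a hypothetical helper, and the regex does not attempt to parse full DTD syntax.

```python
import re

# <!ELEMENT (a|b) ...> or <!ATTLIST (a|b) ...>: a name group in the
# element-name position (simple cases only).
NAME_GROUP = re.compile(r"<!(ELEMENT|ATTLIST)\s+\(([^)]*)\)\s+(.*)>", re.DOTALL)


def expand_name_groups(decl: str) -> list[str]:
    """Rewrite one declaration with a name group into plain XML declarations."""
    m = NAME_GROUP.fullmatch(decl.strip())
    if m is None:
        return [decl]  # no name group: already plain XML DTD syntax
    keyword, group, rest = m.groups()
    return [f"<!{keyword} {name.strip()} {rest}>" for name in group.split("|")]


for d in expand_name_groups("<!ELEMENT (para|note) (#PCDATA)*>"):
    print(d)
# <!ELEMENT para (#PCDATA)*>
# <!ELEMENT note (#PCDATA)*>
```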
| 9) Fixed element content. Allow ELEMENT declarations to specify "#FIXED
| 'value'" after a datatype.
|
| Example: <!ELEMENT foo integer #FIXED "5">
|
| This means that the content of any foo element must be equivalent to 5
| according to the "integer" datatype's equivalence relation: therefore,
| 05, 005, +5, etc. will pass validation.
Nope.
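Item 9's equivalence test amounts to comparing values in the datatype's value space rather than comparing strings. A minimal Python sketch for the `integer` case, with simplified lexical checking (surrounding whitespace stripped, no other normalization); `integer_value` and `fixed_ok` are hypothetical names.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical: str) -> int:
    """Map an integer lexical form to its value (whitespace stripped)."""
    text = lexical.strip()
    if INTEGER.fullmatch(text) is None:
        raise ValueError(f"{lexical!r} is not a valid integer")
    return int(text)


def fixed_ok(content: str, fixed: str) -> bool:
    """#FIXED check by value equality, not string equality."""
    try:
        return integer_value(content) == integer_value(fixed)
    except ValueError:
        return False


print([fixed_ok(v, "5") for v in ("5", "05", "005", "+5", "5.0", "6")])
# [True, True, True, True, False, False]
```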
| General issue: Should there be some way to indicate candidate roots?
| In existing DTDs, any element can be a root.
Nope.
Be seeing you,
norm
--
Norman.Walsh@Sun.COM | One should always be a little
XML Standards Engineer | improbable.--Oscar Wilde
XML Technology Center |
Sun Microsystems, Inc. |
|