Re: DTDs and namespaces (was: using namespaces to version)
- From: james anderson <james.anderson@setf.de>
- To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
- Date: Tue, 08 May 2001 03:41:03 +0200
C. M. Sperberg-McQueen wrote:
> ... no system which validates using a DTD and the validation
> rules of XML 1.0, without extension, can support all the syntactic
> variations allowed by the namespaces recommendation.
This is trivially true, but trivial invalidation is not significant to
the question. For those cases which are valid according to a naive
interpretation of XML-1.0/REC-Names, one could have defined a URI form
which conforms to the name syntax and empowered a registration authority
to ensure uniqueness. Done. No namespaces necessary.
That is, consider the cases where the attribute defaults specified in a
DTD, when taken as global assignments, would neither under- nor
overconstrain the types assigned to qualified names. Those cases remain
valid according to a trivial interpretation of XML-1.0/REC-Names and
require nothing more than an identity mapping from qualified to universal
names. The only argument for using attribute bindings there is that they
permit a greater degree of syntactic latitude when choosing the namespace
names.
It is exactly those cases which would be trivially malformed or invalid
for which namespaces are necessary. These cases include documents where
a given prefix, for example that for the default namespace, is bound to
more than one namespace name. In these cases, type assignments which
follow from qualified names would be overconstrained. The names must be
distinguished according to context. This is the "homography" case.
> In their full generality, however, the rules of the namespace
> recommendation allow homography: elements with different universal
> names (and thus potentially different declarations) can appear with
> the same prefix + colon + localname as their generic identifier.
The interesting cases also include the documents where the type to
assign to a given qualified name cannot be determined on the basis of
XML-1.0 interpretation of the DTD. The assignment is underconstrained.
This is the "synonymy" case.
> In their full generality, the rules of the namespace recommendation
> also allow synonymy: elements with the same universal name can appear
> with different generic identifiers.
>
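Both cases are easy to observe in a document instance. The following is a
minimal Python sketch using the namespace-aware parser in the standard
library (xml.etree.ElementTree), not the parser used for the transcripts
further down. The document text and the variable names are illustrative
assumptions; only the namespace names are taken from this message.

```python
import xml.etree.ElementTree as ET

NS_BARE = "http://www.example.org/ns-bare"
NS_A = "http://www.example.org/ns-a"
NS_B = "http://www.example.org/ns-b"

# Hypothetical instance: two elements written as <x>, plus <tick> written
# both with the "c" prefix and with the default namespace.
DOC = f"""
<doc xmlns='{NS_BARE}' xmlns:c='{NS_BARE}'>
  <x xmlns='{NS_A}'><c:tick/></x>
  <x xmlns='{NS_B}'><c:tick/></x>
  <tick/>
</doc>
""".strip()

root = ET.fromstring(DOC)

# ElementTree reports universal names in the form "{namespace-name}local".
# Homography: the same generic identifier "x", two universal names.
homographs = [el.tag for el in root if el.tag.endswith("}x")]

# Synonymy: "c:tick" and "tick" collapse to one universal name.
synonyms = {el.tag for el in root.iter() if el.tag.endswith("}tick")}

print(homographs)
print(synonyms)
```

The two x elements come out with different universal names, while c:tick
and the unprefixed tick come out with the same one.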
In both cases, REC-names provides rules adequate to effect distinction
among homographs and conflation of synonyms within the document entity.
It contains no rules to govern this process within the internal and
external subsets. Which leaves us with the following conjecture.
[to paraphrase]
> When I say that DTDs [can] 'support' namespaces I mean simply that
> given some plausible account of the rules which govern elements in
> some set of namespaces, and the rules of the namespace recommendation
> (which include the ability to bind arbitrary prefixes to arbitrary
> namespaces), it is [] possible to write a DTD which ([elided, as the "normal"
> don't, by definition, apply]) recognizes the set of documents which
> follow the rules, and distinguishes them from documents which don't.
I suggest that the following rules suffice.
Qualified names in a DTD are synonymous
a, if they are in the same external subset entity hierarchy and are
lexically identical, or
b, if there exist "visible" prefix/namespace bindings such that the
effective universal names are identical.
The visibility of namespace binding in a DTD is determined as follows.
The immediate scope of a binding is the given attribute list declaration.
The scope extends to comprise the element declarations for which the
declared element name is identical with the declared element name of the
attribute list declaration.
The scope extends further to encompass element declarations for which
the declared element name is identical with a name which appears in the
content model of the in-scope element declaration. Where the content
model is ANY, all element declarations are included. Where the content
model is EMPTY, no element declaration is included.
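The visibility rules can be sketched as a walk over the declarations. This
is a rough Python illustration under stated assumptions, not the Lisp
implementation behind the transcripts in this message: the DTD is assumed
to be already parsed into two dictionaries, the names propagate and
universal_names are hypothetical, and the scope is assumed to extend
transitively through content models.

```python
from collections import deque

ANY, EMPTY = "ANY", "EMPTY"

def propagate(content_models, attlist_bindings):
    """Return {declared name: {prefix: set of visible namespace names}}.

    content_models:   {qname: ANY | EMPTY | tuple of qnames in the model}
    attlist_bindings: {qname: {prefix: namespace name}}, "" = default
    """
    visible = {name: {} for name in content_models}
    for owner, bindings in attlist_bindings.items():
        seen, queue = set(), deque([owner])
        while queue:
            name = queue.popleft()
            if name in seen or name not in content_models:
                continue
            seen.add(name)
            for prefix, uri in bindings.items():
                visible[name].setdefault(prefix, set()).add(uri)
            model = content_models[name]
            if model == ANY:        # ANY: every element declaration is in scope
                queue.extend(content_models)
            elif model != EMPTY:    # EMPTY: the scope stops here
                queue.extend(model)
    return visible

def universal_names(qname, visible):
    """Candidate (namespace name, local part) pairs for a declared qname."""
    prefix, _, local = qname.rpartition(":")
    return {(uri, local) for uri in visible.get(qname, {}).get(prefix, set())}

NS_BARE = "http://www.example.org/ns-bare"
NS_A = "http://www.example.org/ns-a"
NS_B = "http://www.example.org/ns-b"

CONTENT_MODELS = {
    "doc": ("a:x", "b:x"),
    "a:x": ("tick", "tock"),
    "b:x": ("tick", "tick", "tock"),
    "tock": ANY,
    "tick": ANY,
}
ATTLIST_BINDINGS = {"doc": {"": NS_BARE}, "a:x": {"a": NS_A}, "b:x": {"b": NS_B}}

visible = propagate(CONTENT_MODELS, ATTLIST_BINDINGS)
print(universal_names("tick", visible))
print(universal_names("a:x", visible))
```

In this sketch a qualified name with exactly one candidate is determined;
an empty set corresponds to the underconstrained case and more than one
candidate to the overconstrained case.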
The rules above are the short form. The long form is about three dozen
function definitions. They suffice to effect the following distinction:
? (write-node
(document-parser
"<!DOCTYPE doc [
<!ELEMENT doc (a:x | b:x)* >
<!ATTLIST doc xmlns CDATA 'http://www.example.org/ns-bare'>
<!ELEMENT a:x (tick, tock)>
<!ATTLIST a:x xmlns:a CDATA 'http://www.example.org/ns-a'>
<!ELEMENT b:x (tick, tick, tock)>
<!ATTLIST b:x xmlns:b CDATA 'http://www.example.org/ns-b'>
<!ELEMENT tock ANY>
<!ELEMENT tick ANY>
]>
<doc xmlns='http://www.example.org/ns-bare'
xmlns:c='http://www.example.org/ns-bare'>
<x xmlns='http://www.example.org/ns-a'><c:tick/><c:tock/></x>
<x xmlns='http://www.example.org/ns-b'><c:tick/><c:tick/><c:tock/></x>
<x xmlns='http://www.example.org/ns-a'><c:tick/><c:tock/></x>
</doc>"
:validate t)
*trace-output*)
<c:doc xmlns='http://www.example.org/ns-bare' xmlns:c='http://www.example.org/ns-bare'>
<a:x xmlns='http://www.example.org/ns-a'
xmlns:a='http://www.example.org/ns-a'><c:tick /><c:tock /></a:x>
<b:x xmlns='http://www.example.org/ns-b'
xmlns:b='http://www.example.org/ns-b'><c:tick /><c:tick /><c:tock /></b:x>
<a:x xmlns='http://www.example.org/ns-a'
xmlns:a='http://www.example.org/ns-a'><c:tick /><c:tock /></a:x>
</c:doc>
NIL
? (write-node
(document-parser
"<!DOCTYPE doc [
<!ELEMENT doc (a:x | b:x)* >
<!ATTLIST doc xmlns CDATA 'http://www.example.org/ns-bare'>
<!ELEMENT a:x (tick, tock)>
<!ATTLIST a:x xmlns:a CDATA 'http://www.example.org/ns-a'>
<!ELEMENT b:x (tick, tick, tock)>
<!ATTLIST b:x xmlns:b CDATA 'http://www.example.org/ns-b'>
<!ELEMENT tock ANY>
<!ELEMENT tick ANY>
]>
<doc xmlns='http://www.example.org/ns-bare'
xmlns:c='http://www.example.org/ns-bare'>
<x xmlns='http://www.example.org/ns-b'><c:tick/><c:tock/></x>
<x xmlns='http://www.example.org/ns-a'><c:tick/><c:tick/><c:tock/></x>
<x xmlns='http://www.example.org/ns-a'><c:tick/><c:tock/></x>
</doc>"
:validate t)
*trace-output*)
> Error: Error #<|VC: Element Content| #x5079E56>
> parse error with-state (:INPUTS ((:SOURCE #<VECTOR-INPUT-STREAM #(3C 21 44 4F 43 54 59 50 45 20 64 6F 63 20 5B D 20 20 20 20 3C 21 45 4C 45 4D 45 4E 54 20 64 6F ...)> :POSITION 529 :COLUMN 117 :LINE ...)) :INPUT #\Newline :TOKEN NIL :CONTEXT ...).
> [VC: Element Content] : content must match the element model: (#<ELEM-NODE |http://www.example.org/ns-bare|::|tick| #x5078BFE> #<ELEM-NODE |http://www.example.org/ns-bare|::|tock| #x50791E6>):
> x ::= (tick tick tock).
> While executing: #<STANDARD-METHOD XML-ERROR (CONTINUABLE-PARSE-ERROR)>
> Type Command-/ to continue, Command-. to abort.
> If continued: attept to continue to parse.
See the Restarts… menu item for further choices.
1 >
Aborted
?
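The distinction between the two documents can be reproduced in outline
once the content models are keyed by universal name. The sketch below is
Python with the standard xml.etree.ElementTree parser, not the parser
shown above. The MODELS table is written out by hand as an assumption
about what the declarations for a:x and b:x resolve to, and
invalid_elements and make_doc are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

BARE = "http://www.example.org/ns-bare"
A = "http://www.example.org/ns-a"
B = "http://www.example.org/ns-b"

# Sequence content models keyed by universal name (assumed resolution of
# the a:x and b:x declarations; tick and tock resolve to ns-bare).
MODELS = {
    f"{{{A}}}x": [f"{{{BARE}}}tick", f"{{{BARE}}}tock"],
    f"{{{B}}}x": [f"{{{BARE}}}tick", f"{{{BARE}}}tick", f"{{{BARE}}}tock"],
}

def invalid_elements(xml_text):
    """Universal names of elements whose children do not match their model."""
    root = ET.fromstring(xml_text)
    bad = []
    for el in root.iter():
        model = MODELS.get(el.tag)
        if model is not None and [child.tag for child in el] != model:
            bad.append(el.tag)
    return bad

def make_doc(*items):
    """Build a doc element; each item is (namespace of x, number of ticks)."""
    body = "".join(
        f"<x xmlns='{ns}'>{'<c:tick/>' * ticks}<c:tock/></x>" for ns, ticks in items
    )
    return f"<doc xmlns='{BARE}' xmlns:c='{BARE}'>{body}</doc>"

FIRST = make_doc((A, 1), (B, 2), (A, 1))    # same shape as the first document
SECOND = make_doc((B, 1), (A, 2), (A, 1))   # same shape as the second document

print(invalid_elements(FIRST))
print(invalid_elements(SECOND))
```

Unlike the parser in the transcript, which signals an error at the first
mismatch, this sketch collects every mismatching element.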
It comes down to what one means by "validation rules, without extension, ..."
...
ps. It was noted that
>
> It is possible, using clever parameter entity tricks, to allow the
> user to associate namespaces with arbitrary prefixes. This is a
> partial victory.
>
The parameter entity encoding rules remind me of time spent coding in
PDP-1 macro-assembly language. I suspect they accomplish the same thing
as propagation, as their effect is to declare the identity of, or
distinguish among, "proto"-prefixes, but I am at a loss as to why one
would want to code all of that literally.