On Tue, Mar 05, 2002 at 02:31:30PM -0500, Elliotte Rusty Harold wrote:
> At 1:56 PM -0500 3/5/02, Daniel Veillard wrote:
>
>
> > Well that sequence of bytes may actually become a set of sequences
> >as soon as one is dealing with external entities.
>
> A good point. The way the spec is written though I think it's
> consistent to claim that the document is only the byte/character
> sequence that references the external entities. It does not actually
> include the merged text of the entities. The spec also states that:
>
> [Definition: A textual object is a well-formed XML document if:]
>
> 1. Taken as a whole, it matches the production labeled document.
^^^^^^^^^^^^^^^
In the spec, this definition comes right after:
----------------
Each XML document has both a logical and a physical structure. Physically,
the document is composed of units called entities. An entity may refer
to other entities to cause their inclusion in the document. A document
begins in a "root" or document entity.
----------------
To me there is no doubt that the document is the whole set. Well-formedness
is defined for the set, and a well-formedness error detected when parsing
an external entity affects the whole document.
Anyway, even if the REC may be ambiguous, from a programmer's viewpoint the
document instance will most likely be built from that whole set of entities;
XPath, for example, requires it.
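The claim that a well-formedness error inside an external entity fails the whole document can be illustrated with Python's `xml.parsers.expat`. This is a minimal sketch, not a statement about how every parser must behave: the helper name `parse_with_entities` and the in-memory entity map are hypothetical, and each external entity is parsed by a child parser created with `ExternalEntityParserCreate`.

```python
import xml.parsers.expat

def parse_with_entities(document, entities):
    """Parse `document` (bytes), resolving external general entities from
    the hypothetical in-memory map `entities` (system id -> bytes), and
    return all character data seen across the document and its entities."""
    text = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append

    def resolve(context, base, system_id, public_id):
        # Parse the referenced entity with a child parser. A well-formedness
        # error raised here propagates out of the outer Parse() call, so the
        # whole document is rejected, not just the entity.
        child = parser.ExternalEntityParserCreate(context)
        child.CharacterDataHandler = text.append
        child.Parse(entities[system_id], True)
        return 1  # nonzero tells expat the reference was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return "".join(text)
```

With `<!DOCTYPE doc [<!ENTITY c SYSTEM "c.xml">]><doc>&c;</doc>` as the document, a well-formed `c.xml` yields its text, while a malformed one makes the call on the *document* raise `ExpatError`.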
> > Still, the Jabber case is an interesting example in my opinion, because
> >they stretch the usual principle of keeping instances "atomic" and instead
> >agree to work on a long-lived, "never ending" document. In that use
> >case entities don't work (because there isn't even a DOCTYPE at the
> >start of the connection), while XInclude does (assuming the parser handles
> >it, of course). It's interesting to look at the various specifications from
> >a Jabber viewpoint: a lot of them actually require a full document
> >instance and won't work directly in such a context.
> >
>
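The contrast between entities and XInclude can be checked with Python's standard library: without a DOCTYPE an entity reference is simply undeclared and the parse fails, while `xml.etree.ElementInclude` can expand an `xi:include` element after parsing. This is a sketch under stated assumptions: the `<stream>` and `<presence/>` markup is a simplified stand-in for Jabber traffic, and the `PARTS` map and `loader` function are hypothetical replacements for fetching real resources.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory resources standing in for real files or URLs.
PARTS = {"presence.xml": "<presence/>"}

def loader(href, parse, encoding=None):
    # ElementInclude calls loader(href, parse) for parse="xml"
    # and loader(href, parse, encoding) for parse="text".
    if parse == "xml":
        return ET.fromstring(PARTS[href])
    return PARTS[href]

def expand(xml_text):
    """Parse xml_text (no DOCTYPE required) and resolve xi:include in place."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=loader)
    return root

def parses(xml_text):
    """Return True if xml_text is accepted by the parser, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    doc = ('<stream xmlns:xi="http://www.w3.org/2001/XInclude">'
           '<xi:include href="presence.xml"/></stream>')
    print(expand(doc)[0].tag)                     # presence
    print(parses("<stream>&presence;</stream>"))  # False: entity undeclared
```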
> Another good point. However, the BNF grammar and well-formedness
> constraints make it clear that an infinite sequence cannot possibly
> be a well-formed XML document. Thus my definition of data object
> should be revised to say "either a finite sequence of bytes or a
> finite sequence of Unicode characters". I don't know if a Jabber
> document is truly infinite or just indefinitely large. (Looking at
> the spec I think it's just indefinite.)
Yes, the connection gets closed by an exchange of </stream:stream>,
so it's finite in practice, but the software needs to be built to
process indefinitely large instances incrementally. That is very much the
founding principle of SAX (but a progressive DOM builder can work
too if you discard processed nodes).
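The incremental style described above can be sketched with Python's `xml.etree.ElementTree.XMLPullParser`, which accepts input in arbitrary chunks and builds a tree progressively. As a hedged illustration only: `process_stream` is a hypothetical helper, the `<stream>`/`<message>` markup is a simplified stand-in for a real namespace-qualified Jabber stream, and each completed top-level child is removed from the root after it is handled, so memory stays bounded however long the stream runs.

```python
import xml.etree.ElementTree as ET

def process_stream(chunks):
    """Yield (tag, text) for each direct child of the stream root as soon
    as its end tag has been parsed, then drop it from the tree."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None

    def drain():
        nonlocal depth, root
        for event, elem in parser.read_events():
            if event == "start":
                if depth == 0:
                    root = elem  # the long-lived stream element
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A complete top-level child ("stanza") is available.
                    yield elem.tag, elem.text or ""
                    # Discard the processed subtree so memory stays bounded.
                    root.remove(elem)

    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags anywhere
        yield from drain()
    parser.close()  # raises ET.ParseError if the root was never closed
    yield from drain()  # pick up any events still pending at close
```

Because `close()` raises `ParseError` when the root element is still open, a client would only call it once the closing stream tag has arrived; until then each stanza is handled as soon as it is complete.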
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/