   Re: What is an XML Document? [Was: Re: [xml-dev] canonicalization]


On Tue, Mar 05, 2002 at 02:31:30PM -0500, Elliotte Rusty Harold wrote:
> At 1:56 PM -0500 3/5/02, Daniel Veillard wrote:
> >   Well that sequence of bytes may actually become a set of sequences
> >as soon as one is dealing with external entities.
> A good point. The way the spec is written though I think it's 
> consistent to claim that the document is only the byte/character 
> sequence that references the external entities. It does not actually 
> include the merged text of the  entities. The spec also states that:
> [Definition: A textual object is a well-formed XML document if:]
> 1. Taken as a whole, it matches the production labeled document.

  This occurs after the following passage:

Each XML document has both a logical and a physical structure. Physically,
the document is composed of units called entities. An entity may refer
to other entities to cause their inclusion in the document. A document
begins in a "root" or document entity.

  For me there is no doubt that the document is the set. Well-formedness
is defined for the set, and a well-formedness error detected when parsing
an external entity affects the whole document.

  Anyway, even if the REC may be ambiguous, from a programmer's viewpoint
the document instance will likely be based on those extra sets; XPath,
for example, requires them.

> >   Still, the Jabber case is an interesting example in my opinion because
> >they stretch the usual principle of keeping instances "atomic" and instead
> >agree to work on a long-lived, "never-ending" document. In such a use
> >case entities don't work (because there isn't even a DOCTYPE at the
> >start of the connection), while XInclude does (assuming the parser handles
> >it, of course). It's interesting to look at various specifications from
> >a Jabber viewpoint: a lot of them actually require a full document
> >instance and won't work directly in such a context.
> >
> Another good point. However, the BNF grammar and well-formedness 
> constraints make it clear that an infinite sequence cannot possibly 
> be a well-formed XML document. Thus my definition of data object 
> should be revised to say "either a finite sequence of bytes or a 
> finite sequence of Unicode characters". I don't know if a Jabber 
> document is truly infinite or just indefinitely large. (Looking at 
> the spec I think it's just indefinite.)

Yes, the connection gets closed by an exchange of </stream:stream>,
so it's finite in practice, but the software needs to be built to
process indefinitely large instances incrementally. This is very much
the founding principle of SAX (but a progressive DOM builder can work
too if you discard processed nodes).
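
To make the incremental point concrete, here is a minimal sketch using
Python's standard xml.sax module. It is only an illustration, not how any
Jabber implementation actually works: the StanzaCollector class, the
consume_stream function and the sample chunks are hypothetical names and
data invented for this example. The parser is fed one chunk at a time and
never needs the whole document in memory; the final close() is valid only
because the sample ends with </stream:stream>.

```python
import xml.sax


class StanzaCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the name of each child of the root."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.stanzas = []
        self.closed = False

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:
            # A direct child of the long-lived root element.
            self.stanzas.append(name)

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:
            # The root end tag arrived: the document is finite after all.
            self.closed = True


def consume_stream(chunks):
    """Feed chunks to a SAX parser as they arrive (assumes a closed stream)."""
    handler = StanzaCollector()
    parser = xml.sax.make_parser()  # expat-based, supports feed()/close()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # chunk boundaries may fall inside a tag
    parser.close()  # raises SAXParseException if the document is incomplete
    return handler


# Sample data: arbitrary chunking of a short, already terminated stream.
chunks = [
    '<stream:stream xmlns:stream="http://etherx.jabber.org/streams">',
    '<message><bo',
    'dy>hi</body></message>',
    '<presence/>',
    '</stream:stream>',
]
result = consume_stream(chunks)
```

Nothing is retained per element beyond its name here, so memory use
does not grow with the content of the stream; a real application would
dispatch each child element as it completes instead of collecting names.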


Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

