   Re: What is an XML Document? [Was: Re: [xml-dev] canonicalization]

On Mon, Mar 04, 2002 at 10:40:23PM -0500, Elliotte Rusty Harold wrote:
> At 7:44 PM -0500 3/4/02, Daniel Veillard wrote:
> >   I would be tempted to tease and ask what is an XML document (would
> >the TAG ever find the answer ;-) ). I also note that in use cases like
> >the Jabber protocol, you never end up with a "fully composed document":
> >it only exists once the processing is finished and it has become
> >useless.
> Is this really a problem?

  In practice, not really.

> According to the XML spec "A data object is 
> an XML document if it is well-formed, as defined in this 
> specification." That does leave the question of what a data object 
> is, but I think a reasonable answer is "a sequence of bytes or a 
> sequence of Unicode characters". Pretty clearly the spec does not 
> intend that a data object be a traditional OOP object of some kind.

  Well, that sequence of bytes may actually become a set of sequences
as soon as one is dealing with external entities. And the duality between
non-validating and validating parsers allowed by the specification
shows that what is named in the same way (the main entity resource)
may end up being considered differently by two instances.
  But as I said, in practice it's not such a big deal, because the fact that
some external resources may be missing has to be handled in the tool
chain anyway. And when the requirement is that every reader sees the same
set, this can usually be implemented either by disabling all external
access or by turning a missing resource into an error.
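
  A minimal sketch of those two policies, assuming Python's standard
xml.sax module on top of expat (the names read_text, RefuseEverything,
ExternalAccessError and the sample document are made up for the
illustration, they don't come from any toolkit):

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class ExternalAccessError(Exception):
    """Raised in strict mode when a document refers to an external entity."""


class TextCollector(ContentHandler):
    """Collects the character data every reader is supposed to agree on."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


class RefuseEverything(EntityResolver):
    """Hypothetical resolver: any attempt to load an external entity fails."""

    def resolveEntity(self, publicId, systemId):
        raise ExternalAccessError(systemId)


def read_text(data, strict=False):
    parser = xml.sax.make_parser()
    # strict=False: external general entities are skipped, never fetched.
    # strict=True: they are "resolved" by a resolver that always raises.
    parser.setFeature(feature_external_ges, strict)
    if strict:
        parser.setEntityResolver(RefuseEverything())
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


# Assumed sample: the main entity refers to an external one that may be missing.
SAMPLE = (
    b'<!DOCTYPE doc [<!ENTITY ext SYSTEM "missing-part.xml">]>'
    b"<doc>before &ext; after</doc>"
)
```

  With strict=False every reader gets the text of the main entity only;
with strict=True the reference itself is reported as an error, so nobody
silently sees a different set.
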
  Still, the Jabber case is an interesting example in my opinion, because
they stretch the usual principle of keeping instances "atomic" and instead
agree to work on a long-lived, "never-ending" document. In such a use
case entities don't work (because there isn't even a DOCTYPE at the
start of the connection), while XInclude does (assuming the parser handles
it, of course). It's interesting to look at various specifications from
a Jabber viewpoint: a lot of them actually require a full document
instance and won't work directly in such a context.
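
  To make the "never-ending" document concrete, here is a rough sketch
using Python's xml.etree.ElementTree.XMLPullParser. It assumes that each
direct child of the root element is one self-contained unit of work; the
element names in the usage example are simplified and carry no namespaces,
unlike a real Jabber stream, and iter_stanzas is a made-up helper name:

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield each completed child of the root while the root stays open."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if root is None:
                    root = elem  # the long-lived stream element
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root just closed: hand it over.
                    yield elem
                    # Drop it from the tree so the open document
                    # does not grow without bound.
                    root.remove(elem)
```

  For example, feeding it the chunks "<stream><mess" and
"age to='a'><body>hi</body></message><presence/>" yields a message element
and then a presence element, although the stream element is never closed
and no complete document instance ever exists.
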

> I do wish the spec made that last point explicit, but I do think it 
> won't get anybody into trouble and might indeed pull a few developers 
> out of the quicksand they've mired themselves in by believing 
> things like objects can be XML documents instead of representations of 
> an XML document. (To cite a classic OOP example, nobody believes a 
> Car object is a car. Why do developers insist on claiming Document 
> objects are documents?)

  I don't ;-) . Still, there are some properties one would expect to hold
(two readers of the same document see the same sequence of characters) but
which are not guaranteed by a document object. It's just a fact one needs to be
aware of. Both the Infoset and C14N are trying to adapt to or forbid those
cases, and most developers would do well to understand the issue better.
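
  A small illustration of the point, assuming Python 3.8 or later, whose
xml.etree.ElementTree.canonicalize implements C14N 2.0 (same_document and
the two sample strings are made up for the example):

```python
from xml.etree.ElementTree import canonicalize


def same_document(a, b):
    """Compare two serializations by their canonical form, not their bytes."""
    return canonicalize(xml_data=a) == canonicalize(xml_data=b)


# Two different character sequences: attribute order, quoting style,
# spacing, and empty-element syntax all differ.
first = "<doc b=\"2\" a='1'></doc>"
second = '<doc a="1"   b="2"/>'
```

  Here first != second as character sequences, yet same_document(first,
second) is true: the canonical form is where two readers can be made to
agree again.
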


Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

