OASIS Mailing List Archives


   RE: [xml-dev] Sweet nostalgia

Greg Colyer <greg-xml@elysium.ltd.uk> writes:

<snip/>
> 
> This is interesting, from Sean McGrath on the diveintomark page:
> 
> > Programming languages that barf on a syntax error do so because a
> > partial executable image is a useless thing. A partial document is
> > *not* a useless thing. One of the cool things about XML as a document
> > format is that some of the content can be recovered even in the face
> > of error. Compare this to our binary document friends where a blown
> > byte can render the entire content inaccessible.
> 
> I was pondering the other day in this context about the distinction
> between text and binary, or more particularly between a markup language
> and a file format. Perhaps part of the problem is that Rec-XML is a
> little schizophrenic <http://www.w3.org/TR/REC-xml#sec-documents>:
> 
> > A data object is an XML document if it is well-formed ...
> 
> > A textual object is a well-formed XML document if ...
> 
> The draconians know that "well-formed XML document" and "XML document"
> are identical: if it's not well-formed, it's not XML! The liberals
> argue: then why is the superfluous term "well-formed" even used? No-one
> talks about a "well-formed JPEG file": it's either correct or it isn't.

Hmm, I don't think I buy this argument.  This gets back to Len's
question about "noise" and my response about error correction.  While it
is true that XML carries some extra information that makes it somewhat
more robust in the face of corruption, it isn't true that this extra
information allows you to understand what the errors were. (Nor is it
true that all binary formats are useless when a byte is "blown"; many of
them carry enough information that errors are recoverable.)
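
To make the parenthetical concrete, here is a minimal sketch of redundancy that is designed for recovery. It uses a textbook Hamming(7,4) code, chosen purely as an illustration; it is not taken from any particular binary file format, and the function names `encode` and `decode` are made up for this example. Three parity bits protect four data bits, and the parity checks point at the exact position of any single flipped bit.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data bits, error position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the flipped bit
    if position:
        c[position - 1] ^= 1  # repair the single-bit error
    return [c[2], c[4], c[5], c[6]], position


word = encode([1, 0, 1, 1])
word[4] ^= 1  # "blow" one bit in transit
recovered, position = decode(word)
```

The code above only handles one flipped bit per codeword; the point is that the redundancy tells the decoder what went wrong, which is exactly what XML's start and end tags do not do.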

The fact is, the extra information in XML isn't of a form that allows
you to do robust error recovery.  It might allow you to decide that you
understand some portion of the document, but if a document is corrupt,
do you really want to assume that a missing end tag is the only problem,
and that the real problem isn't that an entire paragraph including the
missing end tag got deleted?
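
A small sketch of that ambiguity, using Python's standard `xml.etree.ElementTree` as the strict parser. The two sample documents and the helper `is_well_formed` are invented for illustration. Two different originals suffer two different deletions and end up as the same damaged text, so nothing in the damaged text can tell a recovering parser which loss actually happened.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical originals: they differ only in the first paragraph.
original_a = "<doc><p>alpha</p><p>beta</p></doc>"
original_b = "<doc><p>alpha, plus a sentence that mattered</p><p>beta</p></doc>"

# Damage A: only the first end tag is lost.
damaged_a = original_a.replace("</p>", "", 1)

# Damage B: a run of content is lost together with that end tag.
damaged_b = original_b.replace(", plus a sentence that mattered</p>", "", 1)

print(damaged_a == damaged_b)     # the two corruptions are indistinguishable
print(is_well_formed(damaged_a))  # the strict parser rejects the result
```

A lenient parser that "repairs" the damaged text by supplying the missing end tag would silently reconstruct the first original, even when the second one was what had been sent.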
 
<snip/>





 


Copyright 2001 XML.org. This site is hosted by OASIS