
   Re: [xml-dev] Abstract Equivalence (was Microsoft FUD on binaryXML...)


At 3:46 AM +0000 11/25/03, Alaric B Snell wrote:

>I just had another idea. Say you wrote an application that read XML 
>documents but depended on something *conventionally* ignored - 
>perhaps it processes the file significantly differently depending on 
>the encoding the file happened to be in when it read it, or 
>something like that - or perhaps an XHTML browser that behaved 
>significantly differently when an image had width="010" than when an 
>image had width="10".
>(I've said "significantly differently" to exclude cases like 'A 
>generic XML editor tries to save documents back in the same encoding 
>it read them in' or 'The XHTML browser shows leading zeroes on width 
>attributes when you hit View Source')
>Would you not say that those applications are broken for being 
>sensitive to such 'irrelevant' syntactic variations?
>Thus implying that there *is* an abstract model lurking there?

There's a confusion of layers here. 
<http://www.cafeconleche.org/books/effectivexml/chapters/15.html> If 
the parser did something significantly different based on two 
different encodings (both correctly understood by the parser) that 
would be a bug because XML is defined in terms of characters, not 
bytes. The conversion of bytes to characters happens below the layer 
of syntax. Therefore it's incorrect to depend on it.
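To make the layering concrete, here is a minimal sketch. It assumes
Python's standard xml.etree.ElementTree parser; the sample document
and the variable names are invented for illustration. The same
characters are serialized in two encodings, the bytes differ, and a
conforming parser still reports identical text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the accented character is encoded
# differently in UTF-8 (two bytes) and ISO-8859-1 (one byte).
body = "<greeting>caf\u00e9</greeting>"

utf8_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>' + body
).encode("utf-8")
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + body
).encode("iso-8859-1")

# Binary layer: the two files are different byte sequences.
assert utf8_bytes != latin1_bytes

# Character layer: the parser decodes bytes to characters before
# applying XML syntax, so both documents carry the same text.
utf8_text = ET.fromstring(utf8_bytes).text
latin1_text = ET.fromstring(latin1_bytes).text
assert utf8_text == latin1_text
```

Anything downstream of the parser only ever sees the decoded
characters, which is why code that depends on the original encoding
is reaching below the layer XML defines.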

The second example with width="010" vs. width="10" would be a 
violation of the rules of XHTML, but would in no way be a violation of 
the rules of XML. XHTML says 010 in a width attribute is the same as 
10. XML does not. The violation is in the semantic layer. Different 
semantics might treat 010 as different from 10, and be perfectly 
correct in doing so. As someone else suggested a few weeks ago, a 
leading 0 might indicate an octal number in some contexts. Thus 10 
might be ten and 010 might be eight. That's not wrong. It's just 

Neither of the examples you cite is an irrelevant syntactic 
variation. The first is an irrelevant binary variation. The second is 
a relevant syntactic variation which becomes an irrelevant semantic 
variation when interpreted by one particular application.
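
As a rough illustration of the second case, the sketch below again
assumes Python's xml.etree.ElementTree. The two interpretation
functions are hypothetical stand-ins for application semantics, not
anything defined by XML or XHTML. The parser hands back two different
attribute strings; whether they mean the same thing is decided one
layer up.

```python
import xml.etree.ElementTree as ET

# XML layer: attribute values are character strings, nothing more.
xml_010 = ET.fromstring('<img width="010"/>').get("width")
xml_10 = ET.fromstring('<img width="10"/>').get("width")
assert xml_010 != xml_10  # "010" and "10" are distinct in XML


def as_decimal_length(value: str) -> int:
    """Hypothetical XHTML-like semantics: always read as decimal."""
    return int(value, 10)


def as_c_style_number(value: str) -> int:
    """Hypothetical semantics where a leading 0 marks octal."""
    if len(value) > 1 and value.startswith("0"):
        return int(value, 8)
    return int(value, 10)


# Under the decimal reading the variation is irrelevant.
assert as_decimal_length(xml_010) == as_decimal_length(xml_10)

# Under the octal convention it is relevant: eight versus ten.
assert as_c_style_number(xml_010) == 8
assert as_c_style_number(xml_10) == 10
```

Both interpretations sit on top of the same parse, so neither one
can be called a violation of XML itself.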

   Elliotte Rusty Harold
   Effective XML (Addison-Wesley, 2003)

