xml-dev - Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)

To: Alaric B Snell <alaric@alaric-snell.com>
Subject: Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Sat, 8 Nov 2003 21:47:52 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <3FAD85FF.8040906@alaric-snell.com>
References: <000f01c3a614$7a3e3600$650aa8c0@BOBDEV><p06002001bbd2d8e54c12@[192.168.254.4]><3FAD85FF.8040906@alaric-snell.com>

At 12:10 AM +0000 11/9/03, Alaric B Snell wrote:

>In some contexts they are; in some they are not. It all depends on 
>who is viewing it. If you stop and think, you will realise that an 
>API like SAX in no way *forces* 7 to be treated identically to 07. 
>Clearly you don't know a thing about how computers work if you think 
>that *some* pieces of software treating 7 the same as 07 *if they 
>wish to* will somehow emit data-destroying rays that spread across 
>the Internet and lop the significant leading zeroes off of telephone 
>numbers, eh?

Like many people who want typed data, you're confusing the local with 
the global. Of course my software will treat the strings in a way I 
find useful. However, that in no way means you have to treat them 
the same way. You may want floats. I may want ints. Simon may want 
strings. There's no one right answer. The underlying premise behind 
the suggestion that we should exchange typed, binary data is that there is one 
right answer: one type that's better for the data than all the 
others. That simply isn't true, and the more complex the data becomes, 
the less true it is.

>You might be surprised to find that a lot of software *does* take 
>character strings out of an XML document and immediately parse them 
>as decimal integers. This doesn't appear to have broken XML, does 
>it? Shock horror!

Locally, of course not. The problem is when applications start 
exchanging their typed binary representations as the one truth rather 
than recognizing it as simply one way of seeing the world.

>Yes, the bit where he said that "This SAX option would be 
>compulsory, and all XML parsers would by international agreement be 
>required to have this option turned on; software that does not turn 
>this option on would not be allowed to be sold, or written, because 
>I know full well that a character string of '07' at a point in the 
>document where the schema says 'this is an integer, so leading 
>zeroes in the decimal representation are irrelevant' means that the 
>schema author was wrong" really supports your argument, doesn't it?

Did you read what Wyman wrote? He was suggesting that we actually 
exchange four bytes containing a big endian two's complement 
representation of the number 7 (or some equivalent form), rather than 
exchanging the text string 7.
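To make the difference concrete, here is a minimal Python sketch (an illustration, not code from the thread) comparing the two encodings: the characters of the text versus a four-byte big-endian two's complement integer produced with the standard `struct` module. The helper names `as_text_bytes` and `as_binary_int` are hypothetical. Note that "7" and "07" are different text but collapse to the same four bytes, which is exactly the information the binary form discards.

```python
import struct

def as_text_bytes(s: str) -> bytes:
    # What an XML document actually carries: the characters themselves.
    return s.encode("utf-8")

def as_binary_int(s: str) -> bytes:
    # The typed alternative: parse the text as a decimal integer, then pack
    # it as a 4-byte big-endian two's complement value (">i" format).
    return struct.pack(">i", int(s))

for s in ["7", "07", "-7"]:
    # Show both encodings as hex so the difference is visible.
    print(repr(s), as_text_bytes(s).hex(), as_binary_int(s).hex())
```

The leading zero in "07" survives in the text bytes but not in the packed integer.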

>>  All data in an XML document is text, never anything else.
>
>I can prove you wrong!
>
><numFingers>10</numFingers>
>
>That "10" is text. It's also an integer. It's also the number of 
>fingers I have. See? A counter-example.

Not a counterexample at all, and when you understand that you will 
have achieved the XML nature. 10 is text, not a number. You choose to 
interpret that text string as the number ten, which is fine. It's 
your choice. Just don't believe for a minute that it's the only 
legitimate interpretation of that text string, or that the string and 
its interpretation are the same thing.
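A small Python sketch (illustrative only; `TextCollector` is a hypothetical name) shows the point with the standard `xml.sax` module: the parser hands the application the string "10", and each consumer then picks its own reading of that same text.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Records each element's character data exactly as SAX reports it: as text."""

    def __init__(self):
        super().__init__()
        self.text = {}
        self._current = None

    def startElement(self, name, attrs):
        self._current = name
        self.text.setdefault(name, "")

    def characters(self, content):
        # SAX may deliver one run of text in several chunks, so append.
        if self._current is not None:
            self.text[self._current] += content

    def endElement(self, name):
        self._current = None

handler = TextCollector()
xml.sax.parseString(b"<numFingers>10</numFingers>", handler)
raw = handler.text["numFingers"]  # the parser reports the string "10"

# Each consumer chooses its own interpretation of the same text.
readings = {
    "string": raw,
    "decimal int": int(raw),
    "float": float(raw),
    "binary int": int(raw, 2),
}
print(readings)
```

None of these readings is privileged by the document itself; the string is what was exchanged.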

>>  It is certainly not something you'd want to exchange on the 
>>Internet, and it is absolutely not something that should be 
>>baked into the core APIs.
>
>Oh, good! You understand! So what was all that rubbish before about, eh?
>
>Or are you mistaking an optional thing for something 'baked in'?

Simplicity is a virtue. We're trying to produce a Corvette here, not 
an Edsel. Use the right tools for the right tasks. Don't try to make 
one API fit all needs.

>Now, you are harping on about those who communicate information 
>rather than just opaque text as "polluting" XML, but don't you think 
>that demanding that the APIs *they* use be the same as the APIs 
>*you* use is... polluting *their* use of XML with *your* model, hmm?

Oh, come on. Now you're being ridiculous. They can invent and use any 
APIs (and any formats) they want. The problem is they don't want to 
do that. They want to hijack the nice clean SAX API and XML format, 
and stuff it full of mismatched garbage I'm going to have to spend my 
time explaining.
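As one illustration of building such an API without touching SAX, here is a Python sketch of an application-level handler layered on the standard `xml.sax` interfaces. It is hypothetical throughout: `INTEGER_ELEMENTS` stands in for whatever schema a real application might consult, and `TypedLeafHandler` is not part of any library. SAX still reports plain text; the layer keeps that raw text and adds a parsed value only where the application chose to.

```python
import xml.sax

# Hypothetical: element names this application has decided to read as
# integers. A real system might derive this from a schema.
INTEGER_ELEMENTS = {"numFingers", "areaCode"}

class TypedLeafHandler(xml.sax.ContentHandler):
    """Application-level typing on top of unmodified SAX."""

    def __init__(self):
        super().__init__()
        self.leaves = []  # (element name, raw text, int value or None)
        self._name = None
        self._buf = []

    def startElement(self, name, attrs):
        # Start a fresh buffer; any text belongs to the innermost element.
        self._name = name
        self._buf = []

    def characters(self, content):
        self._buf.append(content)

    def endElement(self, name):
        # Only leaf elements match: no child element started since ours did.
        if name == self._name:
            raw = "".join(self._buf)
            value = int(raw) if name in INTEGER_ELEMENTS else None
            self.leaves.append((name, raw, value))
        self._name = None

handler = TypedLeafHandler()
xml.sax.parseString(
    b"<phone><areaCode>07</areaCode><label>home</label></phone>", handler
)
for leaf in handler.leaves:
    print(leaf)
```

The raw "07" is kept next to the parsed 7, and the choice to type `areaCode` lives in the application, not in the parser.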

>  I mean, it's not like they're forcing SAX processors to report 
>abstract values instead of character strings, is it? If you read 
>carefully, you will see that it's an option. And you may have 
>noticed that there are lots of SAX processors that aren't doing this 
>in use *as we speak*. They won't be influenced by the magical rays 
>to change, will they? They will only change if the programmer that's 
>using them thinks "Hmmm, I'd like the leaf nodes of this XML parsed 
>into abstract values for me to save some coding, I'll use a SAX 
>parser that does that for me". If the programmer doesn't want that, 
>perhaps because the formatting of dates and integers is important to 
>them, then they won't do it.

If you want to translate data into XML to present it through SAX, 
fine. But don't start complexifying SAX because you discover your 
data isn't a good fit for XML.

>So shock horror! Even when USING this API that produces typed values 
>WHEN IT CAN, you could still get at the raw character stream to 
>handle it as you always did!
>
>So exactly how is this going to destroy XML, eh?

Two ways:

1. It will mean people start passing around binary data instead of text.

2. It will make XML so complex that it becomes incredibly difficult 
to learn and implement. Soon we'll be back in the SGML hell where no 
parser implements everything, and you're never quite sure which 
features you can and cannot use. And thus you can no longer safely 
interchange XML with other parties.  As Simon keeps pointing out, 
schemas, XPath 2, and XSLT 2 have already marched a long way down 
this road.  I don't think it's a coincidence that those of us who 
spend the largest part of our time trying to explain and teach these 
technologies are most adamant that this is the wrong road to follow.

-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

Follow-Ups:
- Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)
  - From: Alaric B Snell <alaric@alaric-snell.com>

References:
- RE: [xml-dev] SAX for Binary Encodings (SAD-SAX)
  - From: "Bob Wyman" <bob@wyman.us>
- RE: [xml-dev] SAX for Binary Encodings (SAD-SAX)
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)
  - From: Alaric B Snell <alaric@alaric-snell.com>