OASIS Mailing List Archives





   Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)


Elliotte Rusty Harold wrote:
> At 11:22 AM -0500 11/8/03, Bob Wyman wrote:
>>     Frankly, I'm a bit surprised by all the heat that has been
>> generated by what I thought was a very simple proposal. Clearly there
>> is something I'm not understanding... Please bear with me as I try to
>> be more explicit about what I'm suggesting. Then, *PLEASE* let me know
>> what it is that I'm not understanding...
> It is not at all simple. What you are proposing is a bullet aimed at the 
> heart of XML. There is no binary data in XML. None. Zip. Zero. Zilch. 
> Nada. There is no data except text.

It's not about "binary", it's about "typed". And although there is a 
typeless subset of XML - where no schemas or DTDs are used - there is a 
significant number of people who do use typed information of one kind or 
another. And although they've created DTDs, XML Schema, XPath 2, and so 
on, you can still do your non-constrained XML transfers of sequences of 
characters if you want to; they've not polluted it in any way, OK? So 
stop panicking, man!

> You are proposing to add binary data to the core of XML. When you tell 
> me that you want to pass 32-bit integers as pure binary, then you are 
> saying that 7, 07, +07, 0007, and so forth are the same thing. They are 
> not.

In some contexts they are; in some they are not. It all depends on who 
is viewing it. If you stop and think, you will realise that an API like 
SAX in no way *forces* 7 to be treated identically to 07. Clearly you 
don't know a thing about how computers work if you think that *some* 
pieces of software treating 7 the same as 07 *if they wish to* will 
somehow emit data-destroying rays that spread across the Internet and 
lop the significant leading zeroes off of telephone numbers, eh?

You might be surprised to find that a lot of software *does* take 
character strings out of an XML document and immediately parse them as 
decimal integers. This doesn't appear to have broken XML, does it? Shock 
horror!
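
To make that concrete, here is a minimal Python sketch (the variable names are mine, purely for illustration) using the lexical forms quoted above. As strings they are all different; an application that *chooses* to read them as decimal integers gets one value:

```python
# The lexical forms from the quoted message.
lexical_forms = ["7", "07", "+07", "0007"]

# As character strings they are all distinct...
assert len(set(lexical_forms)) == 4

# ...but software that decides to parse them as decimal integers
# ends up with the same value for every one of them.
values = {int(text) for text in lexical_forms}
print(values)  # {7}
```

The strings are untouched by this; the conversion is a local decision made by the code that calls `int()`.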

If the thought of XML processors discarding information that is not 
relevant to them bothers you that much, you really need to try and 
destroy the computer industry - those pesky binary computers are always 
destroying XML, aren't they? Just write XML on paper instead. But you'd 
better not let people look at it who hadn't been trained with the self 
control to look at "07" without thinking "Oh, seven", because they would 
destroy XML by doing so, wouldn't they? ;-)

> The data model on which your SAD SAX API is based makes assumptions
> about XML that are not true for many people. You are restricting XML to 
> one data model, and claiming other data models are irrelevant.

Yes, the bit where he said that "This SAX option would be compulsory, 
and all XML parsers would by international agreement be required to have 
this option turned on; software that does not turn this option on would 
not be allowed to be sold, or written, because I know full well that a 
character string of '07' at a point in the document where the schema 
says 'this is an integer, so leading zeroes in the decimal 
representation are irrelevant' means that the schema author was wrong" 
really supports your argument, doesn't it?

> All data in an XML document is text, never anything else.

I can prove you wrong!


That "10" is text. It's also an integer. It's also the number of fingers 
I have. See? A counter-example.

Thing is:


The 010 is also text. It's also an integer. It's also the number of 
fingers I have, too! It's a different string... but it's the SAME 
INTEGER! There is NO DIFFERENCE between me having ten fingers and me 
having ten fingers, and both snippets say exactly that - I have ten 
fingers. One says it by stating that I have 10 fingers, the other by 
stating that I have 010 fingers, but you know what? Although they take 
two different approaches to saying it, they say the SAME THING.
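
A rough sketch of the same point with an ordinary SAX parser, using Python's standard `xml.sax` module. The element name `fingers` and the helper names are made-up stand-ins for illustration only:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects character data exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # A parser may deliver text in several pieces, so keep them all.
        self.chunks.append(content)


def element_text(document):
    """Return the character data of a small one-element document."""
    handler = TextCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return "".join(handler.chunks)


a = element_text("<fingers>10</fingers>")
b = element_text("<fingers>010</fingers>")

print(a == b)            # False: different strings
print(int(a) == int(b))  # True: the same integer
```

The parser hands over text in both cases. Whether the two documents "say the same thing" depends entirely on whether the receiving code reads that text as an integer.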

Now, these two snippets say totally different things:




...yet somehow, I manage to reconcile this fact in my head with the fact 
that the other two snippets made identical statements. Wow! I must be... 
a genius or something!

> How that text
> is instantiated into locally useful data structures is a local decision. 

Ah, you're cottoning on!

> It is certainly not something you'd want to exchange on the Internet, 
> and it is absolutely not something that should be baked into the core 
> APIs.

Oh, good! You understand! So what was all that rubbish before about, eh?

Or are you mistaking an optional thing for something 'baked in'?

Now, you are harping on about those who communicate information rather 
than just opaque text as "polluting" XML, but don't you think that 
demanding that the APIs *they* use be the same as the APIs *you* use 
is... polluting *their* use of XML with *your* model, hmm? I mean, it's 
not like they're forcing SAX processors to report abstract values 
instead of character strings, is it? If you read carefully, you will see 
that it's an option. And you may have noticed that there are lots of SAX 
processors in use *as we speak* that aren't doing this. They won't be 
influenced by the magical rays to change, will they? They will only 
change if the programmer that's using them thinks "Hmmm, I'd like the 
leaf nodes of this XML parsed into abstract values for me to save some 
coding, I'll use a SAX parser that does that for me". If the programmer 
doesn't want that, perhaps because the formatting of dates and integers 
is important to them, then they won't do it.
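
As a sketch of what "optional" could look like in practice, here is a small Python handler built on the standard `xml.sax` module. The `converters` table, the class name and the element names are all hypothetical; none of this is part of SAX or of the proposal under discussion. A caller who supplies no converters gets plain character strings, exactly as before:

```python
import xml.sax


class LeafValueHandler(xml.sax.ContentHandler):
    """Reports leaf element content, typed only where the caller opted in.

    `converters` is a hypothetical opt-in table mapping element names to
    conversion functions. It is an illustration, not a real SAX feature.
    """

    def __init__(self, converters=None):
        super().__init__()
        self.converters = converters or {}
        self.values = []       # (element name, reported value) pairs
        self._text = []
        self._is_leaf = False

    def startElement(self, name, attrs):
        self._text = []
        self._is_leaf = True   # stays true only if no child element follows

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._is_leaf:
            raw = "".join(self._text)
            convert = self.converters.get(name)
            # No converter registered: report the raw string untouched.
            self.values.append((name, convert(raw) if convert else raw))
        self._is_leaf = False


def read(document, converters=None):
    handler = LeafValueHandler(converters)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.values


doc = "<order><qty>07</qty><phone>0123</phone></order>"
plain = read(doc)                # nothing opted in
typed = read(doc, {"qty": int})  # only qty is treated as an integer
print(plain)  # [('qty', '07'), ('phone', '0123')]
print(typed)  # [('qty', 7), ('phone', '0123')]
```

The phone number keeps its leading zero in both runs, because nobody told the handler it was a number.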

You might have got a silly idea into your head that perhaps a generic 
XML processing tool - say, a tool that updates a lastModified timestamp 
stored as a namespaced attribute of the document element, that can be 
used to track a modification history for any XML document - would 
somehow be forced into using an abstract-value-reporting SAX processor 
to read in the document, even though the programmer writing this thing - 
if she or he had intelligence above that of a potato - would have 
thought "Hmmm, I've no idea what the document means outside of my little 
attribute on the root element; I'd better pass it through verbatim. I 
don't even know where whitespace is significant!" and thus configured 
his SAX parser with knowledge only of the type of the lastModified 
attribute. As such, the SAX parser would not know any type information 
about the other elements (what if it encounters an xsi:type attribute on 
something? Interesting - perhaps the developer should instruct the SAX 
processor not to trust them, since they may be wrong - or maybe the 
presence of an xsi:type declaration on some content in the document 
should be taken to imply the author's permission to normalise that 
content. I dunno. It's irrelevant to this discussion anyway), and would 
just pass through the unchanged character data, which the programmer 
would pass straight through, verbatim.
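
A rough sketch of such a tool in Python, using `XMLFilterBase` and `XMLGenerator` from the standard library. The `meta:` prefix, the namespace URI and the function names are assumptions made up for the example, and for brevity the attribute is matched by its qualified name rather than by namespace:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class TouchLastModified(XMLFilterBase):
    """Rewrites one attribute on the document element, forwards the rest."""

    def __init__(self, parent, stamp):
        super().__init__(parent)
        self.stamp = stamp
        self._depth = 0

    def startElement(self, name, attrs):
        if self._depth == 0:
            # The only thing this tool claims to understand.
            updated = dict(attrs.items())
            updated["meta:lastModified"] = self.stamp
            attrs = AttributesImpl(updated)
        self._depth += 1
        super().startElement(name, attrs)

    def endElement(self, name):
        self._depth -= 1
        super().endElement(name)


def touch(document, stamp):
    out = io.StringIO()
    reader = TouchLastModified(xml.sax.make_parser(), stamp)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(document.encode("utf-8")))
    return out.getvalue()


source = (
    '<order xmlns:meta="urn:example:meta" '
    'meta:lastModified="2003-01-01T00:00:00Z">'
    "<qty>07</qty><note> keep  spacing </note></order>"
)
result = touch(source, "2003-11-09T12:00:00Z")
print(result)
```

The character data of every other element, leading zeroes and whitespace included, comes out as it went in. A plain SAX round trip like this does not preserve everything (comments and the original quoting style are lost, for instance), so treat it as an illustration of the pass-through idea rather than a finished tool.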

So shock horror! Even when USING this API that produces typed values 
WHEN IT CAN, you could still get at the raw character stream to handle 
it as you always did!

So exactly how is this going to destroy XML, eh?




Copyright 2001 XML.org. This site is hosted by OASIS