Elliotte Rusty Harold wrote:
> At 11:22 AM -0500 11/8/03, Bob Wyman wrote:
>> Frankly, I'm a bit surprised by all the heat that has been
>> generated by what I thought was a very simple proposal. Clearly there
>> is something I'm not understanding... Please bear with me as I try to
>> be more explicit about what I'm suggesting. Then, *PLEASE* let me know
>> what it is that I'm not understanding...
> It is not at all simple. What you are proposing is a bullet aimed at the
> heart of XML. There is no binary data in XML. None. Zip. Zero. Zilch.
> Nada. There is no data except text.
It's not about "binary", it's about "typed". And although there is a
typeless subset of XML - where no schemas or DTDs are used - there is a
significant number of people who do use typed information of one kind or
another. And although they've created DTDs, XML Schema, XPath 2, and so
on, you can still do your non-constrained XML transfers of sequences of
characters if you want to; they've not polluted it in any way, OK? So
stop panicking, man!
> You are proposing to add binary data to the core of XML. When you tell
> me that you want to pass 32-bit integers as pure binary, then you are
> saying that 7, 07, +07, 0007, and so forth are the same thing. They are
In some contexts they are; in some they are not. It all depends on who
is viewing it. If you stop and think, you will realise that an API like
SAX in no way *forces* 7 to be treated identically to 07. Clearly you
don't know a thing about how computers work if you think that *some*
pieces of software treating 7 the same as 07 *if they wish to* will
somehow emit data-destroying rays that spread across the Internet and
lop the significant leading zeroes off of telephone numbers, eh?
You might be surprised to find that a lot of software *does* take
character strings out of an XML document and immediately parse them as
decimal integers. This doesn't appear to have broken XML, does it? Shock
horror!
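
To make that concrete, here is a minimal sketch using Python's standard xml.sax module. The IntCollector class and the <nums>/<n> element names are made up for illustration; nothing here comes from the proposal under discussion. The parser hands over plain character strings, and it is the application that chooses to read them as decimal integers.

```python
import xml.sax


class IntCollector(xml.sax.ContentHandler):
    """Collects the character data of every <n> element as a raw string."""

    def __init__(self):
        super().__init__()
        self.raw = []       # text exactly as it appeared in the document
        self._buf = None    # chunks of the <n> element currently being read

    def startElement(self, name, attrs):
        if name == "n":
            self._buf = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "n":
            self.raw.append("".join(self._buf))
            self._buf = None


# Hypothetical document holding four spellings of the same number.
doc = b"<nums><n>7</n><n>07</n><n>+07</n><n>0007</n></nums>"

handler = IntCollector()
xml.sax.parseString(doc, handler)

raw = handler.raw                  # the strings, leading zeroes intact
parsed = [int(s) for s in raw]     # the application's own choice to parse
```

The strings in raw stay distinct; only the code that called int() on them sees four sevens. The document itself is untouched either way.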
If the thought of XML processors discarding information that is not
relevant to them bothers you that much, you really need to try and
destroy the computer industry - those pesky binary computers are always
destroying XML, aren't they? Just write XML on paper instead. But you'd
better not let people look at it who hadn't been trained with the self
control to look at "07" without thinking "Oh, seven", because they would
destroy XML by doing so, wouldn't they? ;-)
> The data model on which your SAD SAX API is based makes assumptions
> about XML that are not true for many people. You are restricting XML to
> one data model, and claiming other data models are irrelevant.
Yes, the bit where he said that "This SAX option would be compulsory,
and all XML parsers would by international agreement be required to have
this option turned on; software that does not turn this option on would
not be allowed to be sold, or written, because I know full well that a
character string of '07' at a point in the document where the schema
says 'this is an integer, so leading zeroes in the decimal
representation are irrelevant' means that the schema author was wrong"
really supports your argument, doesn't it?
> All data in an XML document is text, never anything else.
I can prove you wrong!
That "10" is text. It's also an integer. It's also the number of fingers
I have. See? A counter-example.
The 010 is also text. It's also an integer. It's also the number of
fingers I have, too! It's a different string... but it's the SAME
INTEGER! There is NO DIFFERENCE between me having ten fingers and me
having ten fingers, and both snippets say exactly that - I have ten
fingers. One says it by stating that I have 10 fingers, the other by
stating that I have 010 fingers, but you know what? Although they take
two different approaches to saying it, they say the SAME THING.
Now, these two snippets say totally different things:
...yet somehow, I manage to reconcile this fact in my head with the fact
that the other two snippets made identical statements. Wow! I must be...
a genius or something!
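
The same point as a small Python sketch. The same_statement function and the "integer" and "string" type labels are hypothetical, chosen only for illustration; they are not the snippets from this message or any schema language's real API. Whether two lexical forms say the same thing depends on the type they are read under.

```python
def same_statement(a: str, b: str, declared_type: str) -> bool:
    """Compare two lexical forms under a declared type (hypothetical labels)."""
    if declared_type == "integer":
        # Leading zeroes carry no meaning for an integer.
        return int(a) == int(b)
    # Untyped or string content: only identical character sequences match.
    return a == b


fingers_equal = same_statement("10", "010", "integer")
codes_equal = same_statement("10", "010", "string")
```

Read as integers, "10" and "010" are equal; read as strings, they are not. Both answers can be true at once because they answer different questions.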
> How that text
> is instantiated into locally useful data structures is a local decision.
Ah, you're cottoning on!
> It is certainly not something you'd want to exchange on the Internet,
> and it is absolutely not something that should be baked into the core
Oh, good! You understand! So what was all that rubbish before about, eh?
Or are you mistaking an optional thing for something 'baked in'?
Now, you are harping on about those who communicate information rather
than just opaque text as "polluting" XML, but don't you think that
demanding that the APIs *they* use be the same as the APIs *you* use
is... polluting *their* use of XML with *your* model, hmm? I mean, it's
not like they're forcing SAX processors to report abstract values
instead of character strings, is it? If you read carefully, you will see
that it's an option. And you may have noticed that there are lots of SAX
processors that aren't doing this in use *as we speak*. They won't be
influenced by the magical rays to change, will they? They will only
change if the programmer that's using them thinks "Hmmm, I'd like the
leaf nodes of this XML parsed into abstract values for me to save some
coding, I'll use a SAX parser that does that for me". If the programmer
doesn't want that, perhaps because the formatting of dates and integers
is important to them, then they won't do it.
You might have got a silly idea into your head that perhaps a generic
XML processing tool - say, a tool that updates a lastModified timestamp
stored as a namespaced attribute of the document element, that can be
used to track a modification history for any XML document - would
somehow be forced into using an abstract-value-reporting SAX processor
to read in the document, even though the programmer writing this thing -
if she or he had intelligence above that of a potato - would have
thought "Hmmm, I've no idea what the document means outside of my little
attribute on the root element; I'd better pass it through verbatim. I
don't even know where whitespace is significant!" and thus configured
the SAX parser with knowledge only of the type of the lastModified
attribute. As such, the SAX parser would not know any type information
about the other elements (what if it encounters an xsi:type attribute on
something? Interesting - perhaps the developer should instruct the SAX
processor not to trust them, since they may be wrong - or maybe the
presence of an xsi:type declaration on some content in the document
should be taken to imply the author's permission to normalise that
content. I dunno. It's irrelevant to this discussion anyway), and would
just pass through the unchanged character data, which the programmer
would pass straight through, verbatim.
So shock horror! Even when USING this API that produces typed values
WHEN IT CAN, you could still get at the raw character stream to handle
it as you always did!
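
A rough sketch of what "typed values when it can" might look like, built on Python's standard xml.sax. This is an assumption about the general shape, not the API proposed in this thread: the TypedWhenKnown class, its types mapping, and the <count> and <phone> element names are all invented for illustration.

```python
import xml.sax


class TypedWhenKnown(xml.sax.ContentHandler):
    """Reports each element's raw text, plus a typed value if a type is known."""

    def __init__(self, types):
        super().__init__()
        self.types = types    # hypothetical: element name -> converter function
        self.events = []      # (name, raw_text, typed_value or None)
        self._stack = []      # one text buffer per open element

    def startElement(self, name, attrs):
        self._stack.append([])

    def characters(self, content):
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        raw = "".join(self._stack.pop())
        convert = self.types.get(name)
        # Convert only where the programmer supplied a type; otherwise the
        # raw characters are all that gets reported.
        typed = convert(raw.strip()) if convert else None
        self.events.append((name, raw, typed))


# Only <count> is declared as an integer; <phone> has no declared type.
doc = b"<doc><count>07</count><phone>0712</phone></doc>"

handler = TypedWhenKnown({"count": int})
xml.sax.parseString(doc, handler)
events = handler.events
```

The count comes back as both "07" and 7, while the phone number comes back only as "0712", leading zero and all. A programmer who passes an empty mapping gets nothing but raw strings.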
So exactly how is this going to destroy XML, eh?