OASIS Mailing List Archives

   Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)


010? 8 fingers, sorry about your loss

interpretation of leading zeroes is problematic, as are 32-bit integers as
we move to a 64-bit world

and just for the record 10 is not a number. it's two symbols that we in
the west with an arabic number system commonly interpret as the number
10 in the absence of any other information. however it could be an
invoice number in which case it's meaningless to use any sort of binary
coding, or a zip/post code, etc
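
to make the leading-zero point concrete, here is a minimal Python sketch. it is not from the original thread, and the function name `interpretations` is made up for illustration. it only shows that one character string can be read as an opaque identifier, as a decimal integer, or (C-style) as an octal integer, and that nothing in the string itself says which reading is intended.

```python
def interpretations(text):
    """Return several plausible readings of the same character string.

    Hypothetical helper for illustration only; which reading is "right"
    depends on information outside the string (a schema, a convention, ...).
    """
    return {
        "string": text,             # e.g. an invoice number or a post code
        "decimal": int(text, 10),   # leading zeroes are ignored
        "octal": int(text, 8),      # digits read in base 8, as C does for 010
    }


readings = interpretations("010")
print(readings)
```

read as decimal, "010" is ten; read as octal it is eight (hence "8 fingers"); read as an identifier it is just three characters and must not be normalised.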

i like text encodings and will keep defending them. you can just do so
much more with them.


On Sun, 2003-11-09 at 11:10, Alaric B Snell wrote:
> Elliotte Rusty Harold wrote:
> > At 11:22 AM -0500 11/8/03, Bob Wyman wrote:
> > 
> > 
> >>     Frankly, I'm a bit surprised by all the heat that has been
> >> generated by what I thought was a very simple proposal. Clearly there
> >> is something I'm not understanding... Please bear with me as I try to
> >> be more explicit about what I'm suggesting. Then, *PLEASE* let me know
> >> what it is that I'm not understanding...
> > 
> > 
> > It is not at all simple. What you are proposing is a bullet aimed at the 
> > heart of XML. There is no binary data in XML. None. Zip. Zero. Zilch. 
> > Nada. There is no data except text.
> It's not about "binary", it's about "typed". And although there is a 
> typeless subset of XML - where no schemas or DTDs are used - there is a 
> significant number of people who do use typed information of one kind or 
> another. And although they've created DTDs, XML Schema, XPath 2, and so 
> on, you can still do your non-constrained XML transfers of sequences of 
> characters if you want to; they've not polluted it in any way, OK? So 
> stop panicking, man!
> > You are proposing to add binary data to the core of XML. When you tell 
> > me that you want to pass 32-bit integers as pure binary, then you are 
> > saying that 7, 07, +07, 0007, and so forth are the same thing. They are 
> > not.
> In some contexts they are; in some they are not. It all depends on who 
> is viewing it. If you stop and think, you will realise that an API like 
> SAX in no way *forces* 7 to be treated identically to 07. Clearly you 
> don't know a thing about how computers work if you think that *some* 
> pieces of software treating 7 the same as 07 *if they wish to* will 
> somehow emit data-destroying rays that spread across the Internet and 
> lop the significant leading zeroes off of telephone numbers, eh?
> You might be surprised to find that a lot of software *does* take 
> character strings out of an XML document and immediately parse them as 
> decimal integers. This doesn't appear to have broken XML, does it? Shock 
> horror!
> If the thought of XML processors discarding information that is not 
> relevant to them bothers you that much, you really need to try and 
> destroy the computer industry - those pesky binary computers are always 
> destroying XML, aren't they? Just write XML on paper instead. But you'd 
> better not let people look at it who hadn't been trained with the self 
> control to look at "07" without thinking "Oh, seven", because they would 
> destroy XML by doing so, wouldn't they? ;-)
> > The data model on which your SAD SAX API is based makes assumptions
> > about XML that are not true for many people. You are restricting XML to 
> > one data model, and claiming other data models are irrelevant.
> Yes, the bit where he said that "This SAX option would be compulsory, 
> and all XML parsers would by international agreement be required to have 
> this option turned on; software that does not turn this option on would 
> not be allowed to be sold, or written, because I know full well that a 
> character string of '07' at a point in the document where the schema 
> says 'this is an integer, so leading zeroes in the decimal 
> representation are irrelevant' means that the schema author was wrong" 
> really supports your argument, doesn't it?
> > All data in an XML document is text, never anything else.
> I can prove you wrong!
> <numFingers>10</numFingers>
> That "10" is text. It's also an integer. It's also the number of fingers 
> I have. See? A counter-example.
> Thing is:
> <numFingers>010</numFingers>
> The 010 is also text. It's also an integer. It's also the number of 
> fingers I have, too! It's a different string... but it's the SAME 
> INTEGER! There is NO DIFFERENCE between me having ten fingers and me 
> having ten fingers, and both snippets say exactly that - I have ten 
> fingers. One says it by stating that I have 10 fingers, the other by 
> stating that I have 010 fingers, but you know what? Although they take 
> two different approaches to saying it, they say the SAME THING.
> Now, these two snippets say totally different things:
> <password>10</password>
> and
> <password>010</password>
> ...yet somehow, I manage to reconcile this fact in my head with the fact 
> that the other two snippets made identical statements. Wow! I must be... 
> a genius or something!
> > How that text
> > is instantiated into locally useful data structures is a local decision. 
> Ah, you're cottoning on!
> > It is certainly not something you'd want to exchange on the Internet, 
> > and it is absolutely not something that should be baked into the core 
> > APIs.
> Oh, good! You understand! So what was all that rubbish before about, eh?
> Or are you mistaking an optional thing for something 'baked in'?
> Now, you are harping on about those who communicate information rather 
> than just opaque text as "polluting" XML, but don't you think that 
> demanding that the APIs *they* use be the same as the APIs *you* use 
> is... polluting *their* use of XML with *your* model, hmm? I mean, it's 
> not like they're forcing SAX processors to report abstract values 
> instead of character strings, is it? If you read carefully, you will see 
> that it's an option. And you may have noticed that there are lots of SAX 
> processors that aren't doing this in use *as we speak*. They won't be 
> influenced by the magical rays to change, will they? They will only 
> change if the programmer that's using them thinks "Hmmm, I'd like the 
> leaf nodes of this XML parsed into abstract values for me to save some 
> coding, I'll use a SAX parser that does that for me". If the programmer 
> doesn't want that, perhaps because the formatting of dates and integers 
> is important to them, then they won't do it.
> You might have got a silly idea into your head that perhaps a generic 
> XML processing tool - say, a tool that updates a lastModified timestamp 
> stored as a namespaced attribute of the document element, that can be 
> used to track a modification history for any XML document - would 
> somehow be forced into using an abstract-value-reporting SAX processor 
> to read in the document, even though the programmer writing this thing - 
> if she or he had intelligence above that of a potato - would have 
> thought "Hmmm, I've no idea what the document means outside of my little 
> attribute on the root element; I'd better pass it through verbatim. I 
> don't even know where whitespace is significant!" and thus configured 
> their SAX parser with knowledge only of the type of the lastModified 
> attribute. As such, the SAX parser would not know any type information 
> about the other elements (what if it encounters an xsi:type attribute on 
> something? Interesting - perhaps the developer should instruct the SAX 
> processor not to trust them, since they may be wrong - or maybe the 
> presence of an xsi:type declaration on some content in the document 
> should be taken to imply the author's permission to normalise that 
> content. I dunno. It's irrelevant to this discussion anyway), and would 
> just pass through the unchanged character data, which the programmer 
> would pass straight through, verbatim.
> So shock horror! Even when USING this API that produces typed values 
> WHEN IT CAN, you could still get at the raw character stream to handle 
> it as you always did!
> So exactly how is this going to destroy XML, eh?
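
For readers who want to see what an optional typed layer over SAX might look like, here is a rough Python sketch built on the standard `xml.sax` module. It is not the SAD-SAX proposal itself: the class `TypedLeafHandler`, the `types` mapping and the `parse` helper are all assumptions invented for this illustration. The handler converts leaf text only for element names the caller has declared a type for; everything else is reported as the unchanged character string.

```python
import xml.sax


class TypedLeafHandler(xml.sax.ContentHandler):
    """Hypothetical handler: typed values are strictly opt-in, per element name."""

    def __init__(self, types=None):
        super().__init__()
        self.types = types or {}  # element name -> converter; empty means text only
        self.values = []          # (element name, reported value) pairs
        self._buffer = []
        self._leaf = False

    def startElement(self, name, attrs):
        self._buffer = []
        self._leaf = True         # assume a leaf until a child element closes

    def characters(self, content):
        self._buffer.append(content)

    def endElement(self, name):
        if self._leaf:
            text = "".join(self._buffer)
            convert = self.types.get(name)
            # No declared type: pass the characters through verbatim.
            self.values.append((name, convert(text) if convert else text))
        self._leaf = False        # the enclosing element is not a leaf
        self._buffer = []


def parse(document, types=None):
    """Parse an XML string and return the reported leaf values."""
    handler = TypedLeafHandler(types)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.values


doc = "<r><numFingers>010</numFingers><password>010</password></r>"
typed = parse(doc, {"numFingers": int})  # caller opts in for one element
plain = parse(doc)                       # caller opts in for nothing
print(typed)
print(plain)
```

With the mapping supplied, `numFingers` is reported as the integer ten while `password` keeps its leading zero; with no mapping, both come back as the string "010", exactly as an untyped SAX consumer would see them.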



Copyright 2001 XML.org. This site is hosted by OASIS