OASIS Mailing List Archives





   Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)


Elliotte Rusty Harold wrote:
> Like many people who want typed data you're confusing the local with the 
> global. Of course my software will treat the strings in a way I find 
> useful. However, that in no way means you have to treat them the same 
> way. You may want floats. I may want ints. Simon may want strings. 
> There's no one right answer.

I agree!

 > The underlying premise that suggests we
> should exchange typed, binary data is that there is one right answer; one 
> type that's better for the data than all the others;

...and here I disagree. But this is orthogonal; the SAX extension 
proposal is purely a local processing issue, anyway. It affects nothing 
globally; the same bits are still exchanged on wires.

> Did you read what Wyman wrote? He was suggesting that we actually 
> exchange four bytes containing a big endian two's complement 
> representation of the number 7 (or some equivalent form), rather than 
> exchanging the text string 7.

Between the SAX parser and the rest of the application, yes. But not 
globally. Wyman was originally talking about using SAX for ASN.1 stuff 
(although I think a typed extension to SAX has wider application than 
that), and none of the ASN.1 encodings have arbitrary relationships with 
32 bit architectures or anything silly like that. Unless there's a 
validity constraint in the 'schema', numbers in ASN.1 encodings are 
constrained only by the available storage space on the disk you've put 
the file on...
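
To make the size point concrete, here is a minimal sketch in Java. It 
assumes the content octets are a big-endian two's complement integer of 
arbitrary length, which is how I understand ASN.1 INTEGER contents to 
be laid out in BER/DER. The class and method names are made up for 
illustration; only java.math.BigInteger is a real API.

```java
import java.math.BigInteger;

public class WireInteger {
    // Decode a big-endian two's complement octet string of any length.
    // BigInteger(byte[]) interprets its argument in exactly this form,
    // so nothing here depends on a 32 bit (or any fixed) word size.
    public static BigInteger decode(byte[] bigEndianTwosComplement) {
        return new BigInteger(bigEndianTwosComplement);
    }

    public static void main(String[] args) {
        // The "four bytes containing the number 7" from the quoted text.
        byte[] fourBytes = {0x00, 0x00, 0x00, 0x07};
        System.out.println(decode(fourBytes)); // prints 7

        // Seventeen octets: far wider than any machine int or long.
        byte[] wide = new byte[17];
        wide[0] = 0x01;
        System.out.println(decode(wide));
    }
}
```

The same decoder handles one octet or a thousand; any range limit would 
have to come from the schema, not from the encoding.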

>> <numFingers>10</numFingers>

> Not a counter example at all, and when you understand that you will have 
> achieved the XML nature. 10 is text, not a number. You choose to 
> interpret that text string as the number ten, which is fine. It's your 
> choice. Just don't believe for a minute that it's the only legitimate 
> interpretation of that text string, or that the string and its 
> interpretation are the same thing.

Ten isn't the only legitimate interpretation of "10", no; that's not my 
point. My point is that in the context of the numFingers element - which 
is defined (within its namespace) as containing the number of fingers 
the person in question has, written as a positive decimal integer - 
"ten" is the sole intended semantics of the string "10".

[it's just an option]
> Simplicity is a virtue. We're trying to produce a Corvette here, not an 
> Edsel. Use the right tools for the right tasks. Don't try to make one 
> API fit all needs.

That's the reason why it would be nice to have it as a SAX option.

I mean, the alternative is to have an API that's a direct copy of SAX 
apart from the leaf nodes being reported as Java class 'Object' or 
whatever rather than as a string - meaning that we now have another API 
that can handle the same stuff as SAX, but can do other things too. This 
would fragment the community unnecessarily, by just piling more features 
into the core rather than having them as extra modules you can plug in.

>> Now, you are harping on about those who communicate information rather 
>> than just opaque text as "polluting" XML, but don't you think that 
>> demanding that the APIs *they* use be the same as the APIs *you* use 
>> is... polluting *their* use of XML with *your* model, hmm?
> Oh, come on. Now you're being ridiculous. They can invent and use any 
> APIs (and any formats) they want. The problem is they don't want to do 
> that. They want to hijack the nice clean SAX API and XML format, and 
> stuff it full of mismatched garbage I'm going to have to spend my time 
> explaining.

No they don't! Who's trying to add things to the SAX API or XML format?

Writing a SAX option doesn't mean adding anything to SAX at all; the 
typed SAX extension need never be part of the SAX API, since it's an 
*extension*. You would use the SAX API to find out if the typed API was 
available, by asking the reader about the feature (for example, 
reader.getFeature ("...URI denoting typed SAX extensions...")), which 
would throw a SAXNotRecognizedException if the driver you're using 
didn't know about typed values.

But if it did support it, you could call reader.setFeature ("...URI 
denoting typed SAX extensions..."), thus informing the parser that the 
ContentHandler instance provided by the application also supports the 
TypedContentHandler interface, which just adds a few methods for typed 
data (the value() callback and presumably some replacement for 
startElement to handle elements with typed attributes - although the 
latter may not be necessary, perhaps part of the extension could be that 
the client application is free to cast the instance of class Attributes 
passed into startElement to class TypedAttributes which has an extra 
method for getting typed attribute values).

At no point does this require changing anything in the SAX API.
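
A rough sketch of how that negotiation might look in Java follows. The 
feature URI, the TypedContentHandler and TypedAttributes interfaces and 
their method signatures are all hypothetical, invented here for 
illustration; no such extension is defined anywhere. The only real APIs 
used are the standard SAX2 ones: XMLReader.getFeature, 
XMLReader.setFeature (which takes a boolean as its second argument), 
and the two SAX exceptions.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class TypedSaxProbe {
    // Hypothetical feature URI; a real extension would have to define one.
    public static final String TYPED_FEATURE =
            "http://example.org/sax/features/typed-values";

    // Hypothetical extension interface: plain ContentHandler plus one
    // callback for typed leaf values.
    public interface TypedContentHandler extends ContentHandler {
        void value(Object typedValue) throws SAXException;
    }

    // Hypothetical: the Attributes passed to startElement could be cast
    // to this to read typed attribute values.
    public interface TypedAttributes extends Attributes {
        Object getTypedValue(int index);
    }

    // Returns true only if the reader recognizes the feature and agrees
    // to turn it on. Uses nothing beyond the existing SAX2 feature calls.
    public static boolean enableTyped(XMLReader reader) {
        try {
            reader.getFeature(TYPED_FEATURE);       // probe for support
            reader.setFeature(TYPED_FEATURE, true); // opt in
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Driver knows nothing about typed values: stay on plain SAX.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("typed SAX available: " + enableTyped(reader));
    }
}
```

An application that gets false back simply registers an ordinary 
ContentHandler and parses the strings itself, so plain SAX users never 
see the extension at all.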

>> So exactly how is this going to destroy XML, eh?
> Two ways:
> 1. It will mean people start passing around binary data instead of text.

They already do that... look at all the images on web pages :-)

> 2. It will make XML so complex that it becomes incredibly difficult to 
> learn and implement.

Why must XML change? I don't see how this will change XML...

 > Soon we'll be back in the SGML hell where no parser
> implements everything, and you're never quite sure which features you 
> can and cannot use.

Ahah! That's more like a valid point. Yes, it would be bad if your 
application was written to use typed SAX in order to remove the burden 
of doing all the parsing of date formats and so on, but then you found 
yourself having to compile it on a platform with no SAX parser that 
implemented the extension... the solution to this is to have nice open 
source implementations that quickly get ported everywhere :-)

 > And thus you can no longer safely interchange XML
> with other parties. 

This doesn't affect the interchange of XML, however; there's nothing 
about typed SAX, as I see it at least, that changes XML in any way.

 > As Simon keeps pointing out, schemas, XPath 2, and
> XSLT 2 have already marched a long way down this road.  I don't think 
> it's a coincidence that those of us who spend the largest part of our 
> time trying to explain and teach these technologies are most adamant 
> that this is the wrong road to follow.

As I said in my reply to Simon's posting, I don't agree with how XML 
Schema, XPath 2, and XSLT have been done myself... I think the W3C has 
failed to consider the implications of its actions, in some respects.




Copyright 2001 XML.org. This site is hosted by OASIS