At 2:03 PM -0500 11/7/03, Bob Wyman wrote:
> SAX assumes that the data it is reading is all of "character"
>type. SAX has no access to a schema file and thus has no idea of the
>real type of the characters it reads.
That's because SAX is designed to read XML, not the binary, non-XML
formats people like to dream up.
>However, a binary encoding will
>typically pass data in a form more appropriate for its type. Thus, an
>integer will be passed as something like a 32-bit value, not a string
Then whatever it is, it certainly isn't XML. XML is text. If you want
to pass around binary data feel free, but please don't pretend this
is XML or even a binary encoding of XML. It's not, and you shouldn't
expect XML tools to work on it.
People keep saying they want a binary encoding of XML, but when the
pedal hits the metal, they keep proposing formats that do not have
one-to-one, onto mappings to XML.
>So, to build a "pure SAX" interface to a binary
>encoding, you would have to convert all the binary values to
>characters before passing them to the SAX event handlers.
There are no binary values in XML, period. When I write
I read the string 4.3, which I can then interpret as I want. I am not
limited to interpreting it as an inexact, 32-bit, IEEE 754 float. I
can if I want to but I don't have to. Even with a schema THERE IS NO
BINARY DATA IN XML. ONLY TEXT.
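
A minimal sketch of the point above, assuming the standard JAXP SAX parser that ships with Java. The element name "value" and the class and method names are made up for illustration and do not come from the original message. The handler receives nothing but characters; whether the string 4.3 becomes a float, a double, or an exact decimal is decided afterwards by the code that consumes it.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextOnlyDemo {

    // Collects character data exactly as the parser reports it: as text.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several chunks, so append.
            text.append(ch, start, length);
        }
    }

    // Parses a small document and returns all of its character content.
    static String readText(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "value" is a hypothetical element name used only for this sketch.
        String s = readText("<value>4.3</value>");

        // The same text, interpreted three different ways by the consumer.
        float asFloat = Float.parseFloat(s);
        double asDouble = Double.parseDouble(s);
        BigDecimal asDecimal = new BigDecimal(s);

        System.out.println(s + " " + asFloat + " " + asDouble + " " + asDecimal);
    }
}
```

Nothing in the parse step commits the data to a particular numeric type; that choice is made only in the last few lines of main.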
> Of course,
>the first thing a lot of event handlers will do is convert the strings
>back into binary types like integers. The result is, of course, often
>wasteful silliness. It causes performance problems, memory
>fragmentation due to all the string allocations, etc.
The first thing a lot of event handlers will do is convert the
strings back into binary types like integers *that are appropriate
for their local data models*. This may be a two byte int, a four-byte
int, a float, a double, a java.math.BigInteger or something else.
There is no one unique and correct answer here. You're trying to
impose a single representation of the data on all the different
processes that may wish to process it. That simply won't fly in a
heterogeneous Internet environment. One size does not fit all.
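
As a rough illustration of why no single binary representation fits every consumer, the sketch below converts the same lexical text into three different Java types. The class and method names are hypothetical, and the sample digits are invented for the example. A value that fits comfortably in one process's int overflows in another's short, while a third process may need arbitrary precision.

```java
import java.math.BigInteger;

class LocalModelDemo {

    // Each consumer picks the type that suits its own data model.
    static short asShort(String text) {
        return Short.parseShort(text.trim());
    }

    static int asInt(String text) {
        return Integer.parseInt(text.trim());
    }

    static BigInteger asBigInteger(String text) {
        return new BigInteger(text.trim());
    }

    public static void main(String[] args) {
        String small = "1024";
        String huge = "123456789012345678901234567890";

        // The small value is acceptable to all three consumers.
        System.out.println(asShort(small));
        System.out.println(asInt(small));
        System.out.println(asBigInteger(small));

        // The large value is valid text, but only one consumer can hold it.
        System.out.println(asBigInteger(huge));
        try {
            asInt(huge);
        } catch (NumberFormatException e) {
            // A format that had fixed this field at 32 bits could not carry it at all.
            System.out.println("does not fit in an int");
        }
    }
}
```

Because the document carries only the digits, each receiver is free to reject, truncate, or widen according to its own needs.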
> So, while SAX interfaces to binary encodings allow us to do
>things like use XSLT processors on binary data, they also raise some
>issues about the SAX interface itself. Knowing that people are already
>implementing SAX interfaces to binary data, it probably makes sense to
>carefully consider at this point how to handle this problem so that
>standardized solutions can be implemented. That would be much better
>than each developer or vendor coming up with their own interpretation
>of what SAX for binary encodings looks like.
I disagree completely. Please don't pollute SAX with your binary
data. If you need a binary API for non-XML, binary formats, you're
free to create one. But if you can't get your binary format and API
off the ground without misappropriating the SAX and XML brands, then
I suspect there's something fundamentally wrong with your format.
> There are a number of possible solutions:
> 1. Insist that all binary types be converted to strings.
> 2. Define an "extended SAX" that passes data with descriptors
>that show the type of data. Thus, a program would be told the type of
>the data being passed and could call a ToString or ToInteger function
> 3. Provide a "mode switch" which could be called to modify the
>behaviour of SAX. If "StringsOnly" was set to true, then only strings
>would be generated, if "allowTypes" was set, then types would be
>passed by descriptor.
> 4. Other options?
> What makes the most sense here?
5. Stop pretending binary data is XML. It isn't, and it isn't going
to pass easily into XML tools. Either accept text, or define your own
formats and APIs. Stop trying to pollute XML.
Elliotte Rusty Harold
Effective XML (Addison-Wesley, 2003)