If binary encodings for XML data are to be supported, it will be
very important to ensure that programmers can switch from textual XML
to binary encodings as transparently as possible. This will be
necessary if only to preserve investment in the tremendous number of
tools that process XML today as well as to reduce the complexity of
implementation for people building new tools.
Fortunately, it appears that Objective Systems already provides a
SAX-like interface for ASN.1 defined binary encodings and OSS Nokalva
is working on a SAX interface for ASN.1 defined encodings. There is a
paper that talks about the Objective Systems implementation at:
Reading the paper, it becomes clear why Objective Systems'
implementation is "SAX-like" rather than being simply "SAX." The
issues that caused them to diverge from plain-vanilla SAX are ones
that will need to be dealt with by any project which attempts to build
SAX support for binary encodings -- whether they are the ASN.1 defined
encodings or some new-fangled "binary XML" which might get invented
(but hopefully not patented...)
SAX assumes that the data it is reading is all of "character"
type. SAX has no access to a schema file and thus has no idea of the
real type of the characters it reads. However, a binary encoding will
typically pass data in a form more appropriate for its type. Thus, an
integer will be passed as something like a 32-bit value, not a string
of characters. So, to build a "pure SAX" interface to a binary
encoding, you would have to convert all the binary values to
characters before passing them to the SAX event handlers. Of course,
the first thing a lot of event handlers will do is convert the strings
back into binary types like integers. The result is often wasteful
silliness: it causes performance problems, memory fragmentation due
to all the string allocations, etc.
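
To make the round trip concrete, here is a minimal sketch using Python's standard xml.sax handler interface. The decoder feed_binary_as_sax and the payload layout (a run of big-endian 32-bit integers) are assumptions made up for illustration; they are not the Objective Systems or OSS Nokalva API.

```python
import struct
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class SumHandler(ContentHandler):
    """Ordinary SAX handler: it only ever sees text, so it parses it back."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self._buf = []

    def startElement(self, name, attrs):
        self._buf = []

    def characters(self, content):
        self._buf.append(content)

    def endElement(self, name):
        if name == "value":
            # Second conversion: string -> integer.
            self.total += int("".join(self._buf))


def feed_binary_as_sax(payload, handler):
    """Hypothetical decoder: payload is a run of big-endian 32-bit ints."""
    handler.startDocument()
    handler.startElement("values", AttributesImpl({}))
    for (n,) in struct.iter_unpack(">i", payload):
        handler.startElement("value", AttributesImpl({}))
        # First conversion: integer -> string, only to satisfy plain SAX.
        handler.characters(str(n))
        handler.endElement("value")
    handler.endElement("values")
    handler.endDocument()


payload = struct.pack(">3i", 7, -2, 40)
handler = SumHandler()
feed_binary_as_sax(payload, handler)
print(handler.total)  # 45
```

Every integer here is turned into a freshly allocated string and then parsed straight back, which is exactly the overhead described above.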
So, while SAX interfaces to binary encodings allow us to do
things like use XSLT processors on binary data, they also raise some
issues about the SAX interface itself. Knowing that people are already
implementing SAX interfaces to binary data, it probably makes sense to
carefully consider at this point how to handle this problem so that
standardized solutions can be implemented. That would be much better
than each developer or vendor coming up with their own interpretation
of what SAX for binary encodings looks like.
There are a number of possible solutions:
1. Insist that all binary types be converted to strings.
2. Define an "extended SAX" that passes data with descriptors
that show the type of data. Thus, a program would be told the type of
the data being passed and could call a ToString or ToInteger function
as needed.
3. Provide a "mode switch" which could be called to modify the
behaviour of SAX. If "StringsOnly" was set to true, then only strings
would be generated; if "allowTypes" was set, then types would be
passed by descriptor.
4. Other options?
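
As a rough sketch of how options 2 and 3 might fit together, again in Python: TypedValue, TypedContentHandler, typed_characters and the strings_only flag are all hypothetical names invented here, not part of SAX or of any vendor's interface. The idea is that a descriptor carries the native value and its type, and a mode switch decides whether handlers get descriptors or plain strings.

```python
import struct
from dataclasses import dataclass
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


@dataclass(frozen=True)
class TypedValue:
    """Hypothetical descriptor: a native value plus the name of its type."""
    type_name: str
    value: object

    def to_string(self):
        return str(self.value)

    def to_integer(self):
        return int(self.value)


class TypedContentHandler(ContentHandler):
    """Hypothetical 'extended SAX' handler with one extra callback."""

    def typed_characters(self, item):
        # Default: fall back to plain SAX so string-only handlers still work.
        self.characters(item.to_string())


class Collector(TypedContentHandler):
    """Records what it was given, so the two modes can be compared."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def characters(self, content):
        self.seen.append(("str", content))

    def typed_characters(self, item):
        self.seen.append((item.type_name, item.to_integer()))


def feed(payload, handler, strings_only=True):
    """Hypothetical decoder; strings_only plays the role of the mode switch."""
    for (n,) in struct.iter_unpack(">i", payload):
        handler.startElement("value", AttributesImpl({}))
        if strings_only or not hasattr(handler, "typed_characters"):
            handler.characters(str(n))  # option 1 behaviour
        else:
            handler.typed_characters(TypedValue("integer", n))  # option 2
        handler.endElement("value")


payload = struct.pack(">2i", 7, 40)
plain, typed = Collector(), Collector()
feed(payload, plain, strings_only=True)
feed(payload, typed, strings_only=False)
print(plain.seen)  # [('str', '7'), ('str', '40')]
print(typed.seen)  # [('integer', 7), ('integer', 40)]
```

With the switch left at strings only, existing handlers see nothing new; with it turned off, a handler that understands descriptors never pays for the string conversion.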
What makes the most sense here?