On 25 Jul 2003 07:11:51 +1000, Rick Marshall <rjm@zenucom.com> wrote:
> if it's any help there was a similar debate about storing data in
> database systems years ago - do you store data as binary - integers,
> floats, etc - or text.
XML per se has no notion of integers, floats, dates, etc. ... you need to
apply a schema to infer that. The current "binary XML" schemes that
Elliotte ranted about mostly require a schema to work (causing all the
"tight coupling problems" that we love to discuss here), and don't AFAIK
get huge advantages for the very reasons you mention. "Binary XML" is
definitely a bad term, if not an oxymoron IMHO, because it implies that it
is about "compiling" schema-valid XML documents into architecture-specific
formats. The problems with that would sound like a catalog of XML-DEV
permathreads! I think a better way to think about at least what *I* am
interested in is "performance-optimized Infoset serialization." (POIS?)
That could cover a lot of possibilities, and potentially could be
essentially textual but faster to parse.
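To be concrete about what I mean, here is a purely hypothetical sketch (the event codes and framing are invented, and it punts on newlines inside content): length-prefixed SAX-style events would still be textual, but a reader could slice each payload directly instead of ever searching for a closing delimiter:

```python
# Hypothetical "POIS"-style framing: each SAX-like event is one line of
# the form  <code> <length> <payload>.  Codes are made up: 'S' = start
# element, 'E' = end element, 'T' = character data.  Illustrative only;
# payloads are assumed not to contain newlines.

def encode_events(events):
    """events: sequence of (kind, text) tuples, kind in {'S', 'E', 'T'}."""
    out = []
    for kind, text in events:
        out.append("%s %d %s" % (kind, len(text), text))
    return "\n".join(out)

def decode_events(data):
    """Reverse of encode_events: no delimiter scanning, just slicing."""
    events = []
    for line in data.splitlines():
        kind, length, rest = line.split(" ", 2)
        events.append((kind, rest[:int(length)]))
    return events

doc = [("S", "person"), ("T", "Rick"), ("E", "person")]
assert decode_events(encode_events(doc)) == doc
```

The point is only that "faster to parse" and "binary" are separable questions: the framing above is still readable text.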
> i agree with the slow down from parsing xml. it's a much bigger problem,
> than binary or text formats. the need to find the other end of a tag
> before you can really process a tag - and searching for multi byte
> sequences is not well supported in the current generation of processors
> - is i think the main problem.
Absolutely! I'm not sure exactly what the bottlenecks are across the
board, and for all I know using something like "}" to denote the end of an
element could speed things up so that the multi-byte comparison isn't
necessary. I also hear repeatedly that the Unicode encoding/decoding step
is a real bottleneck and that something as simple as sending around
fixed-width UCS characters rather than UTF-8 byte sequences can make a lot
of difference. LOTS of profiling would need to be done before an alternate
serialization is standardized, of course.
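The kind of measurement I mean could start as crudely as this (a toy sketch, not a real profile; the document shape, sizes, and repeat counts are all invented, and I make no claim about what the numbers will say on any given machine):

```python
# Toy comparison: locating a multi-byte end tag vs. a single-byte end
# marker in otherwise identical data.  Purely illustrative; a serious
# investigation would profile a real parser on real documents.
import timeit

xmlish = ("<item>" + "x" * 50 + "</item>") * 10000
braced = ("<item>" + "x" * 50 + "}") * 10000

t_tag = timeit.timeit(lambda: xmlish.find("</item>", 6), number=200)
t_one = timeit.timeit(lambda: braced.find("}", 6), number=200)
print("multi-byte end tag search:", t_tag)
print("single-byte marker search:", t_one)
```

Whether the single-byte scan actually wins depends entirely on the string-search machinery underneath, which is exactly why the profiling has to come before the standardizing.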
>
> perhaps we can get intel to design multi byte search instructions into
> their next processor and then we can get performance back.
Well, there are people out there building XML support into hardware, at the
box level, board level, and chip level. There might be some synergies
between the hardware stuff and the "efficient serialization" stuff, and
further synergies if the downstream processing (e.g. XSLT) can be sped
up by working off something other than raw XML 1.0 text. See, for example,
http://www.sarvega.com/sarvega.php?id=1.4 , especially their "specialized
data stream called XML EventStream to provide a highly optimized pipeline-
processing model for XML Processing." Standardizing some more efficient
serialization of the Infoset could (again if the numbers actually work out,
which remains to be seen) allow interoperability between specialized
hardware devices that parse/serialize between "POIS" and XML, and software
(e.g. front ends to XSLT engines or "POIS" -> SAX event filters). Without
standardization of some faster Infoset serialization, all this stuff works
only for those who will stay within a single vendor's castle.
Anyway, the point here is simply that "Binary XML" covers all sorts of
territory, from just a standard serialization for SAX events to a full-
blown strongly-typed object serialization format, and probably intersects
with ASN.1 along the way.