   Re: [xml-dev] Binary XML == "spawn of the devil" ?

On 25 Jul 2003 07:11:51 +1000, Rick Marshall <rjm@zenucom.com> wrote:


> if it's any help there was a similar debate about storing data in
> database systems years ago - do you store data as binary - integers,
> floats, etc - or text.

XML per se has no notion of integers, floats, dates, etc. ... you need to 
apply a schema to infer that.  The current "binary XML" schemes that 
Elliotte ranted about mostly require a schema to work (causing all the 
"tight coupling problems" that we love to discuss here), and don't AFAIK 
get huge advantages for the very reasons you mention.  "Binary XML" is 
definitely a bad term, if not an oxymoron IMHO, because it implies that it 
is about "compiling" schema-valid XML documents into architecture-specific 
formats.  The problems with that would sound like a catalog of XML-DEV 
permathreads!  I think a better way to think about at least what *I* am 
interested in is "performance-optimized Infoset serialization."  (POIS?) 
That could cover a lot of possibilities, and potentially could be 
essentially textual but faster to parse.
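
To make that concrete, here is a minimal Python sketch (my own 
illustration, not taken from any of the proposals being discussed): 
without a schema an XML parser hands back every value as character data, 
and the little type map below merely stands in for what an XSD would 
supply.

import xml.etree.ElementTree as ET

doc = "<order><qty>3</qty><price>19.95</price><shipped>2003-07-25</shipped></order>"
root = ET.fromstring(doc)

# Without a schema, every value comes back as character data (strings).
assert all(isinstance(child.text, str) for child in root)

# Hypothetical type map standing in for what a schema would declare.
schema_types = {"qty": int, "price": float, "shipped": str}

typed = {child.tag: schema_types[child.tag](child.text) for child in root}
print(typed)   # {'qty': 3, 'price': 19.95, 'shipped': '2003-07-25'}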


> i agree with the slow down from parsing xml. it's a much bigger problem,
> than binary or text formats. the need to find the other end of a tag
> before you can really process a tag - and searching for multi byte
> sequences is not well supported in the current generation of processors
> - is i think the main problem.

Absolutely!  I'm not sure exactly what the bottlenecks are across the 
board, and for all I know using something like "}" to denote the end of an 
element could speed things up enough that the multi-byte comparison isn't 
necessary.  I also hear repeatedly that the Unicode encoding/decoding step 
is a real bottleneck, and that something as simple as sending around 
fixed-width UCS characters rather than UTF-8 encoded text can make a lot 
of difference.  LOTS of profiling would need to be done before any 
alternate serialization is standardized, of course.
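
For what it's worth, that profiling could start as small as the Python 
sketch below (my own toy data and numbers, and the single-byte terminator 
is the hypothetical "}" from above): scan a buffer for a multi-byte end 
tag, scan for a one-byte delimiter, and time the UTF-8 decode separately.

import timeit

# Roughly 2 MB of synthetic markup: one buffer with conventional end tags,
# one with a hypothetical single-byte terminator.
tagged = (b"<record>" + b"x" * 200 + b"</record>") * 10_000
braced = (b"{record " + b"x" * 200 + b"}") * 10_000

scan_tag   = timeit.timeit(lambda: tagged.count(b"</record>"), number=100)
scan_brace = timeit.timeit(lambda: braced.count(b"}"), number=100)
decode     = timeit.timeit(lambda: tagged.decode("utf-8"), number=100)

print(f"scan for multi-byte end tags : {scan_tag:.4f}s")
print(f"scan for single-byte '}}'     : {scan_brace:.4f}s")
print(f"UTF-8 decode of the buffer   : {decode:.4f}s")

None of this proves anything on its own, but it is the shape of 
measurement I'd want to see before anything gets standardized.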

>
> perhaps we can get intel to design multi byte search instructions into
> their next processor and then we can get performance back.

Well, there are people out there building XML support into hardware, at the 
box level, board level, and chip level.  There might be some synergies 
between the hardware stuff and the "efficient serialization" stuff, and 
further synergies if the downstream processing (e.g. XSLT) can be sped 
up by working off something other than raw XML 1.0 text.  See, for example, 
http://www.sarvega.com/sarvega.php?id=1.4 , especially their  "specialized 
data stream called XML EventStream to provide a highly optimized pipeline- 
processing model for XML Processing."  Standardizing some more efficient 
serialization of the Infoset could (again if the numbers actually work out, 
which remains to be seen) allow interoperability between specialized 
hardware devices that parse/serialize between "POIS" and XML,  and software 
(e.g. front ends to XSLT engines or "POIS" -> SAX event filters).  Without 
standardization of some faster Infoset serialization, all this stuff works 
only for those who will stay within a single vendor's castle.
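
As a rough illustration of the '"POIS" -> SAX event filter' idea (the 
wire format here is my own toy invention, not anything Sarvega or anyone 
else has specified): a stream of opcode-tagged, length-prefixed events 
can be replayed into any standard SAX ContentHandler, so whatever 
produced the stream -- hardware or software -- stays decoupled from the 
downstream processing.

import io
import struct
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl

START, END, TEXT = 1, 2, 3   # hypothetical opcodes for the toy stream

def encode(events):
    """Pack (opcode, text) pairs into the toy binary event stream."""
    out = io.BytesIO()
    for op, payload in events:
        data = payload.encode("utf-8")
        out.write(struct.pack(">BI", op, len(data)) + data)
    return out.getvalue()

def replay(stream, handler):
    """Decode the toy stream and drive an ordinary SAX ContentHandler."""
    buf = io.BytesIO(stream)
    handler.startDocument()
    while header := buf.read(5):
        op, length = struct.unpack(">BI", header)
        payload = buf.read(length).decode("utf-8")
        if op == START:
            handler.startElement(payload, AttributesImpl({}))
        elif op == END:
            handler.endElement(payload)
        elif op == TEXT:
            handler.characters(payload)
    handler.endDocument()

class Printer(ContentHandler):
    def startElement(self, name, attrs): print("start", name)
    def characters(self, content):       print("text ", content)
    def endElement(self, name):          print("end  ", name)

replay(encode([(START, "greeting"), (TEXT, "hello"), (END, "greeting")]), Printer())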

Anyway, the point here is simply that "Binary XML" covers all sorts of 
territory, from just a standard serialization for SAX events to a full- 
blown strongly-typed object serialization format, and probably intersects 
with ASN.1 along the way.    




 
