   Re: [xml-dev] Pushing all the buttons


Mike Champion wrote:

>As best I know, the big win for truly binary XML
>serializations is in avoiding the overhead of the
>Unicode-encoded text to UCS-character translation. 
>Does anyone take issue with the assertion that the
>external encoding-> Unicode text translation is
>generally a significant portion of XML parsing time?  
>  
>
Yes?  Transcoding ASCII, ISO 8859-1 or UTF-16 is just a cast;
translating UTF-8 is a tiny automaton, easily small enough to fit into
a data cache; translating most 8-bit sets needs only a 94 byte table.
There is nothing intrinsic to any of them that should make them
slow, and the code for them could fit into the instruction caches on CPUs.
(That is surely what people who want speed should be concentrating on:
what is the most functionality that a standard can prescribe that still
fits into caches?)  I reckon it is more an API/implementation issue.*
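
To make the three cases concrete, here is a rough sketch of each: the
cast, the table lookup and the small UTF-8 automaton. The class and
method names are made up for illustration. The table version assumes a
full 256-entry table for simplicity rather than the smaller table
mentioned above, and the UTF-8 decoder skips the overlong-form and
surrogate checks that a conforming decoder would need.

```java
final class TranscodeSketch {

    // ASCII / ISO 8859-1: every byte value is its own code point,
    // so decoding is a widening cast.
    static char[] latin1ToChars(byte[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (char) (in[i] & 0xFF);
        }
        return out;
    }

    // Other 8-bit sets: one table lookup per byte.
    // Assumption: the caller supplies a 256-entry byte-to-char table.
    static char[] tableToChars(byte[] in, char[] table) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = table[in[i] & 0xFF];
        }
        return out;
    }

    // UTF-8: the lead byte says how many continuation bytes follow.
    // Simplified: rejects bad lead and continuation bytes, but does not
    // reject overlong forms or encoded surrogates.
    static String utf8ToString(byte[] in) {
        StringBuilder sb = new StringBuilder(in.length);
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xFF;
            int cp;    // code point being assembled
            int need;  // continuation bytes still expected
            if (b < 0x80) {
                cp = b;
                need = 0;
            } else if ((b & 0xE0) == 0xC0) {
                cp = b & 0x1F;
                need = 1;
            } else if ((b & 0xF0) == 0xE0) {
                cp = b & 0x0F;
                need = 2;
            } else if ((b & 0xF8) == 0xF0) {
                cp = b & 0x07;
                need = 3;
            } else {
                throw new IllegalArgumentException("bad lead byte at " + (i - 1));
            }
            while (need-- > 0) {
                if (i >= in.length || (in[i] & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("bad continuation at " + i);
                }
                cp = (cp << 6) | (in[i++] & 0x3F);
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }
}
```

None of these loops carries more state than a counter and an
accumulator, which is the point about fitting into caches.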

Java 1.4 NIO has completely revised its character transcoding:
you can have transcoders that autodetect, so I don't know why
no one has put out an XML-autodetecting transcoder, which
would operate directly on, for example, external byte buffers. That
could give much nicer streaming performance.  (Anyone have any
benchmarks for NIO, b.t.w.?)
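
As a rough idea of what the detection half of such a transcoder might
look like, the sketch below peeks at the first bytes of a ByteBuffer
without consuming them and guesses the encoding family from a byte
order mark or from the shape of "<?". The class name is hypothetical,
and the sketch is deliberately incomplete: it ignores UTF-32 and
EBCDIC, and it does not read the encoding="..." declaration, so any
8-bit ASCII-compatible document falls through to the UTF-8 default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class XmlEncodingSniffer {

    // Absolute read relative to the buffer's position; -1 if out of range.
    // Does not move the buffer's position.
    private static int peek(ByteBuffer buf, int offset) {
        int index = buf.position() + offset;
        return index < buf.limit() ? buf.get(index) & 0xFF : -1;
    }

    // Guess the encoding family from the first four bytes.
    // Assumption: UTF-32 and EBCDIC inputs do not occur.
    static Charset sniff(ByteBuffer buf) {
        int b0 = peek(buf, 0);
        int b1 = peek(buf, 1);
        int b2 = peek(buf, 2);
        int b3 = peek(buf, 3);

        // Byte order marks.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return StandardCharsets.UTF_8;
        if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
        if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;

        // No byte order mark: "<?" encoded as UTF-16.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return StandardCharsets.UTF_16BE;
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return StandardCharsets.UTF_16LE;

        // Otherwise fall back to UTF-8; a real detector would read the
        // encoding declaration here before committing.
        return StandardCharsets.UTF_8;
    }
}
```

A caller would pass the result to Charset.newDecoder() and decode
straight from the same buffer, after skipping any byte order mark.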

The CJK sets, EBCDIC, perhaps encodings with ordering requirements such
as Thai, and older sets which need normalization are a different matter:
they are not casts, simple automata or little tables. But removing these
from XML will not result in any extra capability for users: if you need
speed, send easy data.

Cheers
Rick Jelliffe

* For example, I found that IBM's ICU4J normalization class was way too
slow when presented with ASCII data, but it was a trivial matter to
bypass.
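
A bypass of that kind might look something like the sketch below. It
uses the JDK's java.text.Normalizer (available since Java 6) as a
stand-in for the ICU4J class, and the wrapper class and method names
are made up. It relies on the fact that pure ASCII text is left
unchanged by every Unicode normalization form, so such input can be
returned untouched.

```java
import java.text.Normalizer;

final class AsciiFastPath {

    // True if every char is in the ASCII range U+0000..U+007F.
    static boolean isAscii(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Normalize to NFC, skipping the normalizer entirely for ASCII input.
    static String toNfc(String s) {
        if (isAscii(s)) {
            return s; // ASCII is already normalized; return the same object
        }
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }
}
```

The scan costs one comparison per character, which should be cheap
next to a general-purpose normalizer on mostly-ASCII data.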





 


Copyright 2001 XML.org. This site is hosted by OASIS