OASIS Mailing List Archives

   Re: [xml-dev] Pushing all the buttons


From: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
> At 8:33 AM -0400 9/21/03, David Megginson wrote:
> >I think that James was talking about going from bytes representing a
> >Unicode character encoding, not a binary encoding.  There should be no
> >platform dependencies in that case.
> I understood that, and my point still holds. There are platform
> dependencies in this case. If the native char and string types are
> built on UTF-8 (Perl, maybe?) then this is straightforward.
> However, when the native char and string types are based on UTF-16 a
> conversion is necessary. Ditto for UTF-16BE to UTF-16LE and vice
> versa. Or UTF-8/UTF-16 --> UTF-32. Languages and platforms do not
> share the same internal representations of Unicode. No one binary
> format will work for everyone.
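
For concreteness, here is a small Java sketch of the representations being discussed. It encodes a single character, U+00E9, in three of the encoding forms named above and prints the resulting bytes. The class and method names are made up for illustration, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows that one character has different byte
// sequences in UTF-8, UTF-16BE and UTF-16LE.
class EncodingDemo {
    // Encode a string in the given charset and render the bytes as hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // U+00E9, LATIN SMALL LETTER E WITH ACUTE
        System.out.println("UTF-8:    " + hex(e, StandardCharsets.UTF_8));    // C3A9
        System.out.println("UTF-16BE: " + hex(e, StandardCharsets.UTF_16BE)); // 00E9
        System.out.println("UTF-16LE: " + hex(e, StandardCharsets.UTF_16LE)); // E900
    }
}
```

Whichever form goes on the wire, a platform whose native strings use a different form has to transcode on the way in.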

Your definition of "work" seems to be that parsing/translation overhead must
be essentially reduced to zero. It does not have to go to zero to be faster.

> This conversion is non-trivial too. In the current version of XOM I
> made a deliberate decision after profiling to store internal text node
> data in UTF-8 rather than UTF-16. That saves me a *lot* of memory.
> However, the constant conversion to and from the internal UTF-8
> representation to Java's UTF-16 representation imposes about a 10%
> speed penalty. I chose to optimize for size instead of speed in this
> case, but I wouldn't suggest imposing that cost on everyone by making
> all XML data UTF-8.

A straw man. This is a translation overhead imposed on all runtime use of
strings, not an interchange cost. If the translation added only 10% to the
cost of interchange vs. simply streaming in/out the native representation,
I'd bet it would be much faster than going back and forth between internal
representation and XML.
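
To make the distinction concrete, here is a minimal Java sketch of the kind of design described in the quoted paragraph. It is not XOM's actual code; the class `Utf8Text` and its methods are hypothetical. The text is held as UTF-8 bytes, so every call to `getValue()` pays for a decode to Java's UTF-16 strings. That is a recurring runtime cost, whereas transcoding at interchange happens once per document.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration, not XOM's implementation.
class Utf8Text {
    private final byte[] data; // text held as UTF-8 to save memory

    Utf8Text(String value) {
        this.data = value.getBytes(StandardCharsets.UTF_8);
    }

    // Every read decodes UTF-8 back into a UTF-16 based String.
    String getValue() {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Number of bytes actually stored.
    int storedBytes() {
        return data.length;
    }

    // Bytes the same text occupies as UTF-16 code units (2 bytes each).
    static int utf16Bytes(String value) {
        return 2 * value.length();
    }

    public static void main(String[] args) {
        String s = "Pushing all the buttons";
        Utf8Text t = new Utf8Text(s);
        System.out.println(t.storedBytes() + " bytes as UTF-8, "
                + utf16Bytes(s) + " bytes as UTF-16");
        System.out.println("round trip ok: " + t.getValue().equals(s));
    }
}
```

For mostly ASCII text the stored form is half the size of the UTF-16 form, which is the memory saving; the decode in `getValue()` is where the speed penalty comes from.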

Bob Foster



Copyright 2001 XML.org. This site is hosted by OASIS