xml-dev - Re: [xml-dev] Pushing all the buttons

To: David Megginson <dmeggin@attglobal.net>,XML Dev <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Pushing all the buttons
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Sun, 21 Sep 2003 09:59:32 -0400
In-reply-to: <16237.39556.411199.940204@megginson.com>
References: <r02000000-1026-8D8DE86FEB8311D7B3530003937A08C2@[192.168.124.11]><3F6C8877.5070701@propylon.com> <3F6CFBCC.50009@jclark.com><p06002002bb9342176f4f@[192.168.254.4]><16237.39556.411199.940204@megginson.com>

At 8:33 AM -0400 9/21/03, David Megginson wrote:

>I think that James was talking about going from bytes representing a
>Unicode character encoding, not a binary encoding.  There should be no
>platform dependencies in that case.

I understood that, and my point still holds. There are platform
dependencies in this case. If the native char and string types are
built on UTF-8 (Perl, maybe?) then this is straightforward.
However, when the native char and string types are based on UTF-16, a
conversion is necessary. Ditto for UTF-16BE to UTF-16LE and vice
versa. Or UTF-8/UTF-16 --> UTF-32. Languages and platforms do not
share the same internal representations of Unicode. No one binary
format will work for everyone.
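
To make the point above concrete, here is a minimal Java sketch showing
that the same text yields different byte sequences in UTF-8, UTF-16BE,
and UTF-16LE, so moving between them always means a real conversion.
The class and method names (EncodingSizes, encodedLength, transcode)
are made up for illustration; only the standard java.nio.charset API is
assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from any real library.
final class EncodingSizes {

    // Number of bytes needed to encode the text in the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Re-encode bytes from one charset to another by decoding to
    // Java's internal (UTF-16 based) String and encoding again.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, one outside ASCII

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16be = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] utf16le = text.getBytes(StandardCharsets.UTF_16LE);

        System.out.println("UTF-8 bytes:    " + utf8.length);
        System.out.println("UTF-16BE bytes: " + utf16be.length);
        System.out.println("UTF-16LE bytes: " + utf16le.length);

        // Same length, different byte order: still not interchangeable.
        System.out.println("BE equals LE:   " + Arrays.equals(utf16be, utf16le));

        // Converting UTF-8 input for a UTF-16LE consumer requires a pass
        // over every character.
        byte[] converted = transcode(utf8, StandardCharsets.UTF_8,
                                     StandardCharsets.UTF_16LE);
        System.out.println("Transcoded OK:  " + Arrays.equals(converted, utf16le));
    }
}
```

Whatever single byte format is picked, some platform pays for a pass
like transcode() on every string it reads.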

This conversion is non-trivial too. In the current version of XOM I
made a deliberate decision after profiling to store internal text node
data in UTF-8 rather than UTF-16. That saves me a *lot* of memory.
However, the constant conversion between the internal UTF-8
representation and Java's UTF-16 representation imposes about a 10%
speed penalty. I chose to optimize for size instead of speed in this
case, but I wouldn't suggest imposing that cost on everyone by making
all XML data UTF-8.
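
The size-versus-speed tradeoff can be sketched roughly as follows. This
is not XOM's actual code; CompactText and its methods are hypothetical
names, and the sketch only assumes the standard String and
StandardCharsets APIs. Text is held as UTF-8 bytes, and every read pays
for a decode back into a Java String.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical text holder illustrating the tradeoff; not XOM's API.
final class CompactText {

    // Text kept as UTF-8 bytes: one byte per ASCII character.
    private final byte[] utf8;

    public CompactText(String value) {
        // Encoding cost is paid once, when the text is stored.
        this.utf8 = value.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding cost is paid on every call; this is where a speed
    // penalty of the kind described above would come from.
    public String getValue() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Bytes actually held by this object for the text.
    public int storedBytes() {
        return utf8.length;
    }

    // Bytes the same text occupies as UTF-16 code units (2 bytes each).
    public static int utf16Bytes(String value) {
        return value.length() * 2;
    }

    public static void main(String[] args) {
        String sample = "Hello, XML";
        CompactText text = new CompactText(sample);
        System.out.println("UTF-8 bytes stored: " + text.storedBytes());
        System.out.println("UTF-16 bytes:       " + utf16Bytes(sample));
        System.out.println("Round trip equal:   " + sample.equals(text.getValue()));
    }
}
```

The saving shown here applies to mostly ASCII text. Characters that
need three bytes in UTF-8 but only two in UTF-16 tip the balance the
other way.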

-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Processing XML with Java (Addison-Wesley, 2002)
   http://www.cafeconleche.org/books/xmljava
   http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA
