xml-dev - Re: [xml-dev] MSXML DOM Special Chars Less Than 32

To: Tim Bray <tbray@textuality.com>
Subject: Re: [xml-dev] MSXML DOM Special Chars Less Than 32
From: Amelia A Lewis <amyzing@talsever.com>
Date: 23 Mar 2002 12:35:42 -0500
Cc: Julian Reschke <julian.reschke@gmx.de>, xml-dev@lists.xml.org
In-reply-to: <3C9CB8C4.6030404@textuality.com>
References: <JIEGINCHMLABHJBIGKBCOEFAEEAA.julian.reschke@gmx.de> <1016902883.631.18.camel@marajen> <3C9CB8C4.6030404@textuality.com>

On Sat, 2002-03-23 at 12:17, Tim Bray wrote:
> Amelia A Lewis wrote:
> 
> > In short, the C0 characters have no universal interpretation;
> > interpretation depends upon the application.  It seems reasonable, then,
> > that the application can encode the bloody things too.  Can't use XML
> > mechanisms.  Base64, the usual suggestion, incurs an immense overhead.
> 
> 
> I agree with the leading sentences.  As for the last, Base64 encodes
> 3 bytes as 4, thus incurring exactly 33% overhead.  Whether that
> is considered "immense" depends on your application scenario. -Tim

A little more than that, actually, in a correct base64 implementation.

Each 57 bytes become 76 bytes.  Add two more for CRLF.  Plus the final
padding, which is generally but not always negligible.  Lessee ... I'd
do the math, but I'm not working today, so it's lazy time: original +
1/3 + 2/57.  For decoding, 1 + 1 + 1/3 + 2/57, most likely, as you
prolly can't discard the encoded input as you decode.
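
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. It assumes MIME-style output: 76-character lines, each ended with
CRLF. The names mime_base64 and overhead are made up for illustration. 57
input bytes come out as 76 + 2 = 78 bytes, so the overhead is 21/57, a bit
under 37%.

```python
import base64

def mime_base64(data: bytes, line_len: int = 76) -> bytes:
    """Base64-encode data, ending each line of at most line_len chars with CRLF."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return b"".join(line + b"\r\n" for line in lines)

def overhead(n: int) -> float:
    """Fractional size increase when encoding n zero bytes."""
    return (len(mime_base64(bytes(n))) - n) / n

# Print the overhead for a single full line, many full lines, and a
# large input whose last line is short.
for n in (57, 5700, 1_000_000):
    print(n, round(overhead(n), 4))
```

The standard library's base64.encodebytes does the same line folding but
ends lines with a bare LF, which is why the sketch folds the lines by hand.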

If "immense" is overwrought, could we agree on "significant"?  Tricks
like quoted-printable and encoded-word (and XML numeric character
references) are attractive largely because the characters they encode are
*rarely* encountered, meaning that the cost is significantly less than
base64.
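
A quick way to see the difference, again as a sketch in Python: the sample
data is invented (mostly plain ASCII lines with two C0 characters at the
end), and the comparison uses the standard quopri and base64 modules.
Quoted-printable only pays three bytes for each character it has to escape,
while base64 pays on every byte.

```python
import base64
import quopri

# Hypothetical payload: ordinary text lines plus two rare control characters.
data = b"plain ascii text line\n" * 50 + b"\x01\x02"

qp = quopri.encodestring(data)   # escapes only the bytes that need it
b64 = base64.encodebytes(data)   # re-encodes everything, folded at 76 chars

print("original:        ", len(data))
print("quoted-printable:", len(qp))
print("base64:          ", len(b64))
```

The picture flips for data that is mostly non-printable: there every byte
costs three under quoted-printable, and base64 is the cheaper choice.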

Amy!
(who's spent the last two weeks writing MIME-related code, and is
probably being hideously pedantic)
-- 
Amelia A. Lewis       amyzing@talsever.com      alicorn@mindspring.com
Yankees are compelled by some mysterious force to imitate Southern 
accents and they're so damn dumb they don't know the difference between
a Tennessee drawl and a Charleston clip.
                -- Rita Mae Brown, "Rubyfruit Jungle"
