On Thu, 2002-01-10 at 16:24, David Brownell wrote:
> > It might be worth noting the current discussion on xml-dev (or content
> > thereof) regarding surrogate pairs, as SAX relies on the Java char and
> > String constructs throughout.
>
> I'll catch up on that, but my advice on that point is unlikely to
> change. As I've pointed out in an upcoming O'Reilly book
> (you might have heard about it, called "SAX2" ... ;-) surrogate
> pairs aren't the only place that a Java "char" doesn't match
> a "character" ... there are also composed characters to
> worry about, even in the absence of surrogate pairs.
Sure thing, all advertising for our joint projects aside...
I just suspect the point's worth making a little more strongly, as so
many of us have been brainwashed to think Java char=Unicode character.
Surrogate pairs whacked me a lot harder over the head than I thought,
and Java doesn't seem to take note.
> Point is that anyone working at the "character" level MUST
> NOT ASSUME that such characters consist only of a single
> Java "char" value. And that'd be true even if "char" were
> to make an incompatible change, and acquire a few extra
> bits at the left so that surrogates could in some cases be
> eliminated.
So could the paragraph above appear in the documentation somewhere? I
think that would take care of all my concerns.
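
To make the point concrete, here is a minimal sketch of both cases David
mentions. It assumes a Java release that provides String.codePointCount
(added in Java 5, so later than this message); the class name
CharVsCharacter and the helper codePoints are made up for illustration.

```java
public class CharVsCharacter {
    // Hypothetical helper: counts Unicode code points instead of
    // Java char values (UTF-16 code units).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF is outside the Basic Multilingual
        // Plane, so Java stores it as a surrogate pair: two char values.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());      // char count
        System.out.println(codePoints(clef));   // code point count

        // "e" followed by U+0301 COMBINING ACUTE ACCENT: no surrogates
        // involved, yet a reader sees one character built from two chars.
        String eAcute = "e\u0301";
        System.out.println(eAcute.length());
        System.out.println(codePoints(eAcute));
    }
}
```

The clef is one code point in two chars, while the accented "e" is two
code points in two chars that still display as a single character, so
counting code points fixes the first case but not the second.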
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com