> I just suspect the point's worth making a little more strongly, as so
> many of us have been brainwashed to think Java char=Unicode character.
> Surrogate pairs whacked me a lot harder over the head than I thought,
> and Java doesn't seem to take note.
True for most folk. XML made me get my hands dirty with
I18N stuff, and that one took a while for me to grok. I don't
think it'll be intuitive to most folk, who've rarely had to look
at such I18N issues.
> > Point is that anyone working at the "character" level MUST
> > NOT ASSUME that such characters consist only of a single
> > Java "char" value. And that'd be true even if "char" were
> > to make an incompatible change, and acquire a few extra
> > bits at the left so that surrogates could in some cases be
> > eliminated.
> So could the paragraph above appear in the documentation somewhere? I
> think that would take care of all my concerns.
Yes, I was thinking of doing that, once I've imbibed the other thread
a bit more deeply, to make sure I pick up any other details. That
should make it into the SAX2 r2 ContentHandler docs, and maybe
also LexicalHandler.comment() if I get ambitious.
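
To make the "one character is not always one char" point concrete, here is a
rough sketch. The class and method names are made up for illustration and are
not part of SAX. It also leans on String.codePointCount() and
Character.codePointCount(), which only arrived in Java 5, so treat it as an
illustration of the idea rather than something every JDK can run.

```java
// Sketch only: SurrogateDemo and its methods are hypothetical names,
// not part of SAX or any parser API.
final class SurrogateDemo {

    // Counts Unicode characters (code points) in a string,
    // as opposed to Java "char" values.
    static int countChars(String s) {
        return s.codePointCount(0, s.length());
    }

    // Same count over a buffer slice, using the (char[], start, length)
    // shape that ContentHandler.characters() receives.
    static int countChars(char[] ch, int start, int length) {
        return Character.codePointCount(ch, start, length);
    }

    public static void main(String[] args) {
        // U+1D538 needs a surrogate pair, so it takes two "char" values.
        String s = "a\uD835\uDD38b";
        System.out.println(s.length());     // 4 char values
        System.out.println(countChars(s));  // 3 characters
    }
}
```

Code that indexes or truncates by char position can cut such a pair in half,
which is exactly the assumption the paragraph above warns against.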