> I'm certainly no Relax expert, but on the face of it it does *NOT*
> sound reasonable. In general XML and Unicode processing, one *MUST*
> handle characters with code points beyond U+FFFF. They are not
> optional. This is true even if your programming language (Java
> perhaps?) has inadequate support for them.
What was I thinking? Don't code at 2:00 a.m., or at least don't email lists
when you can't figure stuff out at 2:00 a.m. I think this is a better
effort; all it took was some reading. Comments, of course, are still
eagerly awaited.
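// Set the character, but check for surrogates
if (escapeChar >= 0xD800 && escapeChar <= 0xDFFF) {
    // A reference to a surrogate code point (e.g. &#xD800;) is not
    // an XML character, so it must not be copied into the buffer
    Error("Character reference to a surrogate code point",
        ((int)escapeChar).ToString("X"), null);
} else if (escapeChar <= 0xFFFF) {
    // Inside the BMP: one UTF-16 code unit, output directly
    readBuffer[i] = (char)escapeChar;
} else if (escapeChar <= 0x10FFFF) {
    // Beyond U+FFFF: subtract 0x10000 to get a 20-bit value,
    // then split it into a surrogate pair
    escapeChar -= 0x10000;
    // Output high surrogate (top 10 bits OR'd into 0xD800)
    readBuffer[i++] = (char)(0xD800 | (escapeChar >> 10));
    // Output low surrogate (bottom 10 bits OR'd into 0xDC00)
    readBuffer[i] = (char)(0xDC00 | (escapeChar & 0x03FF));
} else {
    // The value is beyond U+10FFFF, the largest Unicode code point
    Error("Character reference is too large for UTF-16",
        ((int)escapeChar).ToString("X"), null);
}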
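For anyone who wants to check the arithmetic outside the parser, here is a minimal standalone sketch of the same conversion in Java (the parser snippet itself appears to be C#). The names SurrogateSketch, encodeCharRef and isXmlChar are hypothetical, chosen for illustration only. The range check follows the XML 1.0 Char production, and the result is cross-checked against the JDK's Character.toChars.

```java
import java.util.Arrays;

final class SurrogateSketch {
    // XML 1.0 Char production: the code points allowed in a document at all.
    // Note that it excludes the surrogate range D800-DFFF.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Converts the value of a character reference (e.g. &#x1D11E;) to UTF-16
    // code units: one unit inside the BMP, a high/low surrogate pair above U+FFFF.
    static char[] encodeCharRef(int cp) {
        if (cp > 0x10FFFF) {
            throw new IllegalArgumentException(
                "Character reference is too large for UTF-16: "
                    + Integer.toHexString(cp).toUpperCase());
        }
        if (!isXmlChar(cp)) {
            throw new IllegalArgumentException(
                "Character reference is not an XML character: "
                    + Integer.toHexString(cp).toUpperCase());
        }
        if (cp <= 0xFFFF) {
            return new char[] { (char) cp };
        }
        int v = cp - 0x10000;                     // now fits in 20 bits
        char high = (char) (0xD800 | (v >> 10));  // top 10 bits
        char low = (char) (0xDC00 | (v & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] clef = encodeCharRef(0x1D11E);     // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.printf("%04X %04X%n", (int) clef[0], (int) clef[1]); // D834 DD1E
        System.out.println(Arrays.equals(clef, Character.toChars(0x1D11E))); // true
    }
}
```

The sketch throws where the parser calls its own Error routine; that is only a convenience for a standalone example, not a claim about how the parser should report errors.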
All the best,
Jeff Rafter