   Re: [xml-dev] RELAXNG Compact Syntax and character escapes

[ Lists Home | Date Index | Thread Index ]

> I'm certainly no Relax expert, but on the face of it it does *NOT*
> sound reasonable. In general XML and Unicode processing, one *MUST*
> handle characters with code points beyond U+FFFF. They are not
> optional.  This is true even if your programming language (Java
> perhaps?) has inadequate support for them.

What was I thinking? Don't code at 2:00 a.m., or at least don't email lists
when you can't figure stuff out at 2:00 a.m. I think this is a better
effort; all it took was some reading. Of course, comments are still
eagerly awaited.

// Set the character, but check for surrogates
if (escapeChar <= 0xFFFF) {
  // Basic Multilingual Plane: output the code unit directly
  readBuffer[i] = (char)escapeChar;
} else if (escapeChar <= 0x10FFFF) {
  // Supplementary plane: subtracting 0x10000 leaves a 20-bit value,
  // which is split across a surrogate pair
  escapeChar -= 0x10000;
  // Output High Surrogate (top 10 bits OR'd into 0xD800)
  readBuffer[i++] = (char)(0xD800 | (escapeChar >> 10));
  // Output Low Surrogate (bottom 10 bits OR'd into 0xDC00)
  readBuffer[i] = (char)(0xDC00 | (escapeChar & 0x03FF));
} else {
  // The value is beyond U+10FFFF and cannot be represented in UTF-16
  Error("Character reference is too large for UTF-16",
        ((int)escapeChar).ToString("X"), null);
}
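For anyone wanting to check the arithmetic in isolation, here is a minimal sketch of the same surrogate-pair encoding as a standalone function. It is written in Java (since that came up in the quoted reply) rather than C#, and the class name `SurrogateSketch` and method `toUtf16` are hypothetical, not part of any RELAX NG parser. Like the fragment above, it does not reject lone surrogate values (U+D800 to U+DFFF), which XML does not allow as characters.

```java
// Hypothetical standalone version of the surrogate logic above.
class SurrogateSketch {
    // Encodes one code point as UTF-16 code units. Throws for values
    // outside 0..0x10FFFF; lone surrogates are NOT rejected here.
    static char[] toUtf16(int codePoint) {
        if (codePoint < 0) {
            throw new IllegalArgumentException("Negative code point");
        }
        if (codePoint <= 0xFFFF) {
            // BMP: a single code unit
            return new char[] { (char) codePoint };
        }
        if (codePoint <= 0x10FFFF) {
            int v = codePoint - 0x10000;                // 20-bit offset
            char high = (char) (0xD800 | (v >> 10));    // top 10 bits
            char low = (char) (0xDC00 | (v & 0x3FF));   // bottom 10 bits
            return new char[] { high, low };
        }
        throw new IllegalArgumentException(
            "Character reference is too large for UTF-16: "
            + Integer.toHexString(codePoint).toUpperCase());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF
        char[] units = toUtf16(0x1D11E);
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);
        // prints "D834 DD1E"
    }
}
```

In Java the JDK already provides `Character.toChars(int)`, which returns the same code units, so a sketch like this is mainly useful as a cross-check for hand-rolled code in other languages.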

All the best,
Jeff Rafter


Copyright 2001 XML.org. This site is hosted by OASIS