xml-dev - Re: [xml-dev] RELAXNG Compact Syntax and character escapes


To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
Subject: Re: [xml-dev] RELAXNG Compact Syntax and character escapes
From: "Jeff Rafter" <lists@jeffrafter.com>
Date: Sun, 25 Apr 2004 22:15:41 -0700
Cc: <relaxng-user@relaxng.org>,<xml-dev@lists.xml.org>
References: <009e01c42a98$28652790$6403a8c0@ARIMATHEA> <00aa01c42aa2$42ddfed0$6403a8c0@ARIMATHEA> <p0601020cbcb17a0d4ea1@[192.168.254.88]>

> I'm certainly no Relax expert, but on the face of it, it does *NOT*
> sound reasonable. In general XML and Unicode processing, one *MUST*
> handle characters with code points beyond U+FFFF. They are not
> optional.  This is true even if your programming language (Java
> perhaps?) has inadequate support for them.

What was I thinking? Don't code at 2:00 a.m., or at least don't email lists
when you can't figure stuff out at 2:00 a.m. I think this is a better
effort. All it took was some reading, but of course comments are still
eagerly awaited.

// Set the character, splitting into a surrogate pair when needed
if (escapeChar <= 0xFFFF) {
  // Fits in a single 16-bit code unit, output directly
  readBuffer[i] = (char)escapeChar;
} else if (escapeChar <= 0x10FFFF) {
  // Beyond U+FFFF, so a surrogate pair is needed.
  // Subtracting 0x10000 leaves a value of at most 20 bits.
  escapeChar -= 0x10000;
  // Output High Surrogate (top 10 bits combined with 0xD800)
  readBuffer[i++] = ((char) (0xD800 | (escapeChar >> 10)));
  // Output Low Surrogate (bottom 10 bits combined with 0xDC00)
  readBuffer[i] = ((char) (0xDC00 | (escapeChar & 0x03FF)));
} else {
  // The value is above U+10FFFF and cannot be encoded
  Error("Character reference is too large for UTF-16",
        ((int)escapeChar).ToString("X"), null);
}
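
One way to sanity-check the surrogate arithmetic above is to compare it against a standard library routine that does the same job. The following is only a rough sketch, written in Java rather than C# on the assumption that its char is likewise a 16-bit UTF-16 code unit. The class name SurrogateCheck and the helper toUtf16 are hypothetical and are not part of the parser code above.

```java
import java.util.Arrays;

public class SurrogateCheck {
    // Hypothetical helper: encode one code point as UTF-16 code units,
    // using the same bit arithmetic as the fragment above.
    static char[] toUtf16(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "Not a Unicode code point: " + Integer.toHexString(codePoint));
        }
        if (codePoint <= 0xFFFF) {
            // Fits in a single 16-bit code unit
            return new char[] { (char) codePoint };
        }
        // Subtracting 0x10000 leaves a value of at most 20 bits
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 | (v >> 10));    // top 10 bits
        char low = (char) (0xDC00 | (v & 0x03FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // Boundary values plus one well-known supplementary character (U+1D11E)
        int[] samples = { 0x41, 0xFFFF, 0x10000, 0x1D11E, 0x10FFFF };
        for (int cp : samples) {
            char[] mine = toUtf16(cp);
            char[] reference = Character.toChars(cp); // standard library encoder
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                + " matches library: " + Arrays.equals(mine, reference));
        }
    }
}
```

Like the fragment above, this sketch does not reject values in the range 0xD800 to 0xDFFF, which fall into the single-unit branch. Whether an escape naming a lone surrogate should be an error is a separate question.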

All the best,
Jeff Rafter

