xml-dev - Re: control characters


From: "Wayne Steele" <xmlmaster@hotmail.com>
To: xml-dev@xml.org
Date: Tue, 20 Jun 2000 19:38:57 PDT


>
>The workaround I usually suggest is to represent control characters
>with (references to) characters from the Unicode private use range.
>This makes the necessary transformation a simple character
>substitution (which can even be just a subtraction - no need for a
>table).
>
>  -- Richard

Actually, as someone has already pointed out, 0x007F - 0x009F are fair game 
for XML documents, and Unicode has these defined as control character 
aliases.

Mapping 0x0000 - 0x001F to the private use area sounds like the "correct" 
Unicode thing to do, but for US-ASCII/UTF-8 documents I would map to 0x0080 
- 0x009F instead.
This way you preserve the deprecated, Anglo-centric, English-only, bigoted 
assumption of 1 character == 1 byte.
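
A minimal sketch of that substitution in Python, assuming an offset of 0x80 (so 0x0000 - 0x001F land on 0x0080 - 0x009F) and leaving tab, LF and CR alone, since XML 1.0 already allows those three. OFFSET and both function names are made up for illustration. One caveat worth checking against your own encoding: in UTF-8 each shifted character is written as two bytes, and only a Latin-1 style encoding keeps it at one byte.

```python
# Sketch only: OFFSET and the function names are illustrative assumptions.
OFFSET = 0x80                      # 0x0000..0x001F -> 0x0080..0x009F
XML_LEGAL_C0 = {0x09, 0x0A, 0x0D}  # tab, LF, CR are allowed in XML 1.0


def escape_controls(text: str) -> str:
    """Shift the C0 control characters XML 1.0 forbids into 0x0080..0x009F."""
    return "".join(
        chr(ord(ch) + OFFSET)
        if ord(ch) < 0x20 and ord(ch) not in XML_LEGAL_C0
        else ch
        for ch in text
    )


def unescape_controls(text: str) -> str:
    """Reverse the shift; this is the plain subtraction step."""
    return "".join(
        chr(ord(ch) - OFFSET)
        if 0x80 <= ord(ch) <= 0x9F and (ord(ch) - OFFSET) not in XML_LEGAL_C0
        else ch
        for ch in text
    )
```

The round trip only holds if the original data has nothing in 0x0080 - 0x009F to begin with.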

The only downside is that someone might actually have data in this range. I 
think this is about as likely as someone having data in the private use 
area.
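
If you want to measure that risk rather than guess, a quick scan of the data before mapping would do it. This is only a sketch; find_collisions is a hypothetical helper, and the default bounds assume the 0x0080 - 0x009F target range.

```python
def find_collisions(text, lo=0x80, hi=0x9F):
    """Return (index, code point) pairs for characters already in the target range."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if lo <= ord(ch) <= hi]
```

For the private use alternative, pass lo=0xE000 and hi=0xF8FF, which are the bounds of the private use area in the Basic Multilingual Plane.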

XSLT will not _ALWAYS_ give you a perfect output format.
XML --> XSLT --> simple_text_filter seems like a win to me.
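
For what it's worth, the last stage of that pipeline could be as small as the sketch below. It assumes the 0x80 offset, uses a translation table instead of arithmetic, and reads and writes text streams; RESTORE and the stream arguments are my own illustrative choices, not anything XSLT gives you.

```python
import io

# Assumed mapping: 0x0080..0x009F back to 0x0000..0x001F, skipping the three
# positions whose targets (tab, LF, CR) never needed escaping.
RESTORE = {
    cp: cp - 0x80
    for cp in range(0x80, 0xA0)
    if (cp - 0x80) not in (0x09, 0x0A, 0x0D)
}


def simple_text_filter(src, dst):
    """Copy src to dst line by line, restoring the shifted control characters."""
    for line in src:
        dst.write(line.translate(RESTORE))


# In-memory streams stand in for the XSLT output and the final file.
out = io.StringIO()
simple_text_filter(io.StringIO("bell:\x87\n"), out)
```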

-Wayne Steele



