   Re: control characters

  • From: "Wayne Steele" <xmlmaster@hotmail.com>
  • To: xml-dev@xml.org
  • Date: Tue, 20 Jun 2000 19:38:57 PDT

>The workaround I usually suggest is to represent control characters
>with (references to) characters from the Unicode private use range.
>This makes the necessary transformation a simple character
>substitution (which can even be just a subtraction - no need for a
>  -- Richard

Actually, as someone has already pointed out, 0x007F - 0x009F are fair game
for XML documents, and Unicode defines these as control characters.
Mapping 0x0000 - 0x001F to the private use area sounds like the "correct"
Unicode thing to do, but for US-ASCII/UTF-8 documents I would map to 0x0080
- 0x009F instead.
This way you preserve the deprecated, Anglo-centric, English-only, bigoted
assumption of 1 character == 1 byte.
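A rough sketch of that mapping in Python, assuming a fixed shift of 0x80
and assuming tab, LF and CR are left alone because XML already allows
them. The names C0_OFFSET, XML_SAFE and escape_controls are made up for
illustration.

```python
C0_OFFSET = 0x80                 # assumed shift: 0x00-0x1F -> 0x80-0x9F
XML_SAFE = {0x09, 0x0A, 0x0D}    # tab, LF, CR are legal in XML as-is


def escape_controls(text: str) -> str:
    """Shift C0 control characters that XML forbids up into 0x80-0x9F."""
    return "".join(
        chr(ord(ch) + C0_OFFSET)
        if ord(ch) < 0x20 and ord(ch) not in XML_SAFE
        else ch
        for ch in text
    )
```

For example, escape_controls("a\x01b") gives "a\x81b". Whether the
shifted character is a single byte depends on the encoding the document
is written in: it is one byte in ISO-8859-1 but two bytes in UTF-8.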

The only downside is that someone might actually have data in this range. I
think this is about as likely as someone having data in the private use
area.
XSLT will not _ALWAYS_ give you a perfect output format.
XML --> XSLT --> simple_text_filter seems like a win to me.
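The last stage of that pipeline could be as small as the sketch below,
which assumes the controls were shifted up by 0x80 before the document
became XML. The name restore_controls is made up for illustration, and
the filter is applied to the XSLT output after it has been decoded to
text.

```python
def restore_controls(text: str) -> str:
    """Shift characters in 0x80-0x9F back down to 0x00-0x1F."""
    return "".join(
        chr(ord(ch) - 0x80) if 0x80 <= ord(ch) <= 0x9F else ch
        for ch in text
    )
```

For example, restore_controls("a\x81b") gives "a\x01b". Any genuine
0x80 - 0x9F data in the document would be shifted as well, which is the
downside mentioned above.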

-Wayne Steele


This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/

