OASIS Mailing List Archives


   Re: [xml-dev] Detection of non-Unicode characters


From: "Ann Navarro" <ann@webgeek.com>

> I just ran into this myself, with a styled apostrophe character -- which 
> was only reported as a problem by XML Spy 4.4 upon opening the 1.2MB XML 
> file (character was: Â (0xC2), ' (0x92)).


I expect we will see more of this problem unless the C1 controls (U+0080-U+009F)
are banned from direct use in XML. The trouble is that transcoders do not fail when
they find strange characters, so nothing stops your XML from being polluted: once
the data is corrupted, it may still look like good data. For more on this issue,
see http://www.topologi.com/public/XML_Naming_Rules.html
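To make the silent failure concrete: if Windows-1252 text containing a curly apostrophe (byte 0x92) is mislabelled as ISO-8859-1 and then transcoded to UTF-8, nothing raises an error. The byte becomes U+0092, a C1 control, stored as the UTF-8 bytes 0xC2 0x92, which matches the pair reported above. The sketch below assumes the document is already decoded into a Python string; the function name `find_c1_controls` is hypothetical.

```python
def find_c1_controls(text):
    """Return (line, column, code point) for each C1 control (U+0080-U+009F)."""
    hits = []
    # Split on "\n" only: str.splitlines() treats U+0085 (a C1 control) as a
    # line break and would hide it from the scan.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if 0x80 <= ord(ch) <= 0x9F:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

# Windows-1252 bytes for "It’s" (0x92 is the right single quotation mark).
raw = b"It\x92s"
# Wrongly decoding as ISO-8859-1 does not fail: 0x92 becomes U+0092.
wrong = raw.decode("latin-1")
# Re-encoding to UTF-8 does not fail either; it yields 0xC2 0x92.
print(wrong.encode("utf-8"))                   # b'It\xc2\x92s'
print(find_c1_controls(wrong))                 # [(1, 3, 'U+0092')]
# The correct decoding gives U+2019 and no C1 controls at all.
print(find_c1_controls(raw.decode("cp1252")))  # []
```

The polluted string prints and round-trips without complaint, which is why a dedicated scan like this is needed to spot it.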

...
> A tool that would quickly locate these kinds of things would be enormously 
> helpful (I'd certainly buy a copy if it were commercial/shareware).

You may care to look at my company's new editor for XML and SGML:
the Topologi Collaborative Markup Editor. See
 http://www.topologi.com/

We'll be posting the real announcement in a day or two; you can download it
for evaluation now.

When you open a file, an "Incoming Text Conditioning" box comes up. In the
"Whitespace" tab you can set it to:
  * detect control characters, or characters above a chosen code point
  * give a warning, or replace the character with a PI containing its code point,
so you can figure out what is going wrong and where.
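The detect/warn/replace options described above can be approximated roughly as follows. This is not Topologi's implementation: the function name, the `max_code_point` threshold parameter and the PI target `char` are all hypothetical. It also works on raw text, so it would insert PIs inside tags or attribute values, where PIs are not allowed; a real tool would apply it only to character data.

```python
def condition_text(text, max_code_point=None, mode="replace"):
    """Flag control characters, and optionally characters above max_code_point.

    mode="warn":    leave the text unchanged and only collect warnings.
    mode="replace": also swap each flagged character for a processing
                    instruction naming its code point (hypothetical target "char").
    Returns (conditioned_text, warnings).
    """
    out, warnings = [], []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 0
            out.append(ch)
            continue
        col += 1
        cp = ord(ch)
        # C0 controls except tab and CR, plus DEL and the C1 range U+0080-U+009F.
        is_control = (cp < 0x20 and ch not in "\t\r") or 0x7F <= cp <= 0x9F
        too_high = max_code_point is not None and cp > max_code_point
        if is_control or too_high:
            warnings.append(f"line {line}, column {col}: U+{cp:04X}")
            if mode == "replace":
                out.append(f"<?char U+{cp:04X}?>")
                continue
        out.append(ch)
    return "".join(out), warnings

text, warns = condition_text("It\x92s fine")
print(text)   # It<?char U+0092?>s fine
print(warns)  # ['line 1, column 3: U+0092']
```

Replacing with a PI rather than deleting keeps the evidence in the document, so the original byte can be traced back to its source encoding later.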

Also, it displays the Unicode code point of the character at the caret, so you can
see what is going on even when the font doesn't have a glyph for a character.
It gives warnings for many kinds of encoding errors, and sorts its available
encodings in three ways (by platform, by language, and by IANA name)
for easier selection. It performs Unicode normalization on the way in, on the
way out, and during cut-and-paste.
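Two of those features can be sketched with Python's standard `unicodedata` module: describing the character at a given offset (standing in for the caret), and normalizing text. The helper `describe_at` is hypothetical, and NFC is an assumption; the post does not say which normalization form the editor applies.

```python
import unicodedata

def describe_at(text, offset):
    """Describe the character at a caret offset, even if no font can draw it."""
    ch = text[offset]
    # Control characters have no Unicode name, so supply a fallback.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name}"

# "é" written as e + COMBINING ACUTE ACCENT (two code points).
decomposed = "cafe\u0301"
print(describe_at(decomposed, 4))      # U+0301 COMBINING ACUTE ACCENT
# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 5 4
print(describe_at(composed, 3))        # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(describe_at("It\x92s", 2))       # U+0092 <no name>
```

Normalizing at every boundary (load, save, paste) means strings that look identical also compare equal, which matters for element names and ID matching.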

Cheers
Rick Jelliffe






Copyright 2001 XML.org. This site is hosted by OASIS