XML.org, OASIS Mailing List Archives

C1 characters in XML 1.0 and HTML 4

I found some related material in the list archives, but I wanted to check my understanding of the use of C1 characters in XML 1.0 and in HTML 4.

 

We have a UTF-8 encoded XML document that has gone through a number of conversions and import/export routines into and out of a CMS. At all times, the XML document was valid against the DTD, and in Oxygen everything seemed fine. No errors were reported in the workflow until a late stage, when Saxon, in rendering to HTML, reported:

 

   net.sf.saxon.trans.DynamicError: Illegal HTML character: decimal 146

 

I traced the error to an article title, where there was an embedded hex character reference:

 

   Language rights versus speakers&#x92; rights
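
For anyone hunting the same problem, here is a minimal sketch of how such characters might be located after parsing. It assumes the reference was written as `&#x92;` (the exact form in our file may have differed), and the `find_c1` helper and the sample markup are made up for illustration. It scans element text only, not attribute values.

```python
import re
import xml.etree.ElementTree as ET

# DEL plus the C1 control range, U+007F..U+009F.
C1_OR_DEL = re.compile(r"[\x7f-\x9f]")

def find_c1(xml_source: str):
    """Yield (element tag, code point) for each U+007F..U+009F character
    found in text content, after the parser has expanded character
    references. Tail text is reported against the element it follows."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            for match in C1_OR_DEL.finditer(chunk or ""):
                yield elem.tag, ord(match.group())

# Hypothetical sample; the real document and DTD are not shown here.
sample = (
    "<article><title>Language rights versus "
    "speakers&#x92; rights</title></article>"
)

for tag, cp in find_c1(sample):
    print(f"{tag}: U+{cp:04X} (decimal {cp})")
```

Scanning the parsed tree rather than the raw file matters here, because a plain text search for the character itself would not see it while it is still spelled as a character reference.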

 

Unicode lists U+0092 as a C1 control character (PRIVATE USE TWO). I can’t see our vendor or any workflow step (un)intentionally adding that character. About the only thing that makes sense to me is that at some point (probably in the source document) Windows-1252 encoding was used, where decimal 146 is, I think, a right single quotation mark. (Whether that’s the appropriate character in this case is another matter.)
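
The Windows-1252 guess can be checked, and the damage possibly undone, with a short sketch like the following. It assumes every C1 code point in the text really is a mis-labelled Windows-1252 byte, which may not hold for other documents; `repair_c1` is a hypothetical helper name.

```python
def repair_c1(text: str) -> str:
    """Replace each C1 code point (U+0080..U+009F) with the character the
    same byte value denotes in Windows-1252, where one is defined."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                # Python's cp1252 codec leaves a few bytes in this range
                # (such as 0x81) unassigned; keep those characters as-is.
                pass
        out.append(ch)
    return "".join(out)

title = "Language rights versus speakers\u0092 rights"
fixed = repair_c1(title)

# Byte 146 in Windows-1252 decodes to U+2019.
print(f"U+{ord(bytes([146]).decode('cp1252')):04X}")
print(fixed)
```

Whether U+2019 is what the author intended in this particular title is still an editorial question; the sketch only shows what the byte would have meant under Windows-1252.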

 

So, in all the XML processes, character U+0092 was passed through as legal, but in outputting to HTML it is illegal? I’m missing something here, surely.
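
A sketch of the two rules as I currently understand them: the XML 1.0 `Char` production admits the C1 range, while the HTML output method defined for XSLT 2.0 serialization treats #x7F-#x9F as an error. The function names are made up, and the second function is my reading of the serialization rules rather than a statement of what Saxon actually implements.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_html_output_char(cp: int) -> bool:
    """Assumption: the HTML output method accepts XML characters except
    the control range #x7F-#x9F."""
    return is_xml10_char(cp) and not (0x7F <= cp <= 0x9F)

for cp in (0x92, 0x2019):
    print(f"U+{cp:04X}: XML 1.0 {is_xml10_char(cp)}, "
          f"HTML output {is_html_output_char(cp)}")
```

If that reading is right, it would explain why every XML-based step accepted the character and only the final HTML rendering rejected it.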

 

Curiously, in my readings, HTML 5 seems to be special-casing Windows-1252 encoding, along with UTF-8, in that it must be supported:

 

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0

 

Best regards,

Mike Waters

 


