OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
C1 characters in XML 1.0 and HTML 4

I found some related material in the list archives, but I wanted to check my understanding of the use of C1 characters in XML 1.0 and in HTML 4.


We have a UTF-8 encoded XML document that has gone through a number of conversions and import/export routines into/out of a CMS. At all times, the XML document was valid against the DTD, and in Oxygen everything seems fine. No errors were reported in the workflow until a late stage, where in rendering to HTML Saxon reported:


   net.sf.saxon.trans.DynamicError: Illegal HTML character: decimal 146


I traced the error to an article title, where there was an embedded hex character reference:


   Language rights versus speakers’ rights


Unicode character U+0092 is given as a control character in a private use area. I can’t see our vendor or any workflow step (un)intentionally adding that character. About the only thing that makes sense to me is that at some point (probably the source document), Windows-1252 encoding was used, where decimal 146 is, I think, a right single quote. (Whether that’s the appropriate character in this case is another matter.)


So, in all the XML processes, character U+0092 was passed through as legal, but in outputting to HTML it is illegal? I’m missing something here, surely.


Curiously, in my readings, HTML 5 seems to be special-casing Windows-1252 encoding, along with UTF-8, in that it must be supported:




Best regards,

Mike Waters


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS