Re: [xml-dev] C1 characters in XML 1.0 and HTML 4

* Waters, Michael, Springer US wrote:
>Unicode character U+0092 is given as a control character in a private
>use area. I can't see our vendor or any workflow step (un)intentionally
>adding that character. About the only thing that makes sense to me is
>that at some point (probably the source document), Windows-1252 encoding
>was used, where decimal 146 is, I think, a right single quote. (Whether
>that's the appropriate character in this case is another matter.)

That is likely, yes. It might also come from some other character set
such as MacRoman, though I have not checked what the code represents
there (and I cannot tell whether this was simply a typo to begin with).
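The Windows-1252 theory can be checked with a few lines of Python. This is
only a sketch: the helper name c1_to_cp1252 is made up for illustration, and
it assumes the stray C1 code points really did start life as Windows-1252
bytes that were read as ISO-8859-1 somewhere along the way.

```python
def c1_to_cp1252(text: str) -> str:
    """Reinterpret C1 code points (U+0080..U+009F) as Windows-1252 bytes.

    Hypothetical helper: code points whose byte value is unassigned in
    Windows-1252 (for example 0x81) are left unchanged.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # no Windows-1252 mapping for this byte; keep the C1 control
        out.append(ch)
    return "".join(out)


raw = b"\x92"  # decimal 146, the byte in question

# The same byte under three single-byte encodings.
for codec in ("latin-1", "cp1252", "mac_roman"):
    print(codec, ascii(raw.decode(codec)))

# Repairing a string that already contains U+0092.
print(ascii(c1_to_cp1252("don\u0092t")))
```

Decoding as latin-1 yields U+0092 itself, while cp1252 yields U+2019, the
right single quotation mark, which matches the guess in the quoted message.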

>So, in all the XML processes, character U+0092 was passed through as
>legal, but in outputting to HTML it is illegal? I'm missing something
>here, surely.

XML 1.0 documents may use C1 control characters. Obviously, in your case
you don't seem to actually mean to use C1 control characters. (Not that
anyone should care, but XML 1.1 allows the C1 control characters only
in the form of character references, and the SGML declaration for
HTML 4.01 marks the C1 control characters as unused.) So there is
little consensus about the status of C1 controls.
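The three positions can be written down as code point tests. The sketch
below transcribes the Char production of XML 1.0 and the RestrictedChar
production of XML 1.1; the HTML 4.01 check covers only the C1 range
discussed here, not the whole SGML declaration. All function names are
made up for illustration.

```python
def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production: may appear literally in a document."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar production: character reference only."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F  # U+0085 (NEL) is deliberately excluded
    )


def is_html401_c1_unused(cp: int) -> bool:
    """C1 range (128..159) marked UNUSED in the HTML 4.01 SGML declaration."""
    return 128 <= cp <= 159


cp = 0x92
print("XML 1.0 literal allowed:      ", is_xml10_char(cp))
print("XML 1.1 reference only:       ", is_xml11_restricted(cp))
print("HTML 4.01 declaration unused: ", is_html401_c1_unused(cp))
```

For U+0092 all three tests return True, which is why the character passes
through XML 1.0 processing and is only rejected when the output is
validated as HTML.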
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
