Re: [xml-dev] C1 characters in XML 1.0 and HTML 4

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: "Waters, Michael, Springer US" <Mike.Waters@springer.com>
Date: Sun, 13 Mar 2011 00:22:50 +0100

* Waters, Michael, Springer US wrote:
>Unicode character U+0092 is given as a control character in a private
>use area. I can't see our vendor or any workflow step (un)intentionally
>adding that character. About the only thing that makes sense to me is
>that at some point (probably the source document), Windows-1252 encoding
>was used, where decimal 146 is, I think, a right single quote. (Whether
>that's the appropriate character in this case is another matter.)

That is likely, yes. It might also come from some other character set
such as MacRoman, though I have not checked what the code represents
there (and I cannot tell whether it was simply a typo to begin with).
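
To illustrate the hypothesis, here is a minimal Python sketch that decodes
the single byte 146 (0x92) under three legacy encodings. The helper name
decode_candidates is made up for this example, and the choice of codecs is
an assumption about what the source document might have used; it is not a
diagnosis of the actual workflow.

```python
import unicodedata

RAW = bytes([0x92])  # decimal 146, the byte under discussion


def decode_candidates(raw: bytes) -> dict:
    """Decode the same bytes under several single-byte encodings.

    The codec list is an assumption for illustration only.
    """
    return {name: raw.decode(name) for name in ("latin-1", "cp1252", "mac_roman")}


for name, text in decode_candidates(RAW).items():
    # C1 controls have no Unicode character name, so supply a fallback label.
    label = unicodedata.name(text, "<control>")
    print(f"{name:10} U+{ord(text):04X} {label}")
```

If the byte was written as Windows-1252 but later read as ISO-8859-1, the
result is U+0092 rather than U+2019, which would match the symptom described.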

>So, in all the XML processes, character U+0092 was passed through as
>legal, but in outputting to HTML it is illegal? I'm missing something
>here, surely.

XML 1.0 documents may use C1 control characters. Obviously, in your case
you do not actually mean to use C1 control characters. (Not that
anyone should care, but XML 1.1 allows the C1 control characters only
in the form of character references. And the SGML declaration for
HTML 4.01 does mark the C1 control characters as unused.) So, there is
little consensus there about the status of C1 controls.
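
As a rough sketch of how the three rules differ, the following Python
predicates encode my reading of the XML 1.0 Char production, the XML 1.1
RestrictedChar production, and the "128 32 UNUSED" entry in the HTML 4.01
SGML declaration. The function names are hypothetical, and the ranges
should be checked against the specifications before relying on them.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """True if XML 1.1 permits the code point only as a character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def is_html4_unused_c1(cp: int) -> bool:
    """True for code points 128-159, marked UNUSED in the HTML 4.01 declaration."""
    return 0x80 <= cp <= 0x9F


def find_c1(text: str) -> list:
    """Return (index, code point) pairs for every C1 control in the text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]


sample = "It\u0092s"  # U+0092 where an apostrophe was probably intended
for index, cp in find_c1(sample):
    print(index, f"U+{cp:04X}", is_xml10_char(cp), is_xml11_restricted(cp),
          is_html4_unused_c1(cp))
```

A check like find_c1 run before the HTML output step would flag such
characters while they can still be mapped to what was actually intended.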
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
