RE: [xml-dev] C1 characters in XML 1.0 and HTML 4


From: "Waters, Michael, Springer US" <Mike.Waters@springer.com>
To: "Michael Kay" <mike@saxonica.com>,<xml-dev@lists.xml.org>
Date: Sat, 12 Mar 2011 20:07:59 -0500

>Occasionally the internationalization working group in W3C decides to flex its muscles,
>and one instance of this was their insistence that XSLT should not generate HTML
>that contains characters which HTML defines to be illegal.

Seems very reasonable to me. Until Bjoern reminded me, I had forgotten about the SGML declaration for HTML 4.

>It's probably a mistake that XML allowed these C1 characters, because they are
>nearly always miscoded CP1252 characters. XML 1.1 tried to fix this problem
>but we all know what happened to that.

Yes, indeed. We've tried to avoid the complications of handling XML 1.1 in our tool chain.
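
To illustrate the quoted point about miscoded CP1252, here is a minimal Python sketch that reinterprets each C1 code point (U+0080 to U+009F) as the CP1252 character with the same byte value. It assumes the C1 characters really are miscoded CP1252, which will not hold for every document. The name `repair_c1` is hypothetical and is not part of any tool mentioned in this thread.

```python
def repair_c1(text: str) -> str:
    """Map C1 code points to the CP1252 character at the same byte value.

    Assumption: every C1 character in the input is a miscoded CP1252 byte.
    Note that U+0085 (NEL) is therefore turned into an ellipsis.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in
                # Python's cp1252 codec, so leave those untouched.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+0093 / U+0094 become the curly quotes U+201C / U+201D.
    print(repair_c1("\x93quoted\x94"))
```

A repair like this is only safe when the source encoding is known. Otherwise it is probably better to report the characters and fix them upstream.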

>In the meantime, the result is that you feed a bad character
>into the start of your processing pipeline and you discover
>the problem at the final stage when HTML emerges.

I was just a bit surprised that the error was caught so far down the line.

>The reasoning of course is that the end user shouldn't pay the price
>for the content provider's carelessness.

>This is very different from the culture in W3C which tries
>to improve data quality by insisting that software should
>reject bad data.

I'm usually on the delivery side of things, so I'm always working to understand the content and prevent bad data from getting out there in the first place.
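
As a rough illustration of catching this on the delivery side, the Python sketch below reports C1 code points before content enters a pipeline, rather than at the final HTML stage. The name `find_c1` and the sample string are hypothetical. The check covers only U+0080 to U+009F and is not a full XML or HTML character validator.

```python
import re

# C1 control range: allowed in XML 1.0 documents, but the characters
# this thread is about because they are rejected at the HTML stage.
C1_RE = re.compile("[\u0080-\u009f]")


def find_c1(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for each C1 character found in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in C1_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Springer\x92s journal"  # hypothetical input with a stray 0x92
    for offset, code in find_c1(sample):
        print(f"C1 character {code} at offset {offset}")
```

Running a check like this at ingestion lets the content provider see the problem while the original source is still at hand.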

Many thanks, Dr. Kay.

Regards,
Mike

