From: "John Cowan" <cowan@mercury.ccil.org>
> Rick Jelliffe scripsit:
>
> > Even if you only use ISO 8859-1, it is still important. The Euro=0x80
> > mistake will be increasingly common, and we need to make sure that
> > XML processors continue to catch this error.
>
> But they don't!
>
> Characters U+0080 through U+009F are legal XML content.
>
> You are talking about a "defense" that doesn't even exist.
No. 0x85 is not, AFAIK, a character in ISO 8859-1 (it is one of the design principles
of 8859-1 that it will not fail on systems that mask the 8th bit and look for control
characters). So a document labelled as ISO 8859-1 but containing a 0x85 false Euro
should fail on import. Because the 0x85 character does not exist in 8859-1, it never
gets as far as Unicode.
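A minimal sketch of that kind of import check, assuming a processor that treats
bytes 0x80 through 0x9F as undefined in a document labelled ISO 8859-1. The
function name find_c1_bytes and the sample document are hypothetical, and this is
not how any particular parser implements it. Python's built-in latin-1 codec does
not reject these bytes (it maps them straight to U+0080 through U+009F), so the
check is written by hand here:

```python
# Hypothetical strict check: the function name and the decision to flag the
# whole 0x80-0x9F range are assumptions made for illustration only.
C1_RANGE = range(0x80, 0xA0)


def find_c1_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte in the range 0x80-0x9F."""
    return [(i, b) for i, b in enumerate(data) if b in C1_RANGE]


# A Windows-1252 Euro sign (byte 0x80) inside a document labelled ISO 8859-1.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><price>\x80 10</price>'

hits = find_c1_bytes(doc)
for offset, value in hits:
    print(f"offset {offset}: byte 0x{value:02X} has no graphic character in ISO 8859-1")
```

A processor following the argument above would report an error when the list is
non-empty instead of passing the bytes through to Unicode.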
MSXML 4 gets this right, and reports an error in that case. I have had a support
request on this with our validator, so I had to look into it.
The defense does exist.
Cheers
Rick Jelliffe