Re: [xml-dev] [Summary] Dangers of Copying Text into an XML Document
- From: David Carlisle <davidc@nag.co.uk>
- To: costello@mitre.org
- Date: Thu, 6 Sep 2007 17:44:40 +0100
> . In UTF-8 encoding the hex value for the left curly quote is x201C,
No, that's the Unicode value (in hex), but in UTF-8 the character is
represented as a multi-byte sequence (the three bytes with hex values
E2 80 9C).
The document should be careful to distinguish Unicode from its
encodings as sequences of bytes (since it is mainly encoding errors that
it is describing).
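To make the distinction concrete, here is a minimal Python sketch (the variable names are only illustrative) that prints the code point of the left curly quote next to the bytes UTF-8 uses to encode it:

```python
# The left curly quote as an abstract Unicode character.
ch = "\u201c"

# Its code point: a number assigned by Unicode, independent of any encoding.
code_point = f"U+{ord(ch):04X}"

# Its UTF-8 encoding: the bytes that actually appear in the file.
utf8_hex = ch.encode("utf-8").hex(" ").upper()

print(code_point)  # U+201C
print(utf8_hex)    # E2 80 9C
```

The code point is one number; the encoded form is three bytes, and a different encoding would give different bytes for the same character.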
> Copying a left curly quote from a Word document and pasting it into a
> UTF-8 XML document may result in the XML document receiving an illegal
> character.
That wording makes it sound as if you'd get the same sort of error as if
you'd included a control character in the document, that is, a valid
Unicode character that is not allowed in XML. What you'd get in this
case is a byte stream that could not be decoded as UTF-8, so there
would be no characters to pass to the XML parser at all.
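The two failure modes can be sketched in Python. This assumes the pasted bytes arrive in Windows-1252, where the left curly quote is the single byte 0x93; that is an assumption about what the editor does, not something stated in the summary, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def parses_as_xml(text: str) -> bool:
    """Return True if the already-decoded text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Case 1 (assumed Windows-1252 paste): the quotes become bytes 0x93 and 0x94,
# which are not valid UTF-8, so decoding fails before any XML rule applies.
pasted = "<q>\u201chello\u201d</q>".encode("cp1252")
print(decodes_as_utf8(pasted))

# Case 2: a control character. The bytes decode as UTF-8 without complaint,
# but U+0001 is not an allowed XML character, so the parser rejects it.
control = b"<q>\x01</q>"
print(decodes_as_utf8(control))
print(parses_as_xml(control.decode("utf-8")))
```

In the first case the failure is at the decoding layer; in the second it is at the XML layer, which is the distinction the wording above blurs.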
David
http://people.w3.org/rishida/scripts/uniview/conversion