Re: [xml-dev] Dangers of Copying Text into an XML Document
- From: David Carlisle <davidc@nag.co.uk>
- To: costello@mitre.org
- Date: Wed, 5 Sep 2007 17:05:34 +0100
> Example: Word uses Windows-1252 encoding.
Word will presumably use whatever encoding it's set to use on that
system. Windows-1252 presumably isn't the default everywhere.
> Consequently, if the text was created
> in an editor that uses a different encoding than the XML document then
> the characters that result from pasting the text into the XML document
> may not be the same.
That's one thing that can happen, but perhaps more likely is that the
resulting string is not a valid UTF-8 sequence, so the resulting
document cannot be parsed at all and will be rejected (with a "fatal
error").
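A minimal sketch of that failure mode in Python, assuming the pasted text arrived as Windows-1252 bytes in a document with no encoding declaration (so the parser assumes UTF-8). The sample document and the helper name `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the curly quotes were pasted in as Windows-1252
# bytes (0x93 and 0x94), but nothing declares that encoding, so the
# parser treats the input as UTF-8.
doc = b"<p>\x93quoted\x94</p>"

def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# The raw bytes are rejected outright rather than silently misread.
bad = parses(doc)

# Transcoding from the real source encoding to UTF-8 makes them acceptable.
good = parses(doc.decode("cp1252").encode("utf-8"))
```

Declaring the real encoding in the XML declaration would be another way out, provided the parser in use supports that encoding.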
> In UTF-8 the hex value x93 corresponds to a control character.
No. On its own, byte x93 is a fatal error: in UTF-8 it is only legal as
a continuation byte inside the multi-byte encoding of a character,
following a lead byte that also has the top bit set.
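The byte-level situation can be checked directly. This is a small Python sketch, not tied to any particular XML parser; the helper name `decodes_as_utf8` is illustrative only.

```python
raw = b"\x93"

# 0x93 is 0b10010011: the leading bits "10" mark a UTF-8 continuation
# byte, which cannot stand alone as a character.
is_continuation = (raw[0] & 0b11000000) == 0b10000000

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone 0x93 is not valid UTF-8.
lone_ok = decodes_as_utf8(raw)

# After the lead byte 0xC2 it is valid, and encodes the control U+0093.
in_sequence = b"\xc2\x93".decode("utf-8")

# In Windows-1252 the same byte is a left double quotation mark, U+201C,
# whose UTF-8 encoding is three bytes long.
as_cp1252 = raw.decode("cp1252")
as_utf8_bytes = as_cp1252.encode("utf-8")
```

So the control character U+0093 does exist, but in UTF-8 it is written as the two bytes C2 93, never as the single byte 93.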
> Can you think of other problems that may result from copying text from
> one document and pasting it into an XML document?
The text might contain the string ]]> (although arguably you have
covered this by including > in the list of characters that "may need to
be escaped")
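As a rough illustration of the ]]> case in Python: the sample string is invented, and the CDATA splitting shown is one common workaround rather than the only option.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical pasted text that happens to contain "]]>".
pasted = "if (a[b[0]]>1 && x<y) return;"

# In ordinary character data, escaping &, < and > is enough:
# "]]>" becomes "]]&gt;".
as_text = "<code>" + escape(pasted) + "</code>"

# Inside a CDATA section nothing can be escaped, so the section is
# closed after "]]" and a new one is opened before ">".
split = pasted.replace("]]>", "]]]]><![CDATA[>")
as_cdata = "<code><![CDATA[" + split + "]]></code>"

# Both forms parse and give back the original text.
text_roundtrip = ET.fromstring(as_text).text
cdata_roundtrip = ET.fromstring(as_cdata).text
```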
The text might contain non-XML characters that (in XML 1.0) cannot even
be entered as numeric references (C0 controls other than tab, line feed
and carriage return, FFFE, FFFF or the values corresponding to half a
surrogate pair).
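One way to catch such characters before pasting is to test the text against the XML 1.0 Char production. The Python below is a sketch under that assumption; the names `NON_XML_10` and `find_non_xml_chars` are hypothetical.

```python
import re

# Complement of the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything matched here cannot appear in an XML 1.0 document,
# not even as a numeric character reference.
NON_XML_10 = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_non_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [
        (m.start(), "U+%04X" % ord(m.group()))
        for m in NON_XML_10.finditer(text)
    ]

# A form feed (a C0 control) and U+FFFE are both reported.
hits = find_non_xml_chars("ok\x0cform\ufffe")

# Tab, line feed and carriage return are allowed.
clean = find_non_xml_chars("tab\there\r\n")
```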
David