OASIS Mailing List Archives

Re: [xml-dev] Dangers of Copying Text into an XML Document


> Example: Word uses Windows-1252 encoding.
Word will presumably use whatever encoding it's set to use on that
system. Windows-1252 presumably isn't the default everywhere.

> Consequently, if the text was created
> in an editor that uses a different encoding than the XML document then
> the characters that result from pasting the text into the XML document
> may not be the same. 

That's one thing that can happen, but perhaps more likely is that the
resulting byte string is not a valid UTF-8 sequence, so the resulting
document cannot be parsed at all and will be rejected (with a "fatal
error").
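
A minimal Python sketch of that failure mode, assuming the pasted text
was Windows-1252 curly quotes dropped byte-for-byte into a document that
declares UTF-8. The sample text and the helper name parses() are
illustrative only, not taken from the original thread:

```python
import xml.etree.ElementTree as ET

# Curly quotes as an editor using Windows-1252 would store them.
pasted = "\u201cquoted\u201d"
cp1252_bytes = pasted.encode("cp1252")   # bytes 0x93 ... 0x94
utf8_bytes = pasted.encode("utf-8")      # proper UTF-8 encoding

header = b"<?xml version='1.0' encoding='UTF-8'?><p>"
bad_doc = header + cp1252_bytes + b"</p>"    # raw bytes pasted unchanged
good_doc = header + utf8_bytes + b"</p>"     # text transcoded first


def parses(data: bytes) -> bool:
    """Return True if the bytes are accepted by the XML parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(bad_doc), parses(good_doc))
```

The point of the sketch is that the mismatch does not quietly produce
different characters; the parser refuses the whole document.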

> In UTF-8 the hex value x93 corresponds to a control character. 
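No. In UTF-8 the byte x93 is only valid as a continuation byte inside
the multi-byte encoding of a character (for example, the control
character U+0093 is encoded as the two bytes xC2 x93). A lone x93 byte
is not a character at all, and this would be a fatal error.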
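
The byte-level behaviour can be checked directly. This is a small
Python sketch; the helper name is_valid_utf8() is made up for the
example:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone = b"\x93"           # what Windows-1252 stores for a left curly quote
in_sequence = b"\xc2\x93"  # x93 as a continuation byte after a lead byte

print(is_valid_utf8(lone))                 # a lone x93 is not valid UTF-8
print(hex(ord(in_sequence.decode("utf-8"))))  # code point of the C1 control
print(hex(ord(lone.decode("cp1252"))))     # what Windows-1252 meant by x93
print("\u201c".encode("utf-8"))            # the same quote, encoded as UTF-8
```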

> Can you think of other problems that may result from copying text from
> one document and pasting it into an XML document?

The text might contain the string ]]> (although arguably you have
covered this by including > in the list of characters that "may need to
be escaped")
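
As a hedged illustration of the ]]> case, here are two common ways of
carrying pasted text that contains it: escaping the markup characters,
or splitting the text across two CDATA sections. The sample string and
the helper as_cdata() are assumptions for the example:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Pasted source code that happens to contain "]]>" and "&".
pasted = "if (a[b[0]]>1) x &= y;"


def as_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


escaped_doc = "<code>" + escape(pasted) + "</code>"
cdata_doc = "<code>" + as_cdata(pasted) + "</code>"

# Both forms should give back the original text when parsed.
print(ET.fromstring(escaped_doc).text == pasted)
print(ET.fromstring(cdata_doc).text == pasted)
```

Escaping > everywhere is the simpler of the two; the CDATA split is only
needed when the text has to stay inside a CDATA section.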

The text might contain non-XML characters that (in XML 1.0) cannot even
be entered as numeric references (C0 controls other than tab, line feed
and carriage return, FFFE, FFFF, or the values corresponding to half a
surrogate pair)
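
A sketch of how pasted text could be checked for such characters before
it goes into the document. The character class follows the Char
production of XML 1.0; the function names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text: str) -> list:
    """Return (index, hex code point) for each character XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group())))
            for m in _NOT_XML10_CHAR.finditer(text)]


def char_ref_parses(code_point: int) -> bool:
    """Return True if a numeric reference to code_point is accepted."""
    try:
        ET.fromstring("<a>&#x%X;</a>" % code_point)
        return True
    except ET.ParseError:
        return False


print(find_invalid_xml_chars("ok\x0bbad\ufffe"))
print(char_ref_parses(0x1), char_ref_parses(0x9))
```

What to do with a flagged character (drop it, replace it, or reject the
paste) is a policy decision the sketch leaves open.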


The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.


Copyright 1993-2007 XML.org. This site is hosted by OASIS