Dangers of Copying Text into an XML Document

Hi Folks,
 
I am compiling a list of well-formedness problems that may arise from
copying text from one document and pasting it into an XML document. 
 
For example, consider this XML document:
 
<?xml version="1.0" encoding="UTF-8"?>
<Document>
      <Para id="...">...</Para>
</Document>
 
Suppose that text is copied from a document and pasted into the XML
document, either as the content of the <Para> element or as the value
of the id attribute.
 
Here is my current list of problems:
 
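1. The text may contain these reserved characters: {<, >, ', ", &}.
These characters may introduce syntax errors into the XML document.
< and & must always be escaped, both in element content and in
attribute values. A quote character (' or ") must be escaped when it
matches the delimiter of the attribute value it appears in. > must be
escaped only when it follows ]] in element content.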
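
As a minimal sketch of the escaping described in problem 1, the Python
snippet below uses the standard library's xml.sax.saxutils helpers to
escape pasted text before placing it in the <Para> content and in the
id attribute. The function name build_document and the sample string
are assumptions made for illustration; they are not part of the
original example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_document(pasted_text: str) -> str:
    """Place pasted text in <Para> content and in its id attribute (hypothetical helper)."""
    content = escape(pasted_text)   # escapes &, < and > for element content
    attr = quoteattr(pasted_text)   # escapes the text and adds suitable surrounding quotes
    return f"<Document><Para id={attr}>{content}</Para></Document>"


# Sample pasted text containing several reserved characters (illustrative only).
pasted = 'Tom & Jerry said "x < y"'
doc = build_document(pasted)

# The escaped document parses, and the original text comes back unchanged.
para = ET.fromstring(doc).find("Para")
assert para.text == pasted
assert para.get("id") == pasted
```

Pasting the same string without escaping would leave a bare & and < in
the document, and a parser would reject it as not well-formed.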
 
2. The editor that was used to create the text may use a different
encoding than the XML document's encoding. A byte sequence that
represents a character in one encoding may represent a different
character, or no character at all, in another encoding. Consequently,
if the bytes of the pasted text are interpreted in the XML document's
encoding without being transcoded, the characters that result may not
be the same. Example: suppose the text comes from Word and is encoded
as Windows-1252. In Windows-1252 the left curly (a.k.a. smart) quote is
the single byte x93. Its Unicode code point is U+201C, which UTF-8
encodes as the three bytes xE2 x80 x9C. The byte x93 on its own is not
valid UTF-8, and the code point U+0093 is a C1 control character. If
the Windows-1252 byte x93 ends up in a UTF-8 XML document, the document
may receive an invalid byte sequence or a control character rather
than a left curly quote.
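
The encoding mismatch in problem 2 can be reproduced with a short
Python sketch. It assumes Python's built-in "cp1252", "utf-8" and
"latin-1" codecs; the variable names are illustrative only.

```python
import unicodedata

left_quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK

# The same character is stored as different bytes in each encoding.
cp1252_bytes = left_quote.encode("cp1252")  # single byte 0x93
utf8_bytes = left_quote.encode("utf-8")     # three bytes 0xE2 0x80 0x9C

# Reading the Windows-1252 byte as UTF-8 fails: 0x93 alone is not valid UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading it as ISO-8859-1 (Latin-1) silently yields U+0093, a control character.
misread = cp1252_bytes.decode("latin-1")
misread_category = unicodedata.category(misread)  # "Cc" means control character

# Decoding with the encoding the bytes were written in recovers the quote.
recovered = cp1252_bytes.decode("cp1252")
```

The last line shows the remedy: decode the bytes using the source
encoding first, then write the resulting characters out in the XML
document's encoding.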
 
Can you think of other problems that may result from copying text from
one document and pasting it into an XML document?
 
/Roger


Copyright 1993-2007 XML.org. This site is hosted by OASIS