At 04:36 PM 8/23/2002 -0600, Matt Gushee wrote:
>I would bet it's this. Just this past week I have been debugging a
>broken application that is supposed to generate XML from Word documents.
>The main problem I found was that the Word documents are full of
>characters like 0x07, 0x2012-0x2019, and the like. The latter range
>consists of common punctuation symbols like dashes and left and right
>quotes (AKA 'smart quotes'). They appear to be using Code Page 1252
>mapped directly into Unicode.
I just ran into this myself, with a styled apostrophe character -- one that
only XML Spy 4.4 flagged as a problem, on opening the 1.2MB XML file (the
character was: Â (0xC2), ' (0x92)).
All three validators I have (Xerces standalone, XMetal 3.0, and XML Spy
4.4) reported the file valid, but it failed on import into a content
management system (with the less-than-helpful error "no root element
present", when there clearly was one).
A tool that would quickly locate these kinds of things would be enormously
helpful (I'd certainly buy a copy if it were commercial/shareware).
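
A minimal sketch of such a check in Python, offered as an illustration rather
than a finished tool. It assumes the file is meant to be UTF-8, and the
function names (find_suspect_chars, scan_file) are made up for this example.
It flags two things: C0 control characters other than tab, LF and CR, which
XML 1.0 does not allow (0x07 is one), and C1 controls in the range
U+0080-U+009F, which XML 1.0 does allow, so a validator can pass them, but
which usually mean Windows-1252 bytes were mapped straight into Unicode. The
byte pair 0xC2 0x92 above is the UTF-8 encoding of U+0092, which fits that
pattern: 0x92 is the right single quote in Code Page 1252.

```python
# Assumption: the input is supposed to be UTF-8. Undecodable bytes are
# replaced with U+FFFD rather than raising, so the scan can continue.

# C0 controls that XML 1.0 permits: tab, line feed, carriage return.
ALLOWED_C0 = {0x09, 0x0A, 0x0D}


def find_suspect_chars(text):
    """Return a list of (line, column, code point, reason) tuples."""
    hits = []
    # Split on "\n" only; str.splitlines() would also break on U+0085
    # and other separators, which would throw off the line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp < 0x20 and cp not in ALLOWED_C0:
                hits.append((line_no, col_no, cp,
                             "C0 control, not allowed in XML 1.0"))
            elif 0x80 <= cp <= 0x9F:
                hits.append((line_no, col_no, cp,
                             "C1 control, possibly mis-mapped Windows-1252"))
    return hits


def scan_file(path):
    """Print one line per suspect character found in the file at `path`."""
    with open(path, encoding="utf-8", errors="replace", newline="") as handle:
        text = handle.read()
    for line_no, col_no, cp, reason in find_suspect_chars(text):
        print(f"line {line_no}, column {col_no}: U+{cp:04X} ({reason})")
```

Calling scan_file("export.xml") (a placeholder file name) would list each
hit with its position. Columns are counted in characters, not bytes, so they
may not match what a byte-oriented editor shows.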
Ann
-----
Ann Navarro, WebGeek, Inc.
http://www.webgeek.com
say what? http://www.snorf.net/blog