OASIS Mailing List Archives





   RE: [xml-dev] UTF-8 use with XML


Thanks Tim,

Here is the hex for <BirthCity>K?/BirthCity>: 

3C 42 69 72 74 68 43 69  74 79 3E 4B EF BF BD 2F 42 69 72 74 68 43 69 74 79

The bytes EF BF BD are the questionable characters; they replaced the 3C ("<") that should open the end tag.
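As a cross-check on the dump, here is a short Python sketch using only the standard library. It decodes the bytes, names the code point behind EF BF BD, and shows why a parser rejects the element. EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode REPLACEMENT CHARACTER. Software usually inserts it when it meets bytes it cannot decode, which suggests the damage happened in an earlier conversion step. The sketch assumes the dump is complete apart from the final ">", which the dump omits. The "repair" step is purely illustrative, because the bytes alone cannot show what the original content was.

```python
import xml.etree.ElementTree as ET

# Bytes transcribed from the hex dump; the dump stops before the final ">".
dump = ("3C 42 69 72 74 68 43 69 74 79 3E 4B EF BF BD "
        "2F 42 69 72 74 68 43 69 74 79")
data = bytes.fromhex(dump)

# Strict decoding succeeds: the bytes themselves are well-formed UTF-8.
text = data.decode("utf-8")

# EF BF BD is a single code point, U+FFFD REPLACEMENT CHARACTER.
print([f"U+{ord(c):04X}" for c in text[11:13]])  # ['U+004B', 'U+FFFD']

# Even with the closing ">" restored, the fragment is not well-formed:
# there is no end tag, because the "<" of </BirthCity> is missing.
try:
    ET.fromstring(data + b">")
except ET.ParseError as err:
    print("rejected:", err)

# Hypothetical repair: put "<" back where U+FFFD sits; now it parses.
fixed = data.replace(b"\xef\xbf\xbd", b"<") + b">"
print(ET.fromstring(fixed).text)  # K
```

This shows that the file is valid UTF-8 and still not well-formed. Rejecting it is therefore a markup problem, not an encoding one.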


-----Original Message-----
From: Tim Bray [mailto:tbray@textuality.com]
Sent: Friday, June 13, 2003 11:16 AM
To: Long, Craig Z
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] UTF-8 use with XML

Long, Craig Z wrote:
> Given the following element using a UTF-8 character (created by a user's
> system): <BirthCity>Trenton?/BirthCity>, I've been told my system should be
> programmed to accept this.  I can't find any documentation that supports
> or refutes this premise.  Currently we reject this as not well-formed.
> Please offer expertise concerning this issue.

If it really contains a UTF-8 character, no programming should be 
required; all conforming XML software is required to accept UTF-8 (and UTF-16) data. 
Things that could be wrong:

- there's an encoding declaration at the front of the file saying it's
   something other than UTF-8
- you think it's UTF-8 but it isn't.
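The two cases above can be told apart mechanically. Below is a rough Python sketch using only the standard library. `diagnose` is a hypothetical helper name. The regex only recognizes an ASCII-compatible XML declaration, optionally preceded by a UTF-8 byte order mark, so UTF-16 files fall through to the UTF-8 check. The helper checks the encoding only, not well-formedness.

```python
import re

# An XML declaration at the very start of the document (optionally after a
# UTF-8 BOM), capturing the value of its encoding pseudo-attribute.
XML_DECL = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^>]*?\bencoding\s*=\s*["\']'
    rb'([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def diagnose(data: bytes) -> str:
    """Report which of the two failure cases, if either, applies."""
    m = XML_DECL.match(data)
    if m:
        declared = m.group(1).decode("ascii")
        if declared.lower() != "utf-8":
            # Case 1: the declaration names some other encoding.
            return f"declared as {declared}"
    try:
        # Strict decoding raises on the first invalid byte sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as e:
        # Case 2: no conflicting declaration, but the bytes are not UTF-8.
        return f"not UTF-8: bad byte 0x{data[e.start]:02X} at offset {e.start}"
    return "valid UTF-8"
```

For example, a hypothetical Latin-1 "Trenton" with a trailing 0xE9 byte is caught by the second check:

    print(diagnose(b"<BirthCity>Trenton\xe9</BirthCity>"))
    # not UTF-8: bad byte 0xE9 at offset 18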

If there's no encoding declaration, then the second is almost certainly 
true.  If you provide a hex dump of the affected region, there are 
several people here who could look at it and tell you whether it's 
really UTF-8.
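Most systems already ship a dump tool such as `xxd` or `od`. Where none is handy, a few lines of Python will produce one. This sketch prints an offset, the bytes in hex grouped in eights as in the dump quoted earlier, and an ASCII column in which non-printable bytes appear as ".". The function name `hex_dump` and the layout are illustrative choices, not a standard format.

```python
def hex_dump(data: bytes) -> str:
    """Format bytes 16 per line: offset, hex in two groups of 8, ASCII."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Split the line into groups of 8 bytes, separated by two spaces.
        groups = [chunk[i:i + 8] for i in range(0, len(chunk), 8)]
        hex_part = "  ".join(" ".join(f"{b:02X}" for b in g) for g in groups)
        # Printable ASCII is shown as-is; everything else becomes ".".
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<48}  {ascii_part}")
    return "\n".join(lines)
```

To dump the first 64 bytes of a file (the file name is a placeholder), read it in binary mode so nothing is decoded on the way in: `print(hex_dump(open("record.xml", "rb").read()[:64]))`.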

Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)



Copyright 2001 XML.org. This site is hosted by OASIS