   Re: arbitrary characters in XML document?

  • From: David Brownell <david-b@pacbell.net>
  • To: Cliff Draper <cliffwd@forte.com>
  • Date: Thu, 02 Sep 1999 13:42:10 -0700

Cliff Draper wrote:
> 
> Hi,  I have a question about dealing with multiple character sets.
> 
> I have an application where I want to store data in XML and retrieve
> it later.  Now a good chunk of the data I want to store is coming
> straight from the user and I have little control over exactly which
> character set the user is using.  One of my users apparently tried
> using 0x98 + 0x03 as an accented 'e'; I have no idea which character
> set he used (and I don't care),

You should.  Arbitrary binary garbage isn't necessarily going to
be legal -- as happened in this case -- and even if it chances to
be legal, it's likely to come out as something that wasn't intended.

An error diagnostic is the useful outcome here. Hidden mangling of the
data is just as likely, and it causes severe problems later on.
A diagnostic lets you fix the problem early, before it gets worse.
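
To make the contrast concrete, here is a minimal sketch in Python (my
choice of language for illustration, not something from the original
question; the helper name try_decode is hypothetical). A permissive
decoder such as Latin-1 accepts every byte value, so a wrong guess
mangles the text silently, while a strict UTF-8 decoder reports the
problem.

```python
# Bytes for an accented 'e' (U+00E9) as written by a UTF-8 producer.
data = "\u00e9".encode("utf-8")          # b'\xc3\xa9'

# Hidden mangling: Latin-1 maps every byte to some character, so decoding
# with the wrong guess never fails. It just yields two wrong characters.
mangled = data.decode("latin-1")


def try_decode(raw: bytes, encoding: str):
    """Return (text, None) on success or (None, diagnostic) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, str(err)


# Useful diagnostic: the Latin-1 byte for the same character is not a
# complete UTF-8 sequence, so a strict decoder rejects it.
text, error = try_decode(b"\xe9", "utf-8")
```

Here mangled holds two characters instead of one and nothing complained,
whereas the second call leaves text as None and gives you an error
message you can act on.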


> but I still want to be able to store
> it and parse it later.  When I parse it with expat with an
> encoding="UTF-8", it complains that it's not well-formed.

Probably because it isn't.
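
For the two bytes quoted above, the reasoning can be sketched in Python
(the function names are hypothetical, chosen only for this
illustration). In UTF-8, 0x98 has the bit pattern 10xxxxxx, which marks
a continuation byte, and a continuation byte cannot start a character.
Separately, even where 0x03 decodes cleanly as U+0003, that control
character falls outside the XML 1.0 Char production, so it is not
allowed in a document either.

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte by the role it can play in UTF-8."""
    if b < 0x80:
        return "ascii"
    if b < 0xC0:
        return "continuation"      # 10xxxxxx: only valid after a lead byte
    if 0xC2 <= b <= 0xDF:
        return "lead of 2-byte sequence"
    if 0xE0 <= b <= 0xEF:
        return "lead of 3-byte sequence"
    if 0xF0 <= b <= 0xF4:
        return "lead of 4-byte sequence"
    return "never valid"           # 0xC0, 0xC1, 0xF5..0xFF


def is_valid_utf8(raw: bytes) -> bool:
    """True if the whole byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_xml10_char(cp: int) -> bool:
    """Code points permitted by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


user_bytes = bytes([0x98, 0x03])
roles = [utf8_byte_role(b) for b in user_bytes]
```

For these bytes, roles starts with a stray continuation byte,
is_valid_utf8(user_bytes) is False, and is_xml10_char(0x03) is False.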


> Any ideas?

Don't permit arbitrary binary data into your text.  Ensure you know
what character encoding was used, and make sure that you either
transform that encoding to the one you're using, or switch to using
that encoding.
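
A minimal sketch of the first option, in Python with its standard expat
binding. Everything specific here is an assumption for illustration:
the source encoding (Latin-1), the element name "field", and the helper
names are hypothetical, not taken from your application. The point is
only that the input is decoded strictly with a known encoding before it
is written out as UTF-8.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape


def to_utf8_xml(raw: bytes, source_encoding: str) -> bytes:
    """Decode user input with its known encoding, then store it as UTF-8 XML."""
    # Strict decoding: raises UnicodeDecodeError instead of storing garbage.
    text = raw.decode(source_encoding)
    document = ('<?xml version="1.0" encoding="UTF-8"?>'
                "<field>" + escape(text) + "</field>")
    return document.encode("utf-8")


def parse_field(document: bytes) -> str:
    """Parse the stored document with expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)   # True marks this as the final chunk
    return "".join(chunks)


# Assumed input: "caf" followed by an accented 'e', sent as Latin-1.
user_bytes = b"caf\xe9"
stored = to_utf8_xml(user_bytes, "latin-1")
roundtrip = parse_field(stored)
```

With the encoding known, the stored document holds the two-byte UTF-8
form of the accented character and expat parses it back unchanged. This
does not handle control characters that XML forbids; those would need
to be rejected or encoded some other way before storage.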

- Dave



> thanks,
> -Cliff Draper
>  cliffwd@forte.com
> 
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
> To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)








Copyright 2001 XML.org. This site is hosted by OASIS