   Re: [xml-dev] Character entities and document encoding


Please see Tim Bray's excellent treatise on this topic [1].

Kind Regards,
Joe Chiusano
Booz | Allen | Hamilton
Strategy and Technology Consultants to the World

[1] http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF

Andy Greener wrote:
> I'd appreciate some advice on the following issues...
> Being from the UK, we have a requirement to convey the UK pound-sterling
> character in XML documents (and validate those documents of course).
> The Unicode decimal value of pound sterling is 163 (0xA3), but of course
> the UTF-8 encoding is 0xC2A3.
> I'm ok with the fact that a UTF-8 encoded instance doc can contain the
> above two byte values directly (i.e. 0xC2 and 0xA3), but I'm getting
> conflicting opinion as to whether replacing those two bytes with the
> character entity &#163; is equivalent or not - I think not, so long as
> the document is UTF-8 encoded, though it would be correct to do this
> if the encoding were "ISO-8859-1", as would inserting the actual pound
> character (i.e. the 8-bit value equivalent to 0xA3). However, I'm happy to
> be corrected.
> I guess the fundamental question is: how are character entities
> interpreted in relation to the document encoding (i.e. what's the
> order of evaluation)? If that's not the fundamental question then
> I'm missing something :-))
> A supplementary question: if I want to validate text containing pound
> sterling characters, and my Schemas are UTF-8 encoded, what do I put in
> the pattern facet: &#163; or the two character UTF-8 encoding? And what
> will your average regular expression evaluator make of the latter?
> Thanks in advance for any help
> --
> Andy Greener                         Mob: +44 7836 331933
> GID Ltd, Reading, UK                 Tel: +44 118 956 1248
> andy@gid.co.uk                       Fax: +44 118 958 9005
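
A minimal sketch of the order-of-evaluation question, using Python's
standard xml.etree.ElementTree as the parser (an assumption; the thread
names no particular tool). A numeric character reference such as &#163;
is written in plain ASCII and names a Unicode code point, so a parser
should resolve it to the same character whatever the declared document
encoding. Only a literal pound sign is affected by the encoding
declaration. The element name "price" and the pattern are hypothetical,
and Python's re module only stands in for an XML Schema regular
expression evaluator, which has its own syntax.

```python
import re
import xml.etree.ElementTree as ET

POUND = "\u00a3"  # U+00A3 POUND SIGN, decimal 163

# The same content written four ways (hypothetical <price> element).
docs = {
    # Literal character: the byte sequence depends on the declared encoding.
    "utf8_literal": b'<?xml version="1.0" encoding="UTF-8"?><price>\xc2\xa3</price>',
    "latin1_literal": b'<?xml version="1.0" encoding="ISO-8859-1"?><price>\xa3</price>',
    # Character reference: ASCII only, identical bytes in both encodings.
    "utf8_charref": b'<?xml version="1.0" encoding="UTF-8"?><price>&#163;</price>',
    "latin1_charref": b'<?xml version="1.0" encoding="ISO-8859-1"?><price>&#163;</price>',
}

# The parser decodes bytes to characters first, then resolves references,
# so every variant yields the same one-character string.
parsed = {name: ET.fromstring(doc).text for name, doc in docs.items()}

# A pattern is matched against characters, not bytes, so it can name the
# pound sign once and ignore how the schema or instance file was encoded.
amount = re.compile(POUND + r"[0-9]+")
matches = amount.fullmatch(POUND + "100") is not None
```

Under these assumptions all four documents parse to the same character,
which suggests that the choice between the literal character and the
reference is a matter of convenience rather than of meaning.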