Please see Tim Bray's excellent treatise on this topic [1].
Kind Regards,
Joe Chiusano
Booz | Allen | Hamilton
Strategy and Technology Consultants to the World
[1] http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
Andy Greener wrote:
>
> I'd appreciate some advice on the following issues...
>
> Being from the UK, we have a requirement to convey the UK pound-sterling
> character in XML documents (and validate those documents of course).
> The Unicode decimal value of pound sterling is 163 (0xA3), but of course
> the UTF-8 encoding is 0xC2A3.
>
> I'm ok with the fact that a UTF-8 encoded instance doc can contain the
> above two byte values directly (i.e. 0xC2 and 0xA3), but I'm getting
> conflicting opinion as to whether replacing those two bytes with the
> character entity &#163; is equivalent or not - I think not, so long as
> the document is UTF-8 encoded, though it would be correct to do this
> if the encoding were "ISO-8859-1", as would inserting the actual pound
> character (ie the 8 bit value equivalent to 0xA3). However, I'm happy to
> be corrected.
>
> I guess the fundamental question is: how are character entities
> interpreted in relation to the document encoding (i.e. what's the
> order of evaluation)? If that's not the fundamental question then
> I'm missing something :-))
>
> A supplementary question: if I want to validate text containing pound
> sterling characters, and my Schemas are UTF-8 encoded, what do I put in
> the pattern facet: &#163; or the two character UTF-8 encoding? And what
> will your average regular expression evaluator make of the latter?
>
> Thanks in advance for any help
> --
>
> Andy Greener Mob: +44 7836 331933
> GID Ltd, Reading, UK Tel: +44 118 956 1248
> andy@gid.co.uk Fax: +44 118 958 9005
>
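A minimal sketch of the question above, for anyone who wants to check the behaviour rather than argue it. It is not from the thread: Python and its standard xml.etree.ElementTree parser are my choice of tool, and the <price> element is a made-up example. A numeric character reference such as &#163; names a Unicode code point, not a byte value, so it should mean the same thing whatever the document encoding is. The sketch parses the same content written as raw UTF-8 bytes, as a character reference, and as the single ISO-8859-1 byte, then applies a pattern to the parsed text. Python's re module stands in for a schema pattern facet here; the two regex dialects differ, but both match against characters after decoding, not against bytes.

```python
import re
import xml.etree.ElementTree as ET

POUND = "\u00a3"  # U+00A3 POUND SIGN, decimal 163

# Hypothetical documents: the same content written three ways.
docs = {
    # Raw UTF-8 bytes 0xC2 0xA3 in a UTF-8 document.
    "utf8_raw": b'<?xml version="1.0" encoding="UTF-8"?>'
                b'<price>\xc2\xa3100</price>',
    # Numeric character reference in a UTF-8 document.
    "utf8_ref": b'<?xml version="1.0" encoding="UTF-8"?>'
                b'<price>&#163;100</price>',
    # Single byte 0xA3 in an ISO-8859-1 document.
    "latin1_raw": b'<?xml version="1.0" encoding="ISO-8859-1"?>'
                  b'<price>\xa3100</price>',
    # The same character reference in an ISO-8859-1 document.
    "latin1_ref": b'<?xml version="1.0" encoding="ISO-8859-1"?>'
                  b'<price>&#163;100</price>',
}

# After parsing, the application sees characters, not bytes.
parsed = {name: ET.fromstring(data).text for name, data in docs.items()}

# The pattern is written in terms of the character; the regex engine
# never sees the two-byte UTF-8 form.
pattern = re.compile(POUND + r"\d+")
matches = {name: bool(pattern.fullmatch(text)) for name, text in parsed.items()}

for name, text in parsed.items():
    print(name, repr(text), matches[name])
```

If all four variants come back as the same string, then the reference and the encoded bytes are equivalent at the level a validator works on, and the pattern only needs to mention the character once, however the schema file itself happens to be encoded.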