RE: There is a serious amount of character encoding conversions occurring

David Lee wrote:

To be able to track end to end the path of conversions and validate that your application from authoring through to storage through to search and retrieval is completely correct is amazingly difficult. IMHO it’s a skill far too few programmers have, or even recognize that they do not have.

Fascinating!

You are writing about character encoding conversions as text moves from point to point to point.

Is there a parallel with markup? Are there markup conversions as XML moves from point to point to point?

Are there lessons learned in the character encoding community that could be applied to the XML community?

/Roger

From: David Lee [mailto:dlee@calldei.com]
Sent: Friday, December 28, 2012 9:36 AM
To: Costello, Roger L.; xml-dev@lists.xml.org
Subject: RE: There is a serious amount of character encoding conversions occurring inside our computers and on the Web

-----------

Some mighty smart fellows figured this character encoding stuff out long ago and now it is buried so deep in the fabric of our computers and the Web that we are completely oblivious to all the encoding conversions that are happening.

/Roger

----------

We are only completely oblivious when it works.

Which is really rare. I am amazed you had such success.

Try something interesting in your tests. Try a Unicode character above code point 0xFFFF.

Like this one: dec 110593 hex 1B001 HTML 𛀁

http://rishida.net/scripts/uniview/?codepoints=1B001
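As a rough illustration of why such a character makes a good test, here is a minimal Python sketch that shows how U+1B001 is represented in UTF-8 and UTF-16. The variable names are only for illustration; the point is that UTF-16 needs two 16-bit code units (a surrogate pair) for this character, which is where code that assumes one unit per character tends to break.

```python
# U+1B001 lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
ch = "\U0001B001"

code_point = ord(ch)                  # 110593 decimal, 0x1B001 hex
utf8_bytes = ch.encode("utf-8")       # four bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")  # two 16-bit code units, big-endian, no BOM

# Compute the surrogate pair by hand to compare with the codec's output.
offset = code_point - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate

print(hex(code_point), utf8_bytes.hex(), utf16_bytes.hex(), hex(high), hex(low))
```

Any layer that truncates, splits, or separately escapes the two surrogates will corrupt the character, so it is worth checking at every hop.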

To be able to track end to end the path of conversions and validate that your application from authoring through to storage through to search and retrieval is completely correct is amazingly difficult. IMHO it's a skill far too few programmers have, or even recognize that they do not have.
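One very simplified way to picture end-to-end validation is a round-trip check: push a test string through each encoding hop and confirm it comes back unchanged. The sketch below is only an assumption about how one might model this in Python. The function name `round_trip` and the lists of hops are hypothetical, and a real pipeline (editor, HTTP, database, search index) has to be tested through its actual components rather than through codecs alone.

```python
def round_trip(text, encodings):
    """Encode and decode text through each encoding in turn.

    Returns (survived, failed_at), where failed_at is the first encoding
    that could not represent the text or altered it, or None if every
    hop preserved the text.
    """
    current = text
    for enc in encodings:
        try:
            current = current.encode(enc).decode(enc)
        except UnicodeError:
            # This hop cannot represent at least one character.
            return False, enc
        if current != text:
            # This hop silently changed the text.
            return False, enc
    return True, None


# Test string with an accented letter and a character above U+FFFF.
sample = "caf\u00e9 \U0001B001"

ok_result = round_trip(sample, ["utf-8", "utf-16", "utf-32"])
bad_result = round_trip(sample, ["utf-8", "latin-1", "utf-16"])

print(ok_result)   # every hop can represent the sample
print(bad_result)  # the latin-1 hop cannot represent U+1B001
```

Reporting the hop at which the text first fails is the useful part: it narrows the search to one conversion instead of the whole path.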

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org