David Lee wrote:

To be able to track end to end the path of conversions
and validate that your application from authoring
through to storage through to search and retrieval is
completely correct is amazingly difficult. IMHO it’s a
skill far too few programmers have, or even recognize
that they do not have.

Fascinating! You are writing about character encoding conversions as text moves from point to point to point. Is there a parallel with markup? Are there markup conversions as XML moves from point to point to point? Are there lessons learned in the character encoding community that could be applied to the XML community?

/Roger

From: David Lee [mailto:dlee@calldei.com]
-----------
Some mighty smart fellows figured this character encoding stuff out long ago and now it is buried so deep in the fabric of our computers and the Web that we are completely oblivious to all the encoding conversions that are happening.

/Roger
----------

We are only completely oblivious when it works, which is really rare. I am amazed you had such success. Try something interesting in your tests. Try a Unicode character outside the 0x0000-0xFFFF code point range. Like this one:

dec 110593
hex 1B001
HTML &#x1B001; (𛀁)
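
For anyone who wants to try this test, here is a minimal Python 3 sketch of what happens to that code point in a few common encodings. The helper names `describe` and `survives_round_trip` are made up for illustration and are not from any library; the point is only that U+1B001 needs four bytes in UTF-8 and a surrogate pair in UTF-16, which is where pipelines that assume 16-bit characters tend to break.

```python
import html

# The test character suggested above: U+1B001 (decimal 110593).
ch = chr(0x1B001)


def describe(c):
    """Return the code point of a single character in several representations."""
    cp = ord(c)
    utf16 = c.encode("utf-16-be")
    # Split the UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "dec": cp,
        "hex": format(cp, "X"),
        "html": "&#x{:X};".format(cp),
        "utf8": c.encode("utf-8").hex(),
        "utf16_units": [format(u, "04X") for u in units],
    }


def survives_round_trip(c, encoding):
    """True if encoding and then decoding gives back the same text."""
    try:
        return c.encode(encoding).decode(encoding) == c
    except UnicodeError:
        return False


info = describe(ch)

if __name__ == "__main__":
    for key, value in info.items():
        print(key, value)
    for enc in ("utf-8", "utf-16", "latin-1", "ascii"):
        print(enc, survives_round_trip(ch, enc))
    # The HTML numeric character reference decodes back to the same character.
    print(html.unescape(info["html"]) == ch)
```

A real end-to-end check would push the character through every hop (editor, parser, database, index, serializer) and compare what comes out, rather than only calling the codecs directly as this sketch does.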
http://rishida.net/scripts/uniview/?codepoints=1B001

To be able to track end to end the path of conversions and validate that your application from authoring through to storage through to search and retrieval is completely correct is amazingly difficult. IMHO it’s a skill far too few programmers have, or even recognize that they do not have.

----------------------------------------
David A. Lee |