---------
You are writing about character encoding conversions as text moves from point to point to point. Is there a parallel with markup? Are there markup conversions as XML moves from point to point to point? Are there lessons learned in the character encoding community that could be applied to the XML community?
--------
Markup is text and has the same problems (and solutions).
If we could start over from scratch with what we know now, there would be fewer problems. IMHO, the best solution is to stick to a single encoding everywhere (I vote for UTF-8, as it can represent every Unicode code point). The next step is to make sure *every single link in the chain* uses that encoding. This is amazingly difficult even in "modern" languages like Java, where the default behavior when converting between byte arrays and Strings is to use the *system default encoding*, which varies from machine to machine and is effectively unknown to the program. Even in pure Java you have to track every single point where a byte array is converted to a String and vice versa, and explicitly set the encoding (or guarantee that the system encoding is correct).
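A minimal Java sketch of what "explicitly set the encoding" means at each byte/String conversion. The class and method names (`ExplicitEncoding`, `toUtf8`, `fromUtf8`) are hypothetical. The point is simply to never call the no-argument `String.getBytes()` or the `new String(byte[])` constructor, both of which fall back to the platform default. The string literals use Unicode escapes so the example does not itself depend on the compiler's source-file encoding.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {

    // String -> bytes: name the charset instead of relying on
    // String.getBytes() with no argument (platform default).
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String: name the charset instead of new String(bytes),
    // which also uses the platform default.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00EFve caf\u00E9 \u20AC"; // 12 chars
        byte[] bytes = toUtf8(original);
        System.out.println(bytes.length); // 16
        System.out.println(original.equals(fromUtf8(bytes))); // true

        // Decoding the same bytes with the wrong charset does not fail;
        // it silently produces mojibake.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(wrong)); // false
    }
}
```

The wrong-charset case is the dangerous one. Nothing throws, so the corruption travels on to the next link in the chain unnoticed.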
Then you have to manage every place the data enters and leaves the program and make sure it's in the right encoding. Then you have to make sure every place that *stores* the data (like a database) doesn't muck with it. XML itself cannot solve this problem alone, since an XML document is only the payload. However, XML tools tend to be a bit more mature about dealing with this. But not always. Maybe in another 30 years we will have migrated all our tools to be consistent about encodings.
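As a rough illustration of where XML tooling helps at the program's edges, here is a sketch using the JDK's built-in DOM parser and `Transformer`. The class and method names (`XmlBoundary`, `write`, `read`) are hypothetical. It assumes the policy is "emit UTF-8, accept whatever the document declares." On the way out, the serializer is told to use UTF-8, and it writes a matching encoding declaration. On the way in, the parser is given raw bytes rather than a String we decoded ourselves, so it can honour the document's BOM or encoding declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlBoundary {

    // Outbound edge: serialize straight to bytes, asking the serializer
    // to both use UTF-8 and declare it in the XML declaration.
    static byte[] write(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    // Inbound edge: give the parser raw bytes so it reads the BOM or
    // encoding declaration and decodes the text itself.
    static Document read(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document (Unicode escapes keep this source file
        // independent of the compiler's own source encoding).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("note");
        root.setTextContent("caf\u00E9 \u20AC");
        doc.appendChild(root);

        byte[] bytes = write(doc);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(read(bytes).getDocumentElement()
                .getTextContent().equals("caf\u00E9 \u20AC")); // true

        // A Latin-1 document is still read correctly because it declares
        // its own encoding; the parser, not our code, does the decoding.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00E9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(read(latin1).getDocumentElement()
                .getTextContent().equals("caf\u00E9")); // true
    }
}
```

This only covers the XML boundary. Anything that turns those bytes into a String outside the parser (logging, a database driver, an HTTP layer) is still a link that has to be checked separately.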