Re: [xml-dev] RE: There is a serious amount of character encoding conversions occurring inside our computers and on the Web
- From: Chris Maloney <voldrani@gmail.com>
- To: David Lee <dlee@calldei.com>
- Date: Fri, 28 Dec 2012 14:45:28 -0500
Roger,
Here is a classic post from XML.com that is right in line with the
topic of character encodings that you have been posting about
recently, titled "XML on the web has failed":
http://www.xml.com/pub/a/2004/07/21/dive.html
It takes some work to really grok the problems the author is
describing, but it is well worth it, I think, and may make your head
spin (or hurt, depending).
I'd be very interested to hear if any of the XML / character encoding
gurus on this list have any comments or links to updates to this
article (which was written in 2004). I am not sure if the issues the
author describes have been remedied or not.
Chris
On Fri, Dec 28, 2012 at 12:17 PM, David Lee <dlee@calldei.com> wrote:
> ---------
>
> You are writing about character encoding conversions as text moves from
> point to point to point.
>
> Is there a parallel with markup? Are there markup conversions as XML moves
> from point to point to point?
>
> Are there lessons learned in the character encoding community that could be
> applied to the XML community?
>
> --------
>
> Markup is text and has the same problems (and solutions).
>
> If we could start over from scratch with what we know now, there would be
> fewer problems.
>
> IMHO, my preferred solution is to stick to a single encoding everywhere (I
> vote for UTF-8, as it handles all Unicode code points).
>
> The next step is to make sure *every single link in the chain* uses that
> encoding.
>
> This is amazingly difficult even in "modern" languages like Java, where the
> default behavior when converting bytes to strings is to use the *system
> default encoding*, which is effectively unknown. Even in pure Java you have
> to track every single point where a byte array is converted to a String and
> vice versa, and explicitly set the encoding (or guarantee that the system
> encoding is correct).
>
> Then you have to manage all the places where data enters and leaves the
> program and make sure it's in the right encoding.
>
> Then you have to make sure all places that *store* the data (like a
> database) don't muck with it.
>
> XML itself cannot solve this problem alone, as an XML document is only the
> payload. However, XML tools tend to be a bit more mature about dealing with
> this.
>
> But not always.
>
> Maybe in another 30 years we will have migrated all our tools to be
> consistent about encodings.
>
> ----------------------------------------
> David A. Lee
> dlee@calldei.com
> http://www.xmlsh.org
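A minimal Java sketch of the "explicitly set the encoding at every link" advice quoted above, assuming UTF-8 as the single encoding. The class `ExplicitEncoding` and its methods (`encode`, `decode`, `writeText`, `readText`, `rootText`) are hypothetical names made up for illustration; the JDK calls they wrap (`String.getBytes(Charset)`, `new String(byte[], Charset)`, `InputStreamReader`/`OutputStreamWriter` with a `Charset`, and `DocumentBuilder.parse(InputStream)`) are standard. It shows a byte/String round trip with a named charset, the mojibake produced when one link guesses wrong, and an XML parser applying the document's own encoding declaration when it is handed raw bytes rather than pre-decoded text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only; assumes UTF-8 is the one
// encoding used everywhere. Non-ASCII literals are written as Unicode escapes
// so the file compiles identically whatever the compiler's source encoding is.
final class ExplicitEncoding {

    // byte[] <-> String inside the program: always name the charset.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Where text leaves the program: a Writer that names its charset
    // (instead of FileWriter, which uses the platform default).
    static byte[] writeText(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Where text enters the program: the same idea with a Reader.
    static String readText(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // XML: pass raw bytes so the parser applies the encoding declaration
    // itself, rather than pre-decoding them with a guessed charset.
    static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac"; // "caf" + e-acute + space + euro sign
        byte[] utf8 = encode(text);
        System.out.println(decode(utf8).equals(text));             // round trip survives
        System.out.println(readText(writeText(text)).equals(text)); // stream round trip

        // One link in the chain guesses wrong: UTF-8 bytes decoded as Latin-1.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(text));                   // mojibake

        // A Latin-1 document; the parser honors encoding="ISO-8859-1".
        byte[] latin1Xml = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<r>caf\u00e9</r>").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1Xml).equals("caf\u00e9"));
    }
}
```

If the same Latin-1 document were first decoded into a String and handed to the parser as a character stream, the parser would disregard the declaration, which is exactly the kind of silent link-in-the-chain conversion described above. Since Java 18 (JEP 400) the default charset is UTF-8 for most standard APIs, so the platform-default trap mainly bites older JVMs, but naming the charset still documents the intent at each boundary.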