OASIS Mailing List Archives
Re: [xml-dev] RE: There is a serious amount of character encoding conversions occurring inside our computers and on the Web

Roger,

Here is a classic post from XML.com that is right in line with the
topic of character encodings that you have been posting about
recently, titled "XML on the web has failed":
http://www.xml.com/pub/a/2004/07/21/dive.html

It takes some work to really grok the problems the author is
describing, but it is well worth it, I think, and may make your head
spin (or hurt, depending).

I'd be very interested to hear if any of the XML / character encoding
gurus on this list have any comments or links to updates to this
article (which was written in 2004).  I am not sure if the issues the
author describes have been remedied or not.

Chris


On Fri, Dec 28, 2012 at 12:17 PM, David Lee <dlee@calldei.com> wrote:
> ---------
>
> You are writing about character encoding conversions as text moves from
> point to point to point.
>
> Is there a parallel with markup? Are there markup conversions as XML moves
> from point to point to point?
>
> Are there lessons learned in the character encoding community that could be
> applied to the XML community?
>
> --------
>
> Markup is text and has the same problems (and solutions).
>
> If we could start over from scratch with what we know now, there would be
> fewer problems.
>
> IMHO, my preferred solution is to stick to a single encoding everywhere (I
> vote for UTF-8, as it can encode every Unicode code point).
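
One quick way to check the "handles every code point" claim is to count how many bytes UTF-8 needs per character. A minimal Java sketch, assuming Java 7+ for StandardCharsets; Utf8CodePoints and its methods are hypothetical names, not from any library. The sample text uses \u escapes so the source file itself stays ASCII and cannot be misread by the compiler.

```java
import java.nio.charset.StandardCharsets;

public class Utf8CodePoints {
    // Number of bytes UTF-8 needs for one Unicode scalar value
    // (any code point that is not a surrogate): always 1 to 4.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to UTF-8 and decode again; true if the text comes back unchanged.
    static boolean roundTrips(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    public static void main(String[] args) {
        // 'A', e-acute, the euro sign, and an emoji outside the Basic Multilingual Plane.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
        System.out.println(roundTrips("A \u00e9 \u20ac \uD83D\uDE00"));
    }
}
```

The program prints 1, 2, 3 and 4 bytes for the four samples and then true: ASCII stays one byte, while characters outside the BMP take four.
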
>
> The next step is to make sure *every single link in the chain* uses that
> encoding.
>
> This is amazingly difficult even in "modern" languages like Java, where the
> default behavior when converting bytes to strings (and back) is to use the
> *system default encoding*, which is always an unknown. Even in pure Java you
> have to track every single point where a byte array is converted to a String
> and vice versa, and explicitly set the encoding (or guarantee that the system
> encoding is correct).
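
A minimal Java sketch of what "explicitly set the encoding" looks like at the byte[]/String boundary, assuming Java 7+ for StandardCharsets; ExplicitCharsets and its methods are hypothetical helpers. The last helper shows the failure mode: decoding UTF-8 bytes as ISO-8859-1 never throws, it just silently produces mojibake. (Since JDK 18, JEP 400 makes UTF-8 the default charset unless overridden, but naming the charset still protects code running on older JVMs.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsets {
    // Always name the charset: String(byte[]) and getBytes() without one
    // fall back to the JVM's default charset, which has varied by platform and locale.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // One link in the chain guessing wrong: ISO-8859-1 maps every byte to some
    // character, so decoding UTF-8 bytes with it never fails, it just garbles.
    static String decodeWithWrongCharset(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                      // "cafe" with an e-acute
        byte[] bytes = encode(original);
        System.out.println(bytes.length);                   // 5: the e-acute takes two bytes
        System.out.println(decode(bytes).equals(original)); // true
        // The two UTF-8 bytes of e-acute come back as A-tilde and the copyright sign.
        System.out.println(decodeWithWrongCharset(bytes).equals("caf\u00c3\u00a9")); // true
        System.out.println(Charset.defaultCharset());       // whatever this JVM defaults to
    }
}
```
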
>
> Then you have to manage every place where data enters and leaves the program
> and make sure it's in the right encoding.
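
The same idea applied to the places where data enters and leaves the program: wrap every raw byte stream in a Reader or Writer that names its charset instead of relying on the default. A minimal sketch, assuming Java 7+; CharsetBoundaries is a hypothetical name, and in-memory streams stand in for a real socket, file or HTTP body.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CharsetBoundaries {
    // Outbound edge: the Writer names the charset used to turn chars into bytes.
    static void writeText(OutputStream out, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Inbound edge: the Reader names the charset used to turn bytes into chars.
    static String readText(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve \u20ac"; // i-diaeresis and the euro sign
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeText(sink, text);
        byte[] wire = sink.toByteArray();
        System.out.println(wire.length);                                         // 10
        System.out.println(readText(new ByteArrayInputStream(wire)).equals(text)); // true
    }
}
```

The string is 7 characters but 10 bytes on the wire (the i-diaeresis takes 2, the euro sign 3), and it survives the round trip only because both edges agree on UTF-8.
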
>
> Then you have to make sure that every place that *stores* the data (such as a
> database) doesn't muck with it.
>
> XML itself cannot solve this problem alone, as an XML document is only the
> payload. However, XML tools tend to be a bit more mature about dealing with
> this.
>
> But not always.
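
On the XML tools point: a parser that is handed raw bytes reads the encoding declaration (or falls back to UTF-8/UTF-16 detection) and decodes the document itself. The "not always" case is typically code that decodes the bytes into a String first; per the SAX InputSource contract, a parser reading a character stream disregards the encoding declaration. A minimal JAXP sketch, assuming the JDK's built-in DOM parser; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingDetection {
    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Give the parser raw bytes: it reads the encoding declaration itself
    // (falling back to UTF-8/UTF-16 detection when there is none).
    static String textFromBytes(byte[] xmlBytes) throws Exception {
        Document doc = newBuilder().parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    // Decode the bytes yourself first (here, wrongly, as UTF-8): the parser now
    // gets characters, disregards the declaration, and cannot repair the damage.
    static String textFromPreDecodedString(byte[] xmlBytes) throws Exception {
        String decoded = new String(xmlBytes, StandardCharsets.UTF_8);
        Document doc = newBuilder().parse(new InputSource(new StringReader(decoded)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>";
        // In ISO-8859-1 the e-acute is the single byte 0xE9.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(textFromBytes(latin1Bytes).equals("caf\u00e9"));            // true
        System.out.println(textFromPreDecodedString(latin1Bytes).equals("caf\u00e9")); // false
    }
}
```

In the second case the lone 0xE9 byte is not valid UTF-8, so it is replaced with U+FFFD before the parser ever sees the document.
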
>
> Maybe in another 30 years we will have migrated all our tools to be
> consistent about encodings.
>
> ----------------------------------------
> David A. Lee
> dlee@calldei.com
> http://www.xmlsh.org

Copyright 1993-2007 XML.org. This site is hosted by OASIS