Re: [xml-dev] An XML document is not well-formed if encoding="..."does not match the actual encoding of the characters in the document, right?
- From: Michael Sokolov <sokolov@ifactory.com>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Sat, 29 Dec 2012 11:14:51 -0500
On 12/29/2012 9:38 AM, Costello, Roger L. wrote:
>
> -----------
> Question
> -----------
> This outstanding discussion has awakened me to the problems with the multiplicity of character encodings and the huge number of character encoding conversions taking place behind the scenes.
>
> Is the solution to the problems to simply eliminate the need for conversions by mandating that every application, every IDE, every text editor, and every system worldwide adopt one character encoding, UTF-8? Is that a realistic solution? If so, what is the timeframe in which it could be achieved?
>
> /Roger
>
My perspective is a bit different: I spent some early years working as
an "internationalization" consultant. Think of the job as poring over
programs written in C, which operated on text as bytes, hunting for
string manipulations that would incorrectly interpret the second byte of
an S-JIS character as an ASCII "\". And I would say that the situation
is much improved since 1992. The problems are no less difficult, but
they occur much less frequently: basically because of the wide adoption
of Unicode (in particular UTF-8). Encoding issues arise, and when they
do they are frequently mysterious and difficult to diagnose, but there
aren't enough of them to support an entire mini-industry of consultants,
as there were. And I don't mourn the passing of that job.
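
To make the S-JIS hazard concrete, here is a minimal Python sketch (not the original C code, and the helper name find_backslash_bytes is hypothetical). It assumes the character U+8868, whose Shift_JIS encoding is the byte pair 0x95 0x5C, and shows that a byte-oriented scan for "\" reports a false hit in Shift_JIS but not in UTF-8, where every byte of a multi-byte sequence is 0x80 or above.

```python
# Hypothetical helper: scan raw bytes for 0x5C (ASCII backslash),
# the way byte-oriented string code would, with no knowledge of the encoding.
def find_backslash_bytes(data: bytes) -> list[int]:
    return [i for i, b in enumerate(data) if b == 0x5C]

text = "\u8868"  # a single kanji; contains no backslash character

sjis = text.encode("shift_jis")  # two bytes: 0x95 0x5C
utf8 = text.encode("utf-8")      # three bytes, all >= 0x80

# The Shift_JIS trail byte collides with ASCII "\" ...
print(sjis.hex(), find_backslash_bytes(sjis))
# ... while the UTF-8 form can never contain an ASCII byte inside a multi-byte character.
print(utf8.hex(), find_backslash_bytes(utf8))
```

The scan itself is the same in both cases; only the encoding of the input decides whether it is safe, which is why such bugs were so tedious to hunt down by reading code.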
I can't see how any sort of top-down universal mandate could ever succeed
completely, but certainly within whatever one's purview may be, yes, I
would advocate using UTF-8 exclusively. And I think we are moving in
that direction already, anyway.
-Mike