OASIS Mailing List Archives


Re: [xml-dev] An XML document is not well-formed if encoding="..." does not match the actual encoding of the characters in the document, right?

Roger wrote:

> I would advocate using UTF-8 exclusively

That's what I do with my own files, and what I advocate whenever I
have any input to design decisions, but as Liam and others have said,
it's not practical to expect everyone to adopt this convention.

What I really want to know is, when can we start freely using BOMs in
UTF-8?  I really like this idea, because it is a simple, easy way for
a text file to "declare" that it is in UTF-8, and eliminate the
ambiguity when the text files are passed around.  Unfortunately, a lot
of software, especially on Linux, still chokes on these.
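
To make the BOM idea concrete, here is a minimal Python sketch of how a
reader might detect and tolerate a UTF-8 BOM. The helper names
(has_utf8_bom, decode_utf8) are made up for illustration; codecs.BOM_UTF8
and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips one leading BOM and otherwise behaves like "utf-8".
    return data.decode("utf-8-sig")


# Illustrative sample data (not from the thread).
with_bom = codecs.BOM_UTF8 + "café".encode("utf-8")
without_bom = "café".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(decode_utf8(with_bom) == decode_utf8(without_bom))
```

Software that decodes with plain "utf-8" instead keeps the BOM as a
leading U+FEFF character, which is probably the kind of thing that trips
up tools that do not expect it.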

On a slightly different topic (UTF-16), this discussion reminded me of
something else I read a while back, a technical note from the Unicode
Consortium advocating the use of UTF-16 for internal processing
(as opposed to file interchange):
http://unicode.org/notes/tn12/tn12-1.html.  On the other hand, I just
found from a Google search this recent thread on StackExchange, where
several people argue that UTF-16 should be considered harmful:
 I guess the debate will rage on, but interoperability, on the whole,
does seem to be getting better.


On Sat, Dec 29, 2012 at 2:36 PM, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
> I spoke with George Cristian Bina from oXygen XML and he gave me the scoop on how things work inside oXygen.
> George told me to do this:
> 1. Create an iso-8859-1 encoded XML file.
> 2. Using a hex editor, change encoding="iso-8859-1" to encoding="utf-8"
> 3. Drag and drop the file into oXygen.
> 4. oXygen will generate an encoding exception:
>     Cannot open the specified file. Got a character
>     encoding exception [snip]
> Next, here is something George told me. It is mind-blowing:
>     If you have an iso-8859-1 encoded XML file loaded into oXygen
>     and change encoding="iso-8859-1" to encoding="utf-8" then
>     oXygen will automatically change the encoding of every character
>     in the document to UTF-8.
> Wow!
> That is so fantastic, I jumped out of my chair when I read it.
> I just received this additional information from George:
>     Please note that the encoding is important only when the file is loaded
>     and saved. When the file is loaded the bytes are converted to characters
>     and then the application works only with characters. When the file is
>     saved then those characters need to be converted to bytes and the
>     encoding used will be determined from the XML header with a default to
>     UTF-8 if no encoding can be detected.
> /Roger
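
The behaviour George describes can be sketched in a few lines of Python.
This is only an illustration of the load/save model, not oXygen's actual
code: the helpers declared_encoding, load and save are hypothetical, the
sample document is invented, and the sketch assumes an ASCII-compatible
encoding with no BOM (it would not handle UTF-16).

```python
import re

# Hypothetical helper pattern: capture the encoding name in an XML declaration.
_DECL = re.compile(r'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(head: str, default: str = "utf-8") -> str:
    """Return the declared encoding, or the default if none is declared."""
    match = _DECL.match(head)
    return match.group(1) if match else default


def load(data: bytes) -> str:
    """Bytes -> characters, trusting the encoding named in the declaration."""
    # Latin-1 maps every byte to a character, so it is safe for peeking
    # at the (ASCII) declaration before the real decode.
    return data.decode(declared_encoding(data.decode("latin-1")))


def save(text: str) -> bytes:
    """Characters -> bytes, using whatever the declaration says now."""
    return text.encode(declared_encoding(text))


latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>café</p>'.encode(
    "iso-8859-1"
)

# Hex-editor case: relabel the file without converting its bytes.
mislabeled = latin1_doc.replace(b"iso-8859-1", b"utf-8")
try:
    load(mislabeled)
    mismatch_detected = False
except UnicodeDecodeError:
    # The Latin-1 byte 0xE9 is not a valid UTF-8 sequence here.
    mismatch_detected = True

# In-editor case: load, edit the declaration as characters, then save.
text = load(latin1_doc)
edited = text.replace('encoding="iso-8859-1"', 'encoding="utf-8"')
resaved = save(edited)  # the characters are re-encoded as UTF-8 on save
```

In the first case the label changes but the bytes do not, so decoding
fails. In the second the application only ever holds characters, and the
save step produces new bytes that match the new label.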
