Re: [xml-dev] relax UTF-8 default? was: [xml-dev] Towards XML 2.0

On 10 December 2010 09:28, David Carlisle <davidc@nag.co.uk> wrote:
> On 10/12/2010 08:56, Stephen Green wrote:
>> Does newXML being treatable as a string mean the *UTF-8 default*
>> requirement is better relaxed in some way? I mean, a developer writing
>> a string doesn't want to have to ensure it is all written in UTF-8, do
>> they?
> why would any person ever have to know what the utf8 encoding is? If you
> want an "a" then you can enter an a without knowing what the latin1 or ascii
> or utf8 encodings of an a are. They happen to all be the same in that case.
> If you pick another letter such as pound sign, or e acute they happen to be
> different, but since typically a human doesn't know any of the numbers it
> doesn't make any difference, it's just a matter of what your text editor
> does when you hit save.

Yep - the "UTF-8/16 only" suggestion is meant to solve the problem of a
potential mismatch between the encoding declared in the prolog and the
actual encoding of the bytes. Add to that the Content-Type header when
HTTP is involved and you have three places to look at to determine the
encoding...
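To make the three places concrete, here is a minimal Python sketch. The function name `declared_encodings` and the returned keys are made up for illustration, and it assumes an ASCII-compatible document (a UTF-16 body would not match the declaration pattern); it is not how any particular parser does it.

```python
import re

def declared_encodings(body, content_type=None):
    """Collect the places an XML-over-HTTP document can state its encoding.

    Hypothetical helper for illustration only.
    """
    found = {}
    # 1. The charset parameter of the HTTP Content-Type header, if present.
    if content_type:
        m = re.search(r'charset\s*=\s*"?([A-Za-z0-9._-]+)', content_type, re.I)
        if m:
            found["http"] = m.group(1).lower()
    # 2. The encoding pseudo-attribute in the XML declaration (the prolog).
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', body)
    if m:
        found["declaration"] = m.group(1).decode("ascii").lower()
    # 3. What the bytes actually are; this sketch only asks "valid UTF-8?".
    try:
        body.decode("utf-8")
        found["bytes"] = "utf-8"
    except UnicodeDecodeError:
        found["bytes"] = "not utf-8"
    return found
```

For a document saved as UTF-8 but declared as ISO-8859-1 and served with `charset=windows-1252`, all three entries disagree, which is exactly the situation described above.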

This manifests itself as the common problem of "funny characters" in
the output, where UTF-8 has been parsed as Windows-1252 or Latin-1,
or vice versa, where you get the "invalid byte sequence" error message.
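Both failure modes are easy to reproduce. A small Python sketch (the helper names are invented for this example; `cp1252` is Python's name for Windows-1252):

```python
def utf8_read_as_cp1252(text):
    # Bytes written as UTF-8, then wrongly decoded as Windows-1252:
    # no error is raised, you just get "funny characters".
    return text.encode("utf-8").decode("cp1252")

def latin1_read_as_utf8(text):
    # Bytes written as Latin-1, then decoded as UTF-8: non-ASCII bytes
    # are usually not valid UTF-8, so the decoder raises instead.
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError as err:
        return "error: " + err.reason

print(utf8_read_as_cp1252("£ é"))   # Â£ Ã©
print(latin1_read_as_utf8("£ é"))   # an "error: ..." string
print(utf8_read_as_cp1252("a"))     # a  (ASCII is the same in all three)
```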

One common cause of this is simply someone editing the XML file in a
text editor such as Notepad... someone updates a value in a config
file and bang, the XML won't parse any more.

Making it UTF-8/16 only fixes the widespread "funny characters"
problem by always parsing as UTF-8/16, and on the flip side it can
replace the obscure "Invalid byte sequence..." error message with "This
document is not UTF-8/16, please fix this by blah blah blah" or some
other more helpful message.
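As a rough sketch of what that could look like, here is a Python decoder that accepts only UTF-8 or UTF-16 and turns the low-level failure into a friendlier message. The names `NotUnicodeXML` and `decode_utf8_or_utf16` and the message wording are hypothetical, and it assumes UTF-16 input always starts with a byte order mark.

```python
import codecs

class NotUnicodeXML(ValueError):
    """Raised when a document is neither UTF-8 nor UTF-16 (hypothetical)."""

def decode_utf8_or_utf16(data):
    # Assumption: UTF-16 documents begin with a BOM; anything else is
    # treated as UTF-8 (with an optional UTF-8 BOM, hence "utf-8-sig").
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        codec = "utf-16"      # reads the BOM to pick the byte order
    else:
        codec = "utf-8-sig"
    try:
        return data.decode(codec)
    except UnicodeDecodeError as err:
        raise NotUnicodeXML(
            "This document is not UTF-8/16 (undecodable byte at offset "
            f"{err.start}); please re-save it as UTF-8 in your editor."
        ) from err
```

With no other encodings allowed there is nothing to guess: the bytes either decode or the user is told what to do about it.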

It also fixes the three-way XML-over-HTTP what's-the-encoding fun...

It also makes removing the prolog easier, and should allow a better
error message when parsing an empty file etc.

Andrew Welch
