OASIS Mailing List Archives

RE: [xml-dev] xml over http - RFC 3023

> > The out-of-band signalling of character encoding is a fundamentally 
> > broken idea, because there are no mechanisms for programs which 
> > generate data to memoize the character encoding used that can then 
> > feed the rest of the food-chain.
> How about the BOM - that's one way isn't it?  I wonder if a 
> similar ignorable byte sequence could be added to the start 
> of all byte sequences to indicate the encoding of what's coming.

None of the ideas in use here is fundamentally broken; they are all doing
their best to deliver results in an imperfect world.

There are two things that would work, in a more perfect world:

(a) an XML document carries the knowledge of its own encoding (preferably
without the bizarre feature that you need to know what the encoding is
before you can decode the encoding name!), and the carrier doesn't meddle
with it

(b) an XML document is text; the carrier is responsible for knowing the
encoding of text and is allowed to change it; but it needs to know correctly
what the original encoding of the text is, and needs to inform the recipient
reliably what the final encoding of the text is.
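To make the two models concrete, here is a minimal Python sketch of how a recipient might combine them. A charset supplied by the carrier (model (b), in the spirit of RFC 3023's rule that a charset parameter is authoritative) wins. Otherwise the document describes itself (model (a)): first a byte-order mark, then the first-four-bytes heuristic from Appendix F of the XML spec, then the encoding declaration. The function name `detect_xml_encoding` and the `carrier_charset` parameter are hypothetical. EBCDIC and other edge cases are not handled. Note how it has to guess an encoding family before it can read the encoding name, which is exactly the bootstrap problem mentioned in (a).

```python
import codecs
import re
from typing import Optional

# BOMs, checked longest first so a UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Without a BOM, the way "<?xm" is laid out in the first four bytes
# reveals the encoding family (XML 1.0, Appendix F).
_FIRST_FOUR = {
    b"\x00\x00\x00<": "utf-32-be",
    b"<\x00\x00\x00": "utf-32-le",
    b"\x00<\x00?": "utf-16-be",
    b"<\x00?\x00": "utf-16-le",
    b"<?xm": "ascii-compatible",
}

# encoding="..." inside the XML declaration; EncName starts with a letter.
_ENC_DECL = re.compile(
    r"""<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data: bytes, carrier_charset: Optional[str] = None) -> str:
    """Guess the encoding of an XML byte stream (a sketch, not a full parser)."""
    # Model (b): a charset supplied by the carrier takes precedence.
    if carrier_charset:
        return carrier_charset.lower()

    # Model (a): the document describes itself, starting with a BOM.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    family = _FIRST_FOUR.get(data[:4])
    if family is None:
        # No BOM and no XML declaration: XML's default is UTF-8.
        return "utf-8"
    if family != "ascii-compatible":
        # The byte order is already fixed by the first four bytes.
        return family

    # ASCII-compatible family: Latin-1 decodes any byte, so it is safe
    # to use just long enough to read the encoding name.
    head = data[:512].decode("latin-1")
    match = _ENC_DECL.match(head)
    return match.group(1).lower() if match else "utf-8"
```

For example, `detect_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')` returns `"iso-8859-1"`, but passing the same bytes with `carrier_charset="UTF-8"` returns `"utf-8"`. When the two disagree, someone upstream got it wrong, and the recipient cannot tell who.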

Both of these break primarily because they make invalid assumptions about
the rest of the system. An XML document doesn't know its own encoding
because it's frequently created using tools such as text editors that don't
know they are dealing with XML and don't regard it as their responsibility
to set the encoding correctly. Equally, a carrier often doesn't know the
encoding of its payload message because APIs don't require the information
to be provided correctly.
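A small Python illustration of the text-editor failure mode, assuming a hypothetical editor that re-saves a file as UTF-8 without knowing it is XML, so the declaration still claims ISO-8859-1. A consumer that trusts the declaration then decodes the bytes as Latin-1. Because Latin-1 maps every byte to some character, no error is raised and the corruption is silent.

```python
# The document as originally written, declaring ISO-8859-1.
declared = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Zoë</name>'

# Hypothetical XML-unaware editor: saves as UTF-8, leaves the declaration alone.
saved_bytes = declared.encode("utf-8")

# Consumer trusts the in-document declaration and decodes as Latin-1.
as_read = saved_bytes.decode("iso-8859-1")

# "ë" was stored as the UTF-8 bytes C3 AB, which Latin-1 reads as "Ã«".
print(as_read)
```

The result is the classic mojibake `ZoÃ«`: each component behaved reasonably on its own, but the encoding metadata did not travel with the bytes.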

So don't try to blame any one spec in this area. They are all doing their
best. But so long as we have systems that aren't type-safe end-to-end (for
example, operating system filestores without any real metadata) we're going
to get character encoding glitches somewhere along the line.

Michael Kay

Copyright 1993-2007 XML.org. This site is hosted by OASIS