OASIS Mailing List Archives

Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?

Hi Roger,

Thanks for distilling this kind of information.

Costello, Roger L. wrote:
> I have incorporated your comments.  Please let me know if I am missing
> anything, or have incorrectly interpreted your comments:
> http://www.xfront.com/specifying-encoding/
> I am particularly interested in hearing if you agree with the
> recommendations that I list.

When discussing encoding detection, you write:

     If the external information is unreliable or unavailable then a
     parser examines the first 4 bytes of the document. XML and HTML
     documents optionally have a Byte Order Mark (BOM) in the first 4
     bytes. The BOM may indicate the encoding. So if the document has a
     BOM then the parser may be able to determine the document's

This is not technically correct: the auto-detection algorithm does not 
require a BOM. Without one, detection still works because an XML 
declaration must begin with the characters "<?xml", and the bytes that 
encode those characters differ between encoding families. See [1], which 
describes cases both with and without a BOM, for encodings including 
UCS-4 and UTF-16 (big-endian and little-endian), EBCDIC, as well as 
UTF-8, ISO 646, ASCII, and other encodings that have the ASCII 
characters in their normal positions.
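
To make the point concrete, here is a rough Python sketch of the kind of 
lookup that [1] describes. The function name and the returned labels are 
my own, purely for illustration, and the result is only an encoding 
family: a real parser would go on to read the encoding declaration to 
pick the exact encoding.

```python
def guess_encoding_family(data: bytes) -> str:
    """Guess an XML entity's encoding family from its first four bytes.

    Illustrative only: the name and labels are assumptions, and a parser
    would still read the encoding declaration after this step.
    """
    head = data[:4]

    # Order matters: the 4-byte UCS-4 BOMs must be tested before the
    # 2-byte UTF-16 BOMs, because they share the same leading bytes.
    signatures = [
        # With a Byte Order Mark
        (b"\x00\x00\xfe\xff", "UCS-4, big-endian (BOM)"),
        (b"\xff\xfe\x00\x00", "UCS-4, little-endian (BOM)"),
        (b"\x00\x00\xff\xfe", "UCS-4, unusual octet order (BOM)"),
        (b"\xfe\xff\x00\x00", "UCS-4, unusual octet order (BOM)"),
        (b"\xfe\xff", "UTF-16, big-endian (BOM)"),
        (b"\xff\xfe", "UTF-16, little-endian (BOM)"),
        (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
        # Without a Byte Order Mark: the bytes of "<?xml" (or just "<")
        (b"\x00\x00\x00\x3c", "UCS-4, big-endian (no BOM)"),
        (b"\x3c\x00\x00\x00", "UCS-4, little-endian (no BOM)"),
        (b"\x00\x00\x3c\x00", "UCS-4, unusual octet order (no BOM)"),
        (b"\x00\x3c\x00\x00", "UCS-4, unusual octet order (no BOM)"),
        (b"\x00\x3c\x00\x3f", "UTF-16, big-endian (no BOM)"),
        (b"\x3c\x00\x3f\x00", "UTF-16, little-endian (no BOM)"),
        (b"\x3c\x3f\x78\x6d", "ASCII-compatible (UTF-8, ISO 646, ASCII, ...)"),
        (b"\x4c\x6f\xa7\x94", "EBCDIC"),
    ]
    for prefix, label in signatures:
        if head.startswith(prefix):
            return label

    # No BOM and no recognizable declaration: treated as UTF-8.
    return "UTF-8 (no BOM, no encoding declaration)"
```

For example, passing it '<?xml version="1.0"?>'.encode("utf-16-le") 
matches the 3C 00 3F 00 row even though no BOM is present.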

See also David Carlisle's comments, which cover some of the same issues, 
and arrived while I was composing this message.

It would also be useful for your document to link to relevant specs, for 
example [1].

Jim Ancona

[1] XML 1.0 Recommendation, Appendix F: Autodetection of Character 
Encodings, http://www.w3.org/TR/REC-xml/#sec-guessing


Copyright 1993-2007 XML.org. This site is hosted by OASIS