Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?

----- Original Message From: "David Carlisle" <davidc@nag.co.uk>

>> On that basis, it will assume that it is UTF-8.
>
> It might, or it might assume it's ascii or windows code page 1252 or it
> might choose not to view it as an encoded character stream at all and
> just read a sequence of bytes. It's just an implementation detail.

Well, the spec says that when the first character looks like ASCII '<' and 
nothing says otherwise, the default encoding is UTF-8.  True, the parser could 
start out assuming the document is ASCII or Windows-1252, but having got to 
the end of the XML decl (assuming it's there) and having seen nothing to the 
contrary, it would have to switch to UTF-8.  That does seem an odd way to go 
about things, though.
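
As a rough illustration of that default, here is a minimal Python sketch of 
the first guess a parser might make from the leading bytes. It loosely follows 
the non-normative autodetection appendix of the XML spec. The function name 
`guess_family` and the labels it returns are hypothetical, and several cases 
(UTF-32, EBCDIC) are left out.

```python
import codecs

def guess_family(data: bytes) -> str:
    """Return a provisional encoding guess from the first bytes of an XML entity."""
    # A byte order mark settles the question before any markup is read.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # UTF-16, either via its BOM or via '<?' spread over two-byte code units.
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # '<?xm' as single bytes: some ASCII-compatible encoding. The encoding
    # declaration has to be read to find out which one.
    if data.startswith(b"<?xm"):
        return "ascii-compatible"
    # No BOM and no XML declaration: the default is UTF-8.
    return "utf-8"
```

Note that the "ascii-compatible" answer is only provisional: if the declaration 
turns out to carry no encoding attribute, the parser still ends up at UTF-8.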

>> It will then proceed
>> to read the rest of the XML decl and on interpreting the encoding attribute
>> will revise its guess to be iso-8859-2.
>
> The _effect_ has to be same as if the correct encoding was specified
> externally and the whole file, including the xml declaration, is read
> with a single encoding, which is the encoding specified in the xml
> declaration. In practice a real system won't back up and re-read from
> the beginning of the file once it has parsed the declaration, but it's
> simplest to imagine that it does.

I was coming at it from the direction of how that effect typically is, or can 
be, achieved.
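
One way to get that effect, sketched in Python under the assumption that the 
entity is in an ASCII-compatible encoding and fits in memory: pull the encoding 
name out of the raw bytes of the declaration, then decode the whole thing, 
declaration included, with that encoding. The names `decode_xml` and `_DECL` 
are hypothetical, and the pattern is a simplification of the XML declaration 
grammar, not a full parser.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

def decode_xml(data: bytes) -> str:
    """Decode an ASCII-compatible XML entity using its own encoding declaration."""
    match = _DECL.match(data)
    # With no encoding declaration (and no external information), use UTF-8.
    name = match.group(1).decode("ascii") if match else "utf-8"
    # Decoding everything with the declared encoding gives the same result
    # as if the encoding had been known before the first byte was read.
    return data.decode(name)
```

A streaming parser would not decode twice like this; it would read just enough 
bytes to see the declaration and then switch decoders for the remainder.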

> The point I was trying to make was that the current document makes it
> sound as if it's legal to have an encoding declaration encoded in ascii
> which specifies a non-ascii superset (such as utf-16) which is then used
> for the rest of the document.

I agree the initial draft did make it sound like the XML decl was always 
ASCII.  I think Roger has now fixed this.
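
To make the point about UTF-16 concrete, here is a small Python sketch 
comparing the bytes of the same declaration in ASCII and in big-endian UTF-16. 
The variable names are just for illustration.

```python
decl = '<?xml version="1.0" encoding="UTF-16"?>'

ascii_bytes = decl.encode("ascii")
utf16_bytes = decl.encode("utf-16-be")

# In UTF-16 every character of the declaration takes two bytes, so the two
# byte sequences already differ at the very first byte.
print(ascii_bytes[:4])   # b'<?xm'
print(utf16_bytes[:8])   # b'\x00<\x00?\x00x\x00m'
```

So a document that really is UTF-16 has a UTF-16 declaration; it cannot start 
out as ASCII bytes and change encoding after the declaration.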

Pete.
--
=============================================
Pete Cordell
Codalogic
for XML Schema to C++ data binding visit
 http://www.codalogic.com/lmx/
=============================================





Copyright 1993-2007 XML.org. This site is hosted by OASIS