Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?
- From: "Pete Cordell" <petexmldev@tech-know-ware.com>
- To: "David Carlisle" <davidc@nag.co.uk>, <costello@mitre.org>
- Date: Thu, 20 Sep 2007 16:46:45 +0100
----- Original Message -----
From: "David Carlisle" <davidc@nag.co.uk>
>> Now that it knows the "real" encoding it interprets the rest of the
>> document using the encoding it found in the XML declaration.
>
> That still makes it sound as if the encoding declaration is read using a
> different encoding from the rest of the document. Once an encoding has
> been determined then the encoding declaration line itself must be
> consistent with that encoding.
For me, the above statement isn't correct. If an XML document starts out
with:
<?xml version="1.0" encoding="iso-8859-2"?>
The parser will analyse the first character by reading up to 4 bytes of
input (as described in the algorithm mentioned). In this case it will work
out that the first character corresponds to the single-byte ASCII code for
'<'. On that basis, it will assume that the document is UTF-8. It will then
proceed to read the rest of the XML declaration and, on interpreting the
encoding attribute, will revise its guess to iso-8859-2.
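As a rough illustration of that two-step process, here is a minimal Python
sketch. The function names (first_guess, detect_encoding) and the regular
expression are my own, hypothetical choices rather than anything from a real
parser, and only a subset of the byte patterns in the detection algorithm is
handled (no UTF-32 or EBCDIC, for instance).

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical pattern: the encoding name inside an XML declaration,
# matched on raw bytes because the real encoding is not yet known.
_DECL = re.compile(
    rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def first_guess(head: bytes) -> str:
    """Provisional guess from the first (up to) 4 bytes of the document."""
    if head.startswith(UTF8_BOM):
        return "utf-8"
    if head.startswith(b"\xfe\xff") or head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe") or head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # A single-byte '<' (or anything else not recognised above):
    # assume UTF-8 for now.
    return "utf-8"


def detect_encoding(data: bytes) -> str:
    """Guess from the leading bytes, then revise using the declaration."""
    guess = first_guess(data[:4])
    if guess != "utf-8" or data.startswith(UTF8_BOM):
        return guess
    match = _DECL.match(data)
    if match is None:
        return guess  # no encoding declared, so the guess stands
    return match.group(1).decode("ascii").lower()


doc = b'<?xml version="1.0" encoding="iso-8859-2"?><a/>'
print(first_guess(doc[:4]), "->", detect_encoding(doc))
```

The declaration is matched as bytes on purpose: at that point the parser only
knows it is dealing with some ASCII-compatible encoding.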
In general, having guessed UTF-8 (variable number of bytes per character),
the encoding attribute could change it to one of the various Latin character
sets (iso-8859-*, single byte, but using the full range of values 0-255), or
to something like Shift-JIS, which uses lead bytes above the ASCII range to
introduce two-byte characters.
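To make that concrete, the following Python sketch (the helper name
same_bytes_everywhere is made up for this example) checks that the XML
declaration itself comes out as the same bytes in several ASCII-compatible
encodings, while characters outside ASCII do not. It relies only on Python's
standard codecs.

```python
DECLARATION = '<?xml version="1.0" encoding="iso-8859-2"?>'
CODECS = ("ascii", "utf-8", "iso-8859-2", "shift_jis")


def same_bytes_everywhere(text: str, codecs=CODECS) -> bool:
    """True if text encodes to identical bytes under every codec listed."""
    return len({text.encode(codec) for codec in codecs}) == 1


# The declaration is pure ASCII, so the provisional guess is good enough
# to read it, whichever of these encodings the document really uses.
print(same_bytes_everywhere(DECLARATION))

# Beyond ASCII the encodings diverge, which is why the guess has to be
# revised before the rest of the document is interpreted.
latin2 = "ł".encode("iso-8859-2")  # one byte, in the 128-255 range
utf8 = "ł".encode("utf-8")         # two bytes
sjis = "あ".encode("shift_jis")     # two bytes, high lead byte
print(latin2, utf8, sjis)

# The iso-8859-2 byte is not valid UTF-8 on its own.
try:
    latin2.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```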
Pete.
--
=============================================
Pete Cordell
Codalogic
for XML Schema to C++ data binding visit
http://www.codalogic.com/lmx/
=============================================