Re: [xml-dev] [Summary] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?
- From: David Carlisle <davidc@nag.co.uk>
- To: costello@mitre.org
- Date: Thu, 20 Sep 2007 15:02:58 +0100
>
> An XML Parser will make an initial "guess" of the encoding based upon
> the presence or absence of a Byte Order Mark (BOM). The XML parser then
> interprets the bit strings using that guess up to the first ">"
> character (the end of the XML declaration).
>
If the encoding isn't known in advance then (in theory) you don't know
where the first > is (as you don't know how > is encoded).
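
A minimal Python sketch of that point: the bytes for ">" differ from
one encoding to another, so "the first >" is not a fixed byte pattern
until some encoding has been assumed. The three codecs are Python's own
names and are picked only for illustration (cp037 is an EBCDIC code page).

```python
# How ">" is encoded depends on the encoding, so a byte-level search for
# "the first >" only makes sense once an encoding has been assumed.
ENCODINGS = ["ascii", "utf-16-be", "cp037"]  # illustrative choices only

gt_bytes = {name: ">".encode(name) for name in ENCODINGS}

for name, raw in gt_bytes.items():
    print(f"{name:10} {raw.hex()}")
```
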
> Now that it knows the "real" encoding it interprets the rest of the
> document using the encoding it found in the XML declaration.
That still makes it sound as if the encoding declaration is read using a
different encoding from the rest of the document. Once an encoding has
been determined then the encoding declaration line itself must be
consistent with that encoding. You can't write the declaration in
one-byte-per-character ASCII
<?xml version="1.0" encoding="utf-16"?>
and then read the rest of the file using two (or four) bytes per
character.
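
As a rough illustration (not a description of how any particular parser
is implemented), the following Python sketch checks whether a byte string
still reads as an XML declaration once it is decoded with the encoding it
claims. The helper name reads_as_declaration is hypothetical, and
"utf-16-be" stands in for the declared "utf-16" so that no BOM is needed.

```python
def reads_as_declaration(data: bytes, encoding: str) -> bool:
    """Return True if `data`, decoded as `encoding`, starts with '<?xml'."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return text.startswith("<?xml")


decl = '<?xml version="1.0" encoding="utf-16"?>'

one_byte = decl.encode("ascii")      # one byte per character
two_byte = decl.encode("utf-16-be")  # two bytes per character

# The one-byte form claims to be UTF-16 but cannot be read as UTF-16.
print(reads_as_declaration(one_byte, "utf-16-be"))
# The two-byte form is consistent with what it declares.
print(reads_as_declaration(two_byte, "utf-16-be"))
```
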
Suppose I have an encoding "my-encoding" that's the same as ASCII
except that > and < are swapped round. Then the following is a
well-formed document:
>?xml version="1.0" encoding="my-encoding"?<
>foo<hello>/foo<
The parser knows it's been handed an XML file and can tell that it's not
going to parse as UTF-8, so there must be an XML declaration, so the first
few bytes must encode "<?xml". It sees the bytes it sees, and the only
encoding it knows about in which that sequence encodes "<?xml" is the
"my-encoding" encoding, so it proceeds on that basis, which means it
successfully finds encoding="my-encoding" and knows all is well...
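
The thought experiment can be played out in a few lines of Python. This
is only a sketch under stated assumptions: "my-encoding" is the made-up
encoding from the example, and CANDIDATES, decode_my_encoding and detect
are hypothetical names, not part of any real parser. The detector tries
each encoding it knows about, keeps the first one under which the input
begins with "<?xml", and then reads the declared name using that same
encoding.

```python
import re

# "my-encoding": ASCII with the bytes for "<" and ">" swapped (made up).
SWAP = bytes.maketrans(b"<>", b"><")


def decode_my_encoding(data: bytes) -> str:
    return data.translate(SWAP).decode("ascii")


# Encodings this toy parser knows about (hypothetical table).
CANDIDATES = {
    "us-ascii": lambda b: b.decode("ascii"),
    "utf-16-be": lambda b: b.decode("utf-16-be"),
    "my-encoding": decode_my_encoding,
}


def detect(data: bytes):
    """Return (guessed encoding, declared encoding, decoded text) or None."""
    for name, decode in CANDIDATES.items():
        try:
            text = decode(data)
        except UnicodeDecodeError:
            continue
        if not text.startswith("<?xml"):
            continue
        m = re.match(r'<\?xml[^>]*encoding="([^"]+)"', text)
        return name, (m.group(1) if m else None), text
    return None


doc = b'>?xml version="1.0" encoding="my-encoding"?<\n>foo<hello>/foo<'
guessed, declared, text = detect(doc)
print(guessed, declared)
print(text)
```

The declaration is decoded with the same encoding as the rest of the
document; the guess is simply confirmed by what the declaration says.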
David