xml-dev - Re: [xml-dev] Attributes as tags in namespaces and how to guess characte

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Attributes as tags in namespaces and how to guess characterencoding
From: "Thomas B. Passin" <tpassin@comcast.net>
Date: Wed, 21 Jul 2004 23:35:54 -0400
In-reply-to: <40FEA12D.6020700@minimum.se>
References: <40FEA12D.6020700@minimum.se>
User-agent: Mozilla Thunderbird 0.7.2 (Windows/20040707)

Martin Olsson wrote:

> --- QUESTION 2
> 
> XML files can use different character encodings including UNICODE and 
> normal ascii text files. An XML parser must know what encoding is used 
> before it starts to process the file, loading a UNICODE file is very 
> different from loading a normal text file. The parser can obviously not 
> first read the encoding attribute of the XML declaration which is the 
> first line of the XML file and then load the file. 

On the contrary, the XML declaration contains only ASCII characters, 
apart from a possible byte order mark (BOM). The processor can therefore 
tell an 8-bit encoding from a 16-bit one by looking at the BOM and the 
bytes of "<?xml", and then read the encoding declaration, knowing that 
it contains only ASCII characters.
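
To make the two steps concrete, here is a minimal Python sketch along 
the lines of the autodetection notes in Appendix F of the XML 1.0 
Recommendation. The function name sniff_encoding, the family labels and 
the 200-byte window are my own assumptions, not part of any parser. It 
covers only UTF-8, UTF-16 and ASCII-compatible 8-bit encodings, and 
ignores UTF-32 and EBCDIC.

```python
import re

# Step 1 tables: leading bytes -> encoding family.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]
_NO_BOM = [
    (b"<\x00?\x00", "utf-16-le"),   # "<?" as 16-bit units, low byte first
    (b"\x00<\x00?", "utf-16-be"),   # "<?" as 16-bit units, high byte first
    (b"<?xm", "8-bit"),             # some ASCII-compatible encoding
]

# Step 2 pattern: the encoding pseudo-attribute of the XML declaration.
_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data):
    """Return (family, declared_name); declared_name may be None."""
    family, skip = "utf-8", 0       # XML's default when nothing matches
    for bom, name in _BOMS:
        if data.startswith(bom):
            family, skip = name, len(bom)
            break
    else:
        for prefix, name in _NO_BOM:
            if data.startswith(prefix):
                family = name
                break

    # The declaration holds only ASCII characters, so decoding the first
    # few bytes with the family codec is enough to read it.
    codec = "ascii" if family == "8-bit" else family
    head = data[skip:skip + 200].decode(codec, errors="replace")
    match = _DECL.match(head)
    return family, (match.group(1) if match else None)
```

For example, sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>') 
returns ("8-bit", "ISO-8859-1"): the first bytes say "ASCII-compatible", 
and the declaration then names the exact encoding.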

> ... Should the XML parser use a brute force approach and try all of these?

The only problem would come if the actual encoding does not match the 
declared encoding (I am leaving aside those cases where the processor 
knows the encoding by some other means).  The processor is not expected 
to sort out such discrepancies.
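
As a rough illustration of "not sorting it out", a processor can decode 
strictly with the declared encoding and stop on the first invalid byte 
sequence instead of guessing. This is only a Python sketch; the helper 
name decode_declared is hypothetical, and it relies on Python's codec 
registry recognising the declared name.

```python
def decode_declared(data, declared="utf-8"):
    """Decode data with the declared encoding, failing loudly on mismatch."""
    try:
        # Strict decoding: no replacement characters, no fallback codec.
        return data.decode(declared)
    except LookupError as exc:
        raise ValueError(f"unsupported encoding: {declared}") from exc
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"document is not valid {declared}: {exc.reason}"
        ) from exc
```

Note that this catches only mismatches that produce invalid byte 
sequences. A UTF-8 document wrongly declared as ISO-8859-1 decodes 
without error, because every byte is a valid ISO-8859-1 character, and 
simply yields the wrong text.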

XML is one of the few formats out there that can handle multiple 
encodings and Unicode decently, and much of this is due to the XML 
declaration.

Cheers,

Tom P

-- 
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
