   Re: [xml-dev] Attributes as tags in namespaces and how to guess characte

Martin Olsson wrote:

> --- QUESTION 2
> XML files can use different character encodings including UNICODE and 
> normal ascii text files. An XML parser must know what encoding is used 
> before it starts to process the file, loading a UNICODE file is very 
> different from loading a normal text file. The parser can obviously not 
> first read the encoding attribute of the XML declaration which is the 
> first line of the XML file and then load the file. 

On the contrary, the XML declaration contains only ASCII characters, 
apart from a possible byte order mark (BOM), so the processor can tell 
8-bit from 16-bit encodings by looking at the BOM and the bytes of 
<?xml, and then read the encoding declaration, knowing that it 
consists of ASCII characters.
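
As a rough illustration of that two-step approach, here is a minimal Python sketch. It is not taken from any particular parser: the function names sniff_family and declared_encoding are made up for this example, the 200-byte look-ahead is an arbitrary assumption, and UTF-32 and EBCDIC are deliberately ignored.

```python
import codecs
import re

# Hypothetical helper: guess the encoding family from the first bytes.
# UTF-32 and EBCDIC are left out to keep the sketch short.
def sniff_family(data: bytes) -> str:
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"      # 8-bit, BOM is skipped by this codec
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"         # 16-bit, codec reads the BOM itself
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"      # 16-bit, no BOM, "<?" big-endian
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"      # 16-bit, no BOM, "<?" little-endian
    return "ascii"              # some 8-bit, ASCII-compatible encoding

# The declaration only uses ASCII characters, so a simple pattern is enough.
_DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

# Hypothetical helper: return (family, declared encoding name or None).
def declared_encoding(data: bytes):
    family = sniff_family(data)
    # Decode only the start of the file; bytes outside the family's
    # range are replaced rather than treated as errors.
    head = data[:200].decode(family, errors="replace")
    match = _DECL.match(head)
    return family, (match.group(1) if match else None)
```

A caller would then re-open the input with the declared encoding. When there is neither a BOM nor an encoding declaration, XML 1.0 has the processor assume UTF-8.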

> ... Should the XML parser use a brute force approach and try all of these?

The only problem would come if the actual encoding does not match the 
declared encoding (I am leaving aside those cases where the processor 
knows the encoding by some other means).  The processor is not expected 
to sort out such discrepancies.
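
To make the mismatch case concrete, here is a small Python sketch. The sample document and the helper name decodes_as are invented for the example. The bytes are written as ISO-8859-1 while the declaration claims UTF-8, so a strict decode with the declared encoding simply fails instead of being repaired.

```python
# Sample document: declared as UTF-8 but actually encoded as ISO-8859-1.
data = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'.encode(
    "iso-8859-1"
)

# Hypothetical helper: report whether the bytes decode cleanly.
def decodes_as(raw: bytes, encoding: str) -> bool:
    try:
        raw.decode(encoding)  # strict decoding, no error recovery
    except UnicodeDecodeError:
        return False
    return True

declared_ok = decodes_as(data, "utf-8")      # the declared encoding
actual_ok = decodes_as(data, "iso-8859-1")   # the encoding really used
```

Here declared_ok comes out False and actual_ok comes out True. Treating the failure as an error, rather than guessing at another encoding, is in line with the point above.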

XML is one of the few formats out there that can handle multiple 
encodings and Unicode decently, and much of this is due to the XML 


Tom P

Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)


Copyright 2001 XML.org. This site is hosted by OASIS