Re: Fw: [xml-dev] Encodings and how they're specified

Hermann Stamm-Wilbrandt scripsit:

> So an XML processor/parser should be able to deal with ebcdic.xml and
> correctly determine its "ebcdic-de" encoding, right?

"Should" is too strong.  Many, if not most, XML parsers will not
support this encoding, though in that case they should at least
reject it cleanly.  Appendix F explains how to identify a generic
EBCDIC XML document by the "4C 6F A7 94" bytes (the EBCDIC encoding
of "<?xm") with which it must begin, though it is still necessary to
read through the encoding declaration to determine the exact flavor
of EBCDIC in use.  The invariant character set (00640) can be used to
decode the declared encoding name, unless the encoding is code page
290, which does not have lower-case Latin letters anyway.
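A minimal Python sketch of the EBCDIC case: check for the 4C 6F A7 94 signature, then decode the XML declaration only as far as "?>" and pull out the encoding name. Python has no codec named for invariant set 00640, so the sketch assumes cp037 as a stand-in; this relies on the characters an encoding declaration needs (letters, digits, space, '-', '.', '_', '=', quotes) being invariant across the usual EBCDIC code pages. The function name ebcdic_encoding_name is hypothetical, not part of any parser's API.

```python
import re
from typing import Optional

# "<?xm" in EBCDIC, per XML 1.0 Appendix F.
EBCDIC_SIGNATURE = bytes([0x4C, 0x6F, 0xA7, 0x94])
# "?>" in EBCDIC ('?' = 0x6F, '>' = 0x6E).
EBCDIC_DECL_END = bytes([0x6F, 0x6E])

# XML EncName production, in either quote style.
ENCODING_RE = re.compile(r"""encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1""")


def ebcdic_encoding_name(data: bytes) -> Optional[str]:
    """Return the declared encoding of an EBCDIC XML entity, or None if
    the entity does not start with the generic EBCDIC signature."""
    if not data.startswith(EBCDIC_SIGNATURE):
        return None  # not EBCDIC; some other family applies
    end = data.find(EBCDIC_DECL_END)
    if end == -1:
        raise ValueError("unterminated XML declaration")
    # Assumption: cp037 stands in for the invariant set 00640. It decodes
    # the invariant characters correctly; code page 290 documents cannot
    # spell a lower-case name this way anyway.
    decl = data[: end + len(EBCDIC_DECL_END)].decode("cp037")
    match = ENCODING_RE.search(decl)
    if match is None:
        raise ValueError("EBCDIC XML declaration has no encoding name")
    return match.group(2)
```

For the document in the original question, this would yield "ebcdic-de"; whether the parser can then actually decode the rest of the entity depends on its own table of supported code pages.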

http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html
gives a detailed algorithm.
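The first step of such a sniffer, classifying the encoding family from the first four bytes as the table in Appendix F does, can be sketched in Python as below. This is not the code from the linked post; the family labels and the name sniff_family are illustrative assumptions. It only narrows things down to a family, so the encoding declaration still has to be read afterwards.

```python
# First-four-byte patterns from XML 1.0 Appendix F (non-normative).
# Order matters: the UCS-4 BOMs must be tried before the UTF-16 BOMs
# that are their prefixes.
BOM_PATTERNS = [
    (b"\x00\x00\xfe\xff", "UCS-4 (1234 order)"),
    (b"\xff\xfe\x00\x00", "UCS-4 (4321 order)"),
    (b"\x00\x00\xff\xfe", "UCS-4 (2143 order)"),
    (b"\xfe\xff\x00\x00", "UCS-4 (3412 order)"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]

# Without a BOM, look for "<" or "<?xm" as laid out in each family.
NO_BOM_PATTERNS = [
    (b"\x00\x00\x00\x3c", "32-bit (1234 order)"),
    (b"\x3c\x00\x00\x00", "32-bit (4321 order)"),
    (b"\x00\x00\x3c\x00", "32-bit (2143 order)"),
    (b"\x00\x3c\x00\x00", "32-bit (3412 order)"),
    (b"\x00\x3c\x00\x3f", "16-bit big-endian"),
    (b"\x3c\x00\x3f\x00", "16-bit little-endian"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]


def sniff_family(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its leading bytes."""
    for pattern, family in BOM_PATTERNS + NO_BOM_PATTERNS:
        if data.startswith(pattern):
            return family
    # Appendix F: anything else is UTF-8 without a declaration,
    # or else the data is corrupt, fragmentary, or wrapped.
    return "UTF-8 (no declaration)"
```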

-- 
Note that nobody these days would clamor for fundamental laws        John Cowan
of *the theory of kangaroos*, showing why pseudo-kangaroos are   cowan@ccil.org
physically, logically, metaphysically impossible.    http://www.ccil.org/~cowan
Kangaroos are wonderful, but not *that* wonderful.     --Dan Dennett on zombies
