   Re: encoding problem fixed

  • From: John Cowan <cowan@locke.ccil.org>
  • To: XML Dev <xml-dev@ic.ac.uk>
  • Date: Fri, 30 Jul 1999 13:56:49 -0400

David Brownell wrote:

> Actually, that's not correct either.  My general advice is to pass a
> URI to the parser -- which is required to do the correct thing! -- and
> in those rare cases that can't be done:
> 
>     * If the data is externally typed according to character set,
>       you MUST use some Reader ... e.g. given a MIME type of
>       "application/xml;charset=Big5", then use a reader set
>       up to use the "Big5" encoding (a Chinese encoding).  There
>       isn't much choice of classes; InputStreamReader, or a custom
>       reader that understands that encoding.
> 
>     * If the data is NOT externally typed, then you MUST rely on
>       the XML parser's autodetection ... pass an InputStream.

This is all quite sound, and I was wrong to overlook the case of
external charset information.
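
A minimal Java sketch of the two cases above, assuming a SAX-style parser
that accepts an org.xml.sax.InputSource. The class name XmlSources and its
method names are hypothetical, chosen only for illustration; they are not
part of any parser's API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import org.xml.sax.InputSource;

// Hypothetical helper: picks what to hand to the parser.
final class XmlSources {
    private XmlSources() {}

    // Preferred case: give the parser a URI and let it do the fetching
    // and the encoding detection itself.
    static InputSource forUri(String uri) {
        return new InputSource(uri);
    }

    // Fallback when only a stream is available.
    // externalCharset is the charset from external typing (for example
    // a MIME "charset" parameter), or null if there is none.
    static InputSource forStream(InputStream in, String externalCharset)
            throws UnsupportedEncodingException {
        if (externalCharset != null) {
            // Externally typed: decode with a Reader set up for that charset.
            return new InputSource(new InputStreamReader(in, externalCharset));
        }
        // Not externally typed: pass raw bytes so the parser can autodetect.
        return new InputSource(in);
    }
}
```

With a MIME type of "application/xml;charset=Big5" the call would be
XmlSources.forStream(in, "Big5"), provided the JVM supports that encoding.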
 
> > Actually, it's doing what it's expected to: reading the native charset,
> > CP-1252.  (Unix JVMs use 8859-1 instead.)
> 
> Those are actually system-specific defaults ... many localized versions
> of those environments work differently.  For example UNIX JVMs may well
> use the "EUC-JP" coding in Japan, or MS-Windows the "Shift_JIS".

Reasonable.
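
To illustrate why the platform default matters, here is a small Java sketch
that decodes the same bytes under two named charsets and under whatever the
JVM chose as its default. The class name DefaultCharsetDemo is hypothetical,
and UTF-8 versus ISO-8859-1 is used only because both are guaranteed to be
present on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same bytes, different decodings.
final class DefaultCharsetDemo {
    private DefaultCharsetDemo() {}

    // Decode with an explicitly named charset: the result is the same
    // on every machine.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Decode with the system-specific default: the result depends on
    // the platform and locale the JVM is running under.
    static String decodeWithPlatformDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }
}
```

The bytes 0xC3 0xA9 decode to a single character (e with acute accent) as
UTF-8 but to two characters as ISO-8859-1; decodeWithPlatformDefault may give
either result, or something else again, depending on the system.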
 
> In fact, my own basic guidance is never to pass any sort of I/O stream
> (InputStream -or- Reader!) to the parser; let the parser work from the
> URI, if at all possible.  It's normally quite possible, and it's a lot
> less likely to handle the encodings wrong than application code!!

This leads to an interesting question: what do various XML parsers
do when fetching http: URIs that produce explicit charset declarations?
Someone should try Aelfred, etc. and see if the header-level charset
declaration is respected, overriding the internal encoding declaration.
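
For anyone running that experiment, a rough Java sketch of pulling the
charset parameter out of a Content-Type header value, so it can be compared
with the document's internal encoding declaration. The class and method
names are hypothetical, and the parsing is deliberately simple: it assumes
no semicolons inside quoted parameter values.

```java
// Hypothetical helper: extracts the charset parameter from a
// Content-Type value such as "application/xml;charset=Big5".
final class ContentTypeCharset {
    private ContentTypeCharset() {}

    // Returns the charset name, or null if the header is absent or
    // carries no charset parameter.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```

The header value itself can be read with java.net.URLConnection's
getContentType() after opening the http: URI.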

-- 
	John Cowan	http://www.ccil.org/~cowan	cowan@ccil.org
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
			-- Coleridge / Politzer

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)






 
