Fw: [xml-dev] Encodings and how they're specified
- From: Hermann Stamm-Wilbrandt <STAMMW@de.ibm.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 5 Jul 2011 18:33:09 +0200
Forgot to add xml-dev ...
----- Forwarded by Hermann Stamm-Wilbrandt/Germany/IBM on 07/05/2011 06:32 PM -----
From: Hermann Stamm-Wilbrandt/Germany/IBM
To: David Carlisle <davidc@nag.co.uk>
Date: 07/05/2011 06:03 PM
Subject: Re: [xml-dev] Encodings and how they're specified
> b) in the absence of external HTTP header information, using the BOM
> and/or the first few bytes encoding "<?xml" it can figure out how (most
> likely) the ASCII range of characters is encoded, and that's good enough
> to be able to read the encoding declaration and fix up the encoding once
> you have read that.
I would agree "normally", but EBCDIC encoding is different.
Here is how I create an "ebcdic-de" encoded XML file
(uconv is like iconv, but is part of the ICU library distribution;
ebcdic.xml.txt itself is not XML, because its encoding declaration and its
actual encoding differ).
So an XML processor/parser should be able to deal with ebcdic.xml and
correctly determine its "ebcdic-de" encoding, right?
$ cat ebcdic.xml.txt
<?xml version="1.0" encoding="ebcdic-de"?>
<ebcdic>123</ebcdic>
$
$ uconv -f utf-8 -t ebcdic-de ebcdic.xml.txt >ebcdic.xml
$
$ od -Ax -tx1 ebcdic.xml
000000 4c 6f a7 94 93 40 a5 85 99 a2 89 96 95 7e 7f f1
000010 4b f0 7f 40 85 95 83 96 84 89 95 87 7e 7f 85 82
000020 83 84 89 83 60 84 85 7f 6f 6e 25 4c 85 82 83 84
000030 89 83 6e f1 f2 f3 4c 61 85 82 83 84 89 83 6e 25
000040
$
$ cat ebcdic.xml; echo
Lo���@�������~�K�@��������~������`��on%L������n���La������n%
$
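To make the question concrete, here is a minimal Python sketch of what a parser would have to do with the bytes dumped above. It relies on the signature 4C 6F A7 94 ("<?xm" in EBCDIC) listed in the autodetection appendix of the XML 1.0 Recommendation. The rest is assumption for illustration only: cp037 is used merely as a provisional codec for reading the declaration (the characters in it appear to share code points across the common EBCDIC pages), ICU's "ebcdic-de" is assumed to correspond to Python's cp273, and the names declared_encoding, decode_document and ICU_TO_PYTHON are hypothetical, not part of any parser.

```python
import re

# Bytes of ebcdic.xml, copied from the od dump above.
EBCDIC_XML = bytes.fromhex(
    "4c 6f a7 94 93 40 a5 85 99 a2 89 96 95 7e 7f f1 "
    "4b f0 7f 40 85 95 83 96 84 89 95 87 7e 7f 85 82 "
    "83 84 89 83 60 84 85 7f 6f 6e 25 4c 85 82 83 84 "
    "89 83 6e f1 f2 f3 4c 61 85 82 83 84 89 83 6e 25"
)

EBCDIC_SIGNATURE = bytes.fromhex("4c 6f a7 94")  # "<?xm" in EBCDIC
EBCDIC_DECL_END = bytes.fromhex("6f 6e")         # "?>" in EBCDIC

# Assumption: mapping from the ICU converter name to a Python codec name.
ICU_TO_PYTHON = {"ebcdic-de": "cp273"}

DECL_RE = re.compile(r"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data: bytes):
    """Return the encoding named in an EBCDIC XML declaration, or None."""
    if not data.startswith(EBCDIC_SIGNATURE):
        return None  # other encoding families are out of scope for this sketch
    end = data.find(EBCDIC_DECL_END)
    if end < 0:
        return None
    # Provisional decode: enough to read the declaration, not the whole file.
    head = data[: end + len(EBCDIC_DECL_END)].decode("cp037")
    match = DECL_RE.search(head)
    return match.group(1) if match else None


def decode_document(data: bytes) -> str:
    """Decode the whole document with the code page the declaration names."""
    name = declared_encoding(data)
    if name is None or name.lower() not in ICU_TO_PYTHON:
        raise ValueError("cannot determine a supported EBCDIC code page")
    return data.decode(ICU_TO_PYTHON[name.lower()])


if __name__ == "__main__":
    print(declared_encoding(EBCDIC_XML))
    print(decode_document(EBCDIC_XML))
```

The two-step shape is the point: the signature only identifies the EBCDIC family, so the declaration still has to be read to learn which code page applies.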
Mit besten Gruessen / Best wishes,
Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
From: David Carlisle <davidc@nag.co.uk>
To: Joe Fawcett <joefawcett@hotmail.com>
Cc: xml-dev@lists.xml.org
Date: 07/05/2011 04:40 PM
Subject: Re: [xml-dev] Encodings and how they're specified
On 05/07/2011 15:22, Joe Fawcett wrote:
> , I still don't see how it manages to read the encoding mentioned in
> the XML declaration with only the BOM available?
you are verging off list, but
a) It's specified here
http://www.w3.org/TR/2008/REC-xml-20081126/#sec-guessing
and
b) in the absence of external HTTP header information, using the BOM
and/or the first few bytes encoding "<?xml" it can figure out how (most
likely) the ASCII range of characters is encoded, and that's good enough
to be able to read the encoding declaration and fix up the encoding once
you have read that.
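
The first step described in b) could be sketched roughly as follows in Python. The byte patterns are the BOM and "<?" signatures for the Unicode and ASCII-compatible cases from the section of the Recommendation linked in a); the function name provisional_encoding and the two tables are made up for this illustration, and EBCDIC and the rarer byte orders are left out.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Without a BOM, the bytes that encode "<?" reveal the code-unit layout.
NO_BOM = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "ascii"),  # some ASCII-compatible encoding; read the declaration
]


def provisional_encoding(data: bytes):
    """Return (codec, bom_length) good enough to read the declaration, or None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    for signature, codec in NO_BOM:
        if data.startswith(signature):
            return codec, 0
    return None


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "<?xml version='1.0'?>".encode("utf-16-le")
    print(provisional_encoding(sample))
```

The result is only provisional: it tells the parser how to read the ASCII-range characters of the declaration, after which the declared encoding takes over.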
David