Here is how to convert an XML document to a different encoding
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Wed, 26 Dec 2012 20:02:19 +0000
Hi Folks,
The characters in this XML document are encoded using UTF-8. Its XML declaration has no encoding attribute, so UTF-8 is also what an XML parser assumes:
<?xml version="1.0"?>
<Name>López</Name>
It can be converted to another encoding using this simple XSLT program:
---------------------------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
    <!-- The target encoding of the output document -->
    <xsl:output method="xml"
                encoding="Shift_JIS"/>
    <!-- Identity template: copy every node and attribute unchanged -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
---------------------------------------------------
The encoding attribute on <xsl:output> specifies the desired encoding. The rest of the XSLT program simply performs an identity copy operation.
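For readers who want to try the same idea without an XSLT processor, here is a rough Python analog. It is an assumption on my part, not what was used in this post: the standard library's xml.etree.ElementTree parses the document and re-serializes the unchanged tree in a new encoding, which is the job the identity copy plus the encoding attribute on <xsl:output> does. The byte string SOURCE stands in for the input file, and reencode is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The UTF-8 input document as raw bytes (a stand-in for the file on disk).
# Its declaration has no encoding attribute, so the parser decodes it as
# UTF-8 and reads the bytes C3 B3 as the single character ó.
SOURCE = b'<?xml version="1.0"?>\n<Name>L\xc3\xb3pez</Name>'

def reencode(xml_bytes: bytes, encoding: str) -> bytes:
    """Hypothetical helper: parse, then serialize the unchanged tree in `encoding`.

    Leaving the tree untouched mirrors the identity transform; the `encoding`
    argument plays the role of <xsl:output encoding="..."/>. ElementTree adds
    an XML declaration naming the encoding and writes any character the
    encoding cannot represent as a decimal character reference.
    """
    root = ET.fromstring(xml_bytes)
    return ET.tostring(root, encoding=encoding)

latin1 = reencode(SOURCE, "iso-8859-1")
sjis = reencode(SOURCE, "Shift_JIS")
print(latin1)
print(sjis)
```

Like the XSLT processor, ElementTree falls back to a character reference for ó when writing Shift_JIS, but it uses the decimal form (&#243;) rather than hexadecimal; both denote the same character.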
Shift_JIS is a character encoding for Japanese.
ISO-8859-1 (Latin-1) is a superset of ASCII. It defines 191 printable characters (ASCII has 128 code points, 95 of them printable), covering most Western European languages.
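The superset relationship can be checked with Python's built-in codecs. A minimal sketch, assuming Python 3 and its iso-8859-1 codec, which (like the IANA charset of that name) assigns every byte 0x00-0xFF to the Unicode code point with the same value:

```python
import string

# Python's iso-8859-1 codec maps byte n to code point U+00nn for all 256 bytes,
# so the first 128 positions coincide with ASCII.
assert all(bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256))

# Any ASCII text therefore has identical bytes in both encodings...
ascii_text = string.printable
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("iso-8859-1")

# ...while ó (U+00F3) exists only in ISO-8859-1, as the single byte F3.
try:
    "ó".encode("ascii")
    o_in_ascii = True
except UnicodeEncodeError:
    o_in_ascii = False

print(same_bytes, o_in_ascii, "ó".encode("iso-8859-1").hex())  # True False f3
```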
I applied the XSLT program to the above XML document twice, first with encoding="iso-8859-1" and then with encoding="Shift_JIS".
Then, using a hex editor, I could see at the byte level how the document's encoding had changed:
---------------------------------------------
encoding="utf-8"
L ó p e z
4C C3 B3 70 65 7A
Two bytes (C3 B3) used to encode ó
---------------------------------------------
encoding="iso-8859-1"
L ó p e z
4C F3 70 65 7A
One byte (F3) used to encode ó
---------------------------------------------
encoding="Shift_JIS"
L & # x f 3 ; p e z
4C 26 23 78 66 33 3B 70 65 7A
ó does not exist in Shift_JIS, so it is converted to the character reference &#xf3;
---------------------------------------------
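The three byte sequences above can also be reproduced without a hex editor. A minimal sketch, assuming Python 3.8+ (for bytes.hex with a separator); it encodes the string directly rather than running the stylesheet. Python's built-in xmlcharrefreplace handler writes decimal references such as &#243;, so the sketch registers a small handler, under the made-up name "hexcharref", that emits the hexadecimal form seen in the Shift_JIS output.

```python
import codecs
import xml.etree.ElementTree as ET

def hex_charref(err):
    """Encoding error handler: replace each unencodable character with a
    hexadecimal character reference such as &#xf3;, then resume after it."""
    bad = err.object[err.start:err.end]
    return "".join(f"&#x{ord(ch):x};" for ch in bad), err.end

# "hexcharref" is a hypothetical name chosen here, not a built-in handler.
codecs.register_error("hexcharref", hex_charref)

text = "López"
rows = {}
for enc in ("utf-8", "iso-8859-1", "Shift_JIS"):
    data = text.encode(enc, errors="hexcharref")  # handler only fires for Shift_JIS
    rows[enc] = data
    print(f"{enc:<11} {data.hex(' ').upper()}")

# The Shift_JIS bytes are plain ASCII here, so parsing them back with the
# default UTF-8 decoding is safe; the parser resolves &#xf3; to ó.
roundtrip = ET.fromstring(b"<Name>" + rows["Shift_JIS"] + b"</Name>").text
print(roundtrip)
```

The round trip at the end shows that the character reference carries the same character as the single byte F3 in ISO-8859-1 or the pair C3 B3 in UTF-8.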
Very cool!
For more info, see this excellent article: http://www.opentag.com/xfaq_enc.htm#enc_conv
/Roger