Here is how to convert an XML document to a different encoding

Hi Folks,

The characters in this XML document are encoded using UTF-8:

<?xml version="1.0"?>
<Name>López</Name>

Its encoding can be changed to another encoding using this simple XSLT program:
---------------------------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

    <xsl:output method="xml"
                encoding="Shift_JIS"/>
    
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
---------------------------------------------------

The encoding attribute on <xsl:output> specifies the desired encoding. The rest of the XSLT program simply performs an identity copy operation.
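
For readers without an XSLT processor at hand, roughly the same re-encoding can be sketched with Python's standard library. This is a stand-in built on xml.etree.ElementTree, not the XSLT approach itself. The function name reencode is made up for illustration, and unlike the identity transform this version drops comments and processing instructions.

```python
import xml.etree.ElementTree as ET


def reencode(xml_bytes: bytes, target_encoding: str) -> bytes:
    """Parse an XML document and serialize it again in target_encoding."""
    # With no encoding declared, the parser treats the input as UTF-8.
    root = ET.fromstring(xml_bytes)
    # For encodings other than UTF-8/US-ASCII, tostring() also writes an
    # XML declaration that names the target encoding.
    return ET.tostring(root, encoding=target_encoding)


# The sample document from this post, as UTF-8 bytes (\u00f3 is the accented o).
source = '<?xml version="1.0"?>\n<Name>L\u00f3pez</Name>'.encode("utf-8")

latin1 = reencode(source, "iso-8859-1")
sjis = reencode(source, "shift_jis")

print(latin1)
print(sjis)
```

ElementTree writes characters that the target encoding lacks as decimal character references such as &#243;. An XSLT processor may choose the hexadecimal form instead; both refer to the same character.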

Shift_JIS is a character encoding for the Japanese language.

iso-8859-1 is a superset of ASCII. It has 191 printable characters (ASCII has 128 characters in total, 95 of them printable). It contains the characters for most Western European languages.
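
The figure of 191 can be checked with a few lines of Python. This sketch assumes that "characters" means the non-control characters, that is, every byte value outside the C0 and C1 control ranges and DEL. The variable names are illustrative.

```python
import unicodedata

# Decode every possible byte value as ISO-8859-1 (Python's codec maps byte N
# to code point N) and keep the characters that are not control codes.
printable = [
    ch
    for ch in bytes(range(256)).decode("iso-8859-1")
    if unicodedata.category(ch) != "Cc"
]

# Bytes 0x00-0x7F must decode to the same text under both encodings
# for ISO-8859-1 to be a superset of ASCII.
ascii_bytes = bytes(range(128))
same_as_ascii = ascii_bytes.decode("ascii") == ascii_bytes.decode("iso-8859-1")

print(len(printable))
print(same_as_ascii)
```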

I applied the XSLT program to the above XML document, specifying encoding="iso-8859-1" and then encoding="Shift_JIS".

Then, using a hex editor I was able to see, at the byte level, the changes that were made to the XML document's encoding.

---------------------------------------------
encoding="utf-8"
L  ó     p  e  z
4C C3 B3 70 65 7A

Two bytes (C3 B3) used to encode ó
---------------------------------------------
encoding="iso-8859-1"
L  ó  p  e  z
4C F3 70 65 7A

One byte (F3) used to encode ó
---------------------------------------------
encoding="Shift_JIS"
L  &  #  x  f  3  ;  p  e  z
4C 26 23 78 66 33 3B 70 65 7A

Shift_JIS has no ó, so it is converted to the character reference &#xf3;
---------------------------------------------
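
The three dumps above can be reproduced without a hex editor. The sketch below only imitates what the serializer appears to do and is not the XSLT processor's actual code: the helper names encode_like_serializer and hex_dump are hypothetical, and the fallback to a hexadecimal character reference is chosen to mirror the output shown above.

```python
def encode_like_serializer(text: str, encoding: str) -> bytes:
    """Encode text, writing characters the encoding lacks as &#x...; references."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Not representable: fall back to a hexadecimal character reference.
            out += f"&#x{ord(ch):x};".encode("ascii")
    return bytes(out)


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


name = "L\u00f3pez"  # U+00F3 is the accented o in the sample document
for encoding in ("utf-8", "iso-8859-1", "shift_jis"):
    print(f"{encoding:10}  {hex_dump(encode_like_serializer(name, encoding))}")
```

Encoding one character at a time keeps the fallback simple. It is fine for the three encodings used here, but would not suit an encoding that emits a byte order mark or keeps shift state between characters.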

Very cool!

For more info, see this excellent article: http://www.opentag.com/xfaq_enc.htm#enc_conv  

/Roger



Copyright 1993-2007 XML.org. This site is hosted by OASIS