From: "james anderson" <firstname.lastname@example.org>
> is there a collection of documents available which exercises unicode?
> something more than the few cases in the conformance suite?
> something which mixes texts which would have distinct
> (eg language-specific) 16-bit encodings?
Well, better than nothing is the Chinese XML Now! test suite at
This has small samples in various encodings (UTF-8, Big5, GB2312),
correctly labelled and sent over MIME with different types.
One difficulty that people have with character sets for exotic scripts
is that, when something prints out, it can be hard to know whether it is correct.
So each of these tests tries to test only one thing at a time. For example,
<?xml version="1.0" encoding="UTF-8" ?>
<name>Chinese Test #8: UTF-8</name>
<data>This file has 1 Chinese character, directly entered.</data>
<data>The XML header of this file is <?xml encoding="UTF-8"?>.</data>
<data>The character is here: [中] It is the Chinese character for middle.
It should look like a box with a vertical line through its middle.</data>
so there is only one non-ASCII character involved. (I don't know whether
this example will get through by mail!)