OASIS Mailing List Archives





   Re: [xml-dev] xml documents which exercise unicode


From: "james anderson" <james.anderson@setf.de>

> is there a collection of documents available which exercises unicode? 
> something more than the few cases in the conformance suite? 
> something which mixes texts which would have distinct 
> (eg language-specific) 16-bit encodings?

Well, better than nothing is the Chinese XML Now! test suite at

This has small samples in various encodings (UTF-8, Big5, GB2312),
correctly labelled and sent over MIME with different types.
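To make "various encodings" concrete for a single character, here is a minimal Python sketch. It uses only Python's standard codecs, which are an illustrative choice and not part of the test suite, and prints the bytes for 中 (U+4E2D, the Chinese character for "middle") in each of the three encodings named above. `SAMPLE`, `ENCODINGS` and `encoded_forms` are hypothetical names.

```python
# Hypothetical illustration: how one character is stored in each encoding
# the suite covers. Uses only Python's built-in codecs.
SAMPLE = "\u4e2d"  # the Chinese character for "middle"

ENCODINGS = ["utf-8", "big5", "gb2312"]


def encoded_forms(text: str) -> dict[str, bytes]:
    """Return the byte sequence for `text` in each encoding in ENCODINGS."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


if __name__ == "__main__":
    for enc, data in encoded_forms(SAMPLE).items():
        # Round-trip check: decoding with the matching label restores the character.
        assert data.decode(enc) == SAMPLE
        print(f"{enc:7} {data.hex(' ')}")
```

Run as a script, it lists three different byte sequences (three bytes in UTF-8, two in each of the others). That is why each file has to be correctly labelled: read under the wrong label, the same bytes stand for something else entirely.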

One difficulty people have with character sets for exotic scripts is that,
when something prints out, it can be hard to tell whether it is correct.
So these tests try to isolate one thing at a time. For example,
http://www.ascc.net/xml/test/wf/utf-8/text_plain/zh-utf8-8.txt is

<?xml version="1.0" encoding="UTF-8" ?>
<test type="io8">
  <name>Chinese Test #8: UTF-8</name>
  <data>This file has 1 Chinese character, directly entered.</data>
  <data>The XML header of this file is <?xml encoding="UTF-8"?>.</data>
  <data>The character is here: [中] It is the Chinese character for middle.
   It should look like a box with a vertical line through its middle.</data>

so there is only one non-ASCII character involved.  (I don't know whether
this example will get through by mail!)
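The "one thing at a time" idea also allows a mechanical check. Instead of judging an unfamiliar glyph on screen, you can list the code point and Unicode name of every non-ASCII character after decoding. A minimal Python sketch follows. It assumes a trimmed, hypothetical copy of the test file held as raw UTF-8 bytes; `RAW` and `non_ascii_report` are illustrative names, and Latin-1 stands in for an arbitrary wrong label.

```python
import unicodedata

# Hypothetical, trimmed stand-in for the test file above, held as the raw
# UTF-8 bytes that would arrive over the wire.
RAW = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<test type="io8">\n'
    '  <data>The character is here: [\u4e2d]</data>\n'
    '</test>\n'
).encode("utf-8")


def non_ascii_report(raw: bytes, encoding: str) -> list[str]:
    """Decode `raw` with `encoding` and describe every non-ASCII character.

    Reporting code points and Unicode names avoids judging by eye whether
    an unfamiliar glyph on screen is the right one.
    """
    text = raw.decode(encoding)
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) > 0x7F
    ]


if __name__ == "__main__":
    # Correct label: the report should name exactly one character.
    print(non_ascii_report(RAW, "utf-8"))
    # Wrong label (Latin-1, as an example): the same three bytes turn into
    # three unrelated characters.
    print(non_ascii_report(RAW, "latin-1"))
```

With the correct label the report contains only U+4E2D. Decoding the same bytes as Latin-1 yields three unrelated characters instead, which is exactly the kind of error that is easy to miss when only looking at the printout.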

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS