   Contracts & Acceptance Testing. Re: IE5 and UTF-8

  • From: Rick JELLIFFE <ricko@geotempo.com>
  • To: Lucio Piccoli <Lucio.Piccoli@one2one.co.uk>
  • Date: Fri, 07 Jul 2000 17:03:21 +0800

Lucio Piccoli wrote:

> I am having problems with a supplier that is sending XML docs that fail to
> be parsed by JAXP due to UTF-8 encoding errors. The supplier claims that the
> docs have been parsed by IE5 before, hence it validates that the XML is good.

If you need to be able to pin down the specific encoding problem, some
extra info would be helpful:
 - Can you tell us what the particular UTF-8 encoding error is? 
 - Could you open the same document in IE5? 
 - When you say "send" do you mean over HTTP? If so, are the senders
setting the correct charset parameter in the HTTP header? If not, does
the document have no encoding declaration or one that explicitly says
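
The charset question in the list above can be checked mechanically once
the response header and the first bytes of the body have been captured.
What follows is only a minimal sketch in Python: the function names
http_charset and declared_encoding are invented for illustration, and
the declaration check assumes an ASCII-compatible encoding (it will not
recognise a UTF-16 document).

```python
import re
from email.message import Message


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and strips quotes.
    return msg.get_content_charset()


def declared_encoding(head):
    """Return the encoding named in the XML declaration, or None.

    `head` is the first few hundred bytes of the document. Assumes an
    ASCII-compatible encoding; a leading UTF-8 byte order mark is skipped.
    """
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    m = re.match(
        rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']",
        head,
    )
    return m.group(1).decode("ascii").lower() if m else None


if __name__ == "__main__":
    # Hypothetical captured values, for illustration only.
    header = 'text/xml; charset="ISO-8859-1"'
    body = b'<?xml version="1.0" encoding="UTF-8"?><doc/>'
    print("HTTP charset:        ", http_charset(header))
    print("XML declaration says:", declared_encoding(body))
```

If the two values disagree, or both are missing while the bytes are not
UTF-8, that is a likely place to start looking.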

If you can capture the data and send a hex dump of the offending
fragment, that would be useful. (If you have a GNU/UNIX system you can
use "od -tcxC filename".)
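
If od is not available, the offending fragment can also be located with
a few lines of script. This is a rough sketch in Python, not a
replacement for a real dump tool: first_utf8_error and hex_context are
made-up names, and it relies on Python's strict UTF-8 decoder to report
the byte offset of the first bad sequence.

```python
import sys


def first_utf8_error(data):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


def hex_context(data, offset, radius=8):
    """Return a one-line hex dump of the bytes around `offset`."""
    start = max(0, offset - radius)
    chunk = data[start:offset + radius]
    hexpart = " ".join(f"{b:02x}" for b in chunk)
    return f"{start:08x}  {hexpart}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py filename
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    err = first_utf8_error(raw)
    if err is None:
        print("no UTF-8 encoding errors found")
    else:
        offset, reason = err
        print(f"byte offset {offset}: {reason}")
        print(hex_context(raw, offset))
```

A lone byte such as e9 followed by an ASCII character usually means the
data is really ISO 8859-1 or a Windows code page that has been labelled
as UTF-8.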

SGML was developed in order to clarify where responsibility for
correcting errors belongs: receivers can acceptance-test the data. XML
inherits this.

For contracts, you should specify the validation tool for acceptance
testing. I know of at least one customer who bought OmniMark solely to
use for validation before delivery of their SGML data, even though they
used their own tools; for XML it will be the same. 

For example, your contract could say something like (in Legalese):

"i) Documents must conform to the requirements of ISO International
Standard 8879:1986 (SGML) as corrected to 1999. ii) The particular form
of SGML is required to be that profile specified by W3C as XML version
1.0 as corrected, as an "Additional Requirements" for SGML (reference to
James Clark's SGML declarations for XML document at W3C). iii) The data
encoding is required to be UTF-8, as specified by the Unicode
Consortium, as corrected.  iv) The meaning of characters is required to
be that specified by ISO International Standard 10646 (Universal
Character Set) and Unicode Consortium in Unicode Character Set 3.0 as

These requirements will be deemed satisfied by:
 <insert some reference XML processor that you have confidence in,
including version and which platform--the version of Java should be
specified too>
 and <insert some encoding test program, you may have to write it>

You could also put in whether the documents should be well-formed or
valid, against which DTD, and which program will be used to validate
it.  If you are really serious, you should specify that valid data or WF
data should pass 3/3 different parsers.  If your data format has
constraints that cannot be represented in DTDs, then also require some
schema language such as SOX or Schematron or XML Schemas (which should
have mature-enough implementations by 2001).

If you send binary data, or data in other formats, the same thing is
required. This is particularly important if you send in poorly
standardized formats such as CGM or RTF or binary files including GIF.
If data is sent in archive or compression or encryption formats, the
same approach should be required."
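
The "encoding test program" and the several-parsers idea can be
combined into one small acceptance harness. The following is a sketch
only, written in Python for brevity: the names acceptance_test, accepted
and CHECKS are hypothetical, and it wires in just one parser (expat,
from the Python standard library). A real contract would name the
reference processors, and further independent parsers would be added to
CHECKS as extra entries.

```python
import xml.parsers.expat


def check_utf8(data):
    """Raise UnicodeDecodeError unless `data` is strictly valid UTF-8."""
    data.decode("utf-8", errors="strict")


def check_well_formed_expat(data):
    """Raise ExpatError unless expat accepts `data` as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(data, True)


# Named checks; every one must pass for a delivery to be accepted.
CHECKS = {
    "utf-8": check_utf8,
    "well-formed (expat)": check_well_formed_expat,
}


def acceptance_test(data, checks=CHECKS):
    """Run every check; return {name: None on success, else error text}."""
    results = {}
    for name, check in checks.items():
        try:
            check(data)
            results[name] = None
        except Exception as exc:
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def accepted(data, checks=CHECKS):
    """True only if all checks pass (the 'N out of N' rule)."""
    return all(v is None for v in acceptance_test(data, checks).values())


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?><doc>ok</doc>'
    for name, error in acceptance_test(sample).items():
        print(f"{name}: {'pass' if error is None else error}")
```

Keeping the report per check, rather than a bare yes or no, makes it
easier to tell the supplier which requirement a rejected delivery
failed.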

Rick Jelliffe

This is xml-dev, the mailing list for XML developers.
List archives are available at http://xml.org/archives/xml-dev/

  • References:
    • IE5 and UTF-8
      • From: Lucio Piccoli <Lucio.Piccoli@one2one.co.uk>



Copyright 2001 XML.org. This site is hosted by OASIS