OASIS Mailing List Archives

   Re: Possible changes for XML 2nd Edition

  • From: Rick JELLIFFE <ricko@geotempo.com>
  • To: xml-editor@w3.org, "xml-dev@xml.org" <xml-dev@xml.org>
  • Date: Thu, 25 May 2000 04:04:41 +0800

John Cowan wrote:
 
> Issue PE28:
> 
> Currently the XML Recommendation is silent about the handling of
> documents that contain "impossible" bytes.  For example, the byte 0xFF
> cannot appear in any UTF-8 encoded document.  We are considering making
> such violations of the encoding a fatal error.
> 
> PRO: an improperly encoded document is not really a text document at all;
> nothing should be done on the basis of it.  XML's draconian error handling rule
> should lead to a "fatal error", which means the rest of the document must
> not be parsed.
> 
> CON: Some parsers may be relying on libraries supplied by the OS, which may
> not properly signal erroneous input.  Is it too great a burden on the
> parser implementor to impose this restriction?
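
To make the PRO/CON trade-off concrete, here is a minimal Python sketch (both function names are hypothetical) that contrasts a strict decoder, which turns the impossible byte 0xFF into a fatal error, with a lenient library-style decoder that silently substitutes U+FFFD and carries on:

```python
def decode_strict(data: bytes) -> str:
    """Treat any byte sequence that is not valid UTF-8 as a fatal error."""
    try:
        return data.decode("utf-8")  # errors="strict" is Python's default
    except UnicodeDecodeError as exc:
        bad = data[exc.start]
        raise ValueError(
            f"fatal error: byte 0x{bad:02X} at offset {exc.start} is not valid UTF-8"
        ) from exc


def decode_lenient(data: bytes) -> str:
    """Mimic a permissive transcoding library: replace bad bytes with U+FFFD
    and return normally, so the caller never learns anything was wrong."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    doc = b"<a>caf\xc3\xa9 \xff</a>"
    print(decode_lenient(doc))
    try:
        decode_strict(doc)
    except ValueError as err:
        print(err)  # fatal error: byte 0xFF at offset 9 is not valid UTF-8
```

The CON point is the second function: a parser built on it gets text back and has no signal that a byte was lost.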
 
I think this goes too far for basic well-formedness (WF).

Instead, I would propose another level of validity, "character validity",
which XML processors should be encouraged, but not required, to support
(or to support as far as they can). Unlike validity, which sits on top
of well-formedness, "character validity" sits more or less underneath
well-formedness, as XML's soft underbelly.

An XML document that was "character valid" would
 1) not have any impossible bytes in any entity;
 2) not have a BOM if encoding="UTF-16LE" or "UTF-16BE" (and would meet
any other encoding constraints);
 3) use only names in markup that follow the NAMECHAR conventions;
 4) have all data Unicode-normalized.
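
The four conditions above could be checked roughly as follows. This is a minimal Python sketch under stated assumptions: NFC is assumed as the normalization form (the proposal does not name one), the name test is a simplified stand-in for XML's Name production rather than the full character tables, and the regex tag scanner and all function names are hypothetical.

```python
import re
import unicodedata

BOMS = (b"\xff\xfe", b"\xfe\xff")
# Hypothetical crude scanner: captures the name after "<" or "</".
# It skips "<!" and "<?" constructs and ignores attribute names entirely.
TAG_NAME = re.compile(r"</?([^\s/>!?]+)")


def is_name(name: str) -> bool:
    """Simplified stand-in for XML's Name production (assumption, not the spec tables)."""
    first, rest = name[0], name[1:]
    if not (first.isalpha() or first in "_:"):
        return False
    return all(c.isalnum() or c in "._-:" for c in rest)


def character_validity_errors(data: bytes, encoding: str) -> list[str]:
    errors = []
    # 2) No BOM when the declared encoding already fixes the byte order.
    if encoding.upper().replace("-", "") in ("UTF16LE", "UTF16BE") and data[:2] in BOMS:
        errors.append("BOM present with byte-order-specific encoding")
    # 1) No impossible bytes for the declared encoding.
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as exc:
        errors.append(f"impossible byte at offset {exc.start}")
        return errors  # the remaining checks need decoded text
    # 3) Names in markup follow the (simplified) name-character rules.
    for name in TAG_NAME.findall(text):
        if not is_name(name):
            errors.append(f"bad name: {name!r}")
    # 4) Data is Unicode-normalized (NFC assumed here).
    if not unicodedata.is_normalized("NFC", text):
        errors.append("text is not NFC-normalized")
    return errors
```

A basic processor could simply skip this function; a character-validating one would report the list it returns.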

This would keep a basic XML implementation that did not support
"character validity" simple:
 1) it can use any library for transcoding;
 2) it does not need any special BOM handling for UTF-16LE/UTF-16BE;
 3) it can tokenize tags on whitespace and delimiters rather than on
NAMECHAR or NAMESTRT;
 4) normalization is not checked or enforced.
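
For point 3, a processor that skips character validity might pull an element name out of a tag with nothing more than a scan for whitespace and delimiters. A minimal Python sketch (the function name and the delimiter set are hypothetical):

```python
# Characters that end a tag name in this simplified scan (assumption).
DELIMITERS = set(" \t\r\n/>=")


def tag_name(markup: str, start: int) -> str:
    """Return the element name just after '<' (or '</') at index `start`,
    reading up to the first whitespace or delimiter. No NAMECHAR or
    NAMESTRT check is made on the characters collected."""
    i = start + 1
    if markup.startswith("/", i):
        i += 1  # end tag: skip the slash
    j = i
    while j < len(markup) and markup[j] not in DELIMITERS:
        j += 1
    return markup[i:j]
```

Note that `</1bad>` yields `1bad` without complaint; catching that is exactly the job left to a character-validating processor.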

A character-validating processor should be the goal for any XML
processor not specifically aimed at ultra-lightweight uses.


Rick Jelliffe

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************




 


Copyright 2001 XML.org. This site is hosted by OASIS