Re: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?

From: Rick Jelliffe <rjelliffe@allette.com.au>
To: "'XML Developers List'" <xml-dev@lists.xml.org>
Date: Wed, 20 Feb 2008 10:00:45 +1100

Elliotte Harold wrote:
> Len Bullard wrote:
>> It does not seem correct to call it an error when a specification
>> drifts out of sync with another specification, in this case, Unicode.
>> Perhaps corrigenda as a synonym for errata isn't right either
>> (corrections supplied after publication and inserted, or printer's
>> error).
>>
>
> There's more going on here than mere syncing with another version of 
> another specification. Rather than simply adding the new characters 
> from Unicode 3.0-5.0, this proposal adopts a radically different 
> approach to handling Unicode. It switches from explicitly listing all 
> the allowed characters to explicitly listing all the forbidden ones. 
> This opens the door to many characters that simply shouldn't be used 
> in XML names at all such as the musical symbols and undefined 
> characters. I do not find this to be a wise choice.

I don't think this is a problem with characters or names or 
requirements. It is a failure of version manageability and the 
inappropriate application of Draconian error handling. The XML version 
in the header is useless and, as we see now, actively prevents change and 
improvement: hence the WG is in effect abandoning it.

Ten years ago, or five years ago, or last year, the W3C Core WG should 
have said:
  * An XML processor should not attempt to process a document with a 
higher major version number, but should report an error.
  * An XML processor should attempt to process a document with a higher 
minor version than the XML processor was designed for. In such a 
case, well-formedness error reports should note that the error may be 
due to changes in the minor XML version. (Parsers should also note which 
minor version they are using when reporting that a document is well-formed.)
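
As a rough illustration of the two rules above, here is a minimal Python sketch. The function name check_version, the SUPPORTED_* constants and the wording of the notes are hypothetical, chosen only for this example; a real processor would do this inside its XML declaration parser.

```python
import re

# Hypothetical: the version this imaginary processor was designed for.
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0

# Matches the version pseudo-attribute at the very start of a document.
_VERSION_RE = re.compile(r'<\?xml\s+version\s*=\s*(["\'])(\d+)\.(\d+)\1')

def check_version(document_start):
    """Return (proceed, note) for the text at the start of a document."""
    m = _VERSION_RE.match(document_start)
    if m is None:
        # No XML declaration: nothing to compare, carry on as usual.
        return True, None
    major, minor = int(m.group(2)), int(m.group(3))
    if major > SUPPORTED_MAJOR:
        # Rule 1: higher major version, do not attempt to process.
        return False, "unsupported major version %d" % major
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        # Rule 2: higher minor version, process anyway but flag reports.
        note = ("document is XML %d.%d, parser implements %d.%d; "
                "well-formedness errors may be due to minor-version changes"
                % (major, minor, SUPPORTED_MAJOR, SUPPORTED_MINOR))
        return True, note
    return True, None
```

With this sketch, a document labelled version="1.1" is still parsed by a 1.0 processor, and the returned note can be attached to any well-formedness report.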

This would have given time for software versions to be deployed with 
this policy by now, and it would allow us to mark XML 1.0 documents as 
XML 1.1 even if they don't use any XML 1.1 features. This takes one of 
the stings out of the tail. Explicit labelling is the answer to almost 
all "tricky" markup problems: the version number is good, but it needs 
to be accompanied by a policy that actually brings out its utility 
rather than making it a stumbling block.

I suggest the XML Core WG should adopt a compromise to have our cake and 
eat it too.

 1) Make the change now to the versioning policy

 2) A coarse-grain sieve for fatal reportable name character errors
     * Strict and detailed rules for Unicode < U+0100
     * For U+0100 to U+FFFF include or exclude characters based on 
allowing or disallowing blocks (7-bit ranges, i.e. 128 code points each), 
which allows very efficient name checking with a 512-entry table and a 
mask. (This is perhaps more coarse-grain than even the 5th Edition!)
     * For characters not in the BMP (or surrogates) either allow or 
disallow all of them in names indiscriminately
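
To make the table-and-mask idea concrete, here is a minimal Python sketch. It assumes one reading of "table and a mask": the code point shifted right by 7 bits indexes a 512-entry table, and each entry holds flag bits that are tested with a mask. The rules below U+0100 follow the XML 1.0 name productions for that range. The disallowed blocks are placeholders picked only to exercise the code, not a suggested policy, and all names (BLOCK_FLAGS, coarse_flags, is_coarse_name, ALLOW_NON_BMP) are hypothetical.

```python
NAME_START = 0x1  # flag bit: may begin a name
NAME_CHAR = 0x2   # flag bit: may appear after the first character
BOTH = NAME_START | NAME_CHAR

# One entry per 128-code-point block of the BMP: 0x10000 >> 7 == 512.
BLOCK_FLAGS = [BOTH] * 512

# Placeholder policy (an assumption for illustration only): forbid the
# blocks covering U+2000-U+207F, the surrogates and the private use area.
for lo, hi in [(0x2000, 0x207F), (0xD800, 0xDFFF), (0xE000, 0xF8FF)]:
    for block in range(lo >> 7, (hi >> 7) + 1):
        BLOCK_FLAGS[block] = 0
# Placeholder: the block at U+0300-U+037F may continue a name, not start it.
BLOCK_FLAGS[0x0300 >> 7] = NAME_CHAR

ALLOW_NON_BMP = True  # single switch for everything above U+FFFF

def _latin1_flags(cp):
    """Strict, per-character rules below U+0100."""
    ch = chr(cp)
    if ch in ":_" or "A" <= ch <= "Z" or "a" <= ch <= "z":
        return BOTH
    if 0xC0 <= cp <= 0xFF and cp not in (0xD7, 0xF7):
        return BOTH
    if ch in "-." or "0" <= ch <= "9" or cp == 0xB7:
        return NAME_CHAR
    return 0

def coarse_flags(cp):
    if cp < 0x100:
        return _latin1_flags(cp)
    if cp <= 0xFFFF:
        return BLOCK_FLAGS[cp >> 7]  # one shift, one table lookup
    return BOTH if ALLOW_NON_BMP else 0

def is_coarse_name(name):
    """True if the name passes the coarse (fatal-error) sieve."""
    if not name or not coarse_flags(ord(name[0])) & NAME_START:
        return False
    return all(coarse_flags(ord(c)) & NAME_CHAR for c in name[1:])
```

The point of the sketch is the cost: per character, the BMP check is a shift, an index and an AND, and the whole policy fits in 512 small integers.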

 3) A fine-grain sieve for non-fatal optionally-reportable name 
character errors:
     These are the detailed rules.

 4) Deprecate C1 controls
      I think we are in a different age than 10 years ago. We now have far 
fewer encodings in common use, particularly in Europe: some Linux 
distros are still using legacy encodings, but most computing has 
switched to Unicode. As long as the critical C1 range is disallowed, 
which allows most bad encoding detection, then the naming rules have a 
purpose of helping readability, not serving double duty as encoding 
problem detectors (which they did very handily for some iso8859-n 
encodings.)
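
A small Python sketch of why the C1 range is such a useful tripwire: text saved as Windows-1252 but read as ISO 8859-1 turns its curly quotes into C1 control characters (U+0080 to U+009F), which a simple range check then catches. The helper name find_c1_controls is hypothetical, and the sample string is made up for the example.

```python
def find_c1_controls(text):
    """Return (index, code point) pairs for every C1 control in text."""
    return [(i, ord(c)) for i, c in enumerate(text)
            if 0x80 <= ord(c) <= 0x9F]

# Curly quotes are bytes 0x93 and 0x94 in Windows-1252.
raw = "\u201cquoted\u201d".encode("cp1252")

# Decoding the same bytes as ISO 8859-1 yields U+0093 and U+0094.
mislabelled = raw.decode("iso-8859-1")

suspects = find_c1_controls(mislabelled)
```

Correctly decoded text gives an empty list, so any hit is a strong hint that the declared encoding is wrong.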

The HTML 5 work shows that Draconian error handling is not always 
appropriate. Even though I strongly think that bad encoding detection is 
important, and certainly that bogus names should not be used, I can see 
the benefit in having a coarse sieve for fatal WF errors and a fine 
sieve for reportable non-WF errors.

Cheers
Rick Jelliffe

Follow-Ups:
- Re: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?
  - From: Liam Quin <liam@w3.org>
- RE: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?
  - From: "Len Bullard" <cbullard@hiwaay.net>

References:
- RE: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?
  - From: Len Bullard <len.bullard@uai.com>
- Re: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?
  - From: Elliotte Harold <elharo@metalab.unc.edu>
