Re: [xml-dev] Ten Years Later - XML 1.0 Fifth Edition?
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: "'XML Developers List'" <xml-dev@lists.xml.org>
- Date: Wed, 20 Feb 2008 10:00:45 +1100
Elliotte Harold wrote:
> Len Bullard wrote:
>> It does not seem correct to call it an error when a specification
>> drifts out
>> of sync with another specification, in this case, Unicode. Perhaps
>> corrigenda as a synonym for errata isn't right either (corrections
>> supplied
>> after publication and inserted, or printer's error).
>>
>
> There's more going on here than mere syncing with another version of
> another specification. Rather than simply adding the new characters
> from Unicode 3.0-5.0, this proposal adopts a radically different
> approach to handling Unicode. It switches from explicitly listing all
> the allowed characters to explicitly listing all the forbidden ones.
> This opens the door to many characters that simply shouldn't be used
> in XML names at all such as the musical symbols and undefined
> characters. I do not find this to be a wise choice.
I don't think this is a problem with characters or names or
requirements. It is a failure of version manageability and an
inappropriate application of Draconian error handling. The XML version
in the header is useless and, as we now see, actively prevents change and
improvement: hence the WG is, in effect, abandoning it.
Ten years ago, or five years ago, or last year, the W3C Core WG should
have said:
* An XML processor should not attempt to process a document with a
higher major version number, but should report an error.
* An XML processor should attempt to process a document with a higher
minor version than the XML processor was designed for. In such a
case, well-formedness error reports should note that the error may be
due to changes in the minor XML version. (Parsers should also note which
minor version they are using when reporting that a document is well-formed.)
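As a rough illustration of that two-part policy, here is a minimal Python sketch. It is not taken from any existing parser: the names SUPPORTED_MAJOR, SUPPORTED_MINOR and check_version are hypothetical, and it assumes the declared version is a simple "major.minor" string.

```python
import re

# Hypothetical: the version this imaginary processor was designed for.
SUPPORTED_MAJOR = 1
SUPPORTED_MINOR = 0

_VERSION_RE = re.compile(r"^([0-9]+)\.([0-9]+)$")


def check_version(declared):
    """Return (action, note) for the version string in the XML declaration.

    action is "error" (do not process) or "process".
    note is text to attach to any later well-formedness report.
    """
    m = _VERSION_RE.match(declared)
    if m is None:
        return ("error", "malformed version %r" % declared)
    major, minor = int(m.group(1)), int(m.group(2))
    if major > SUPPORTED_MAJOR:
        # Higher major version: refuse to process and report an error.
        return ("error", "unsupported major version %d" % major)
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        # Higher minor version: process anyway, but qualify any reports.
        note = ("checked against XML %d.%d rules; well-formedness errors "
                "may be due to changes in minor version %s"
                % (SUPPORTED_MAJOR, SUPPORTED_MINOR, declared))
        return ("process", note)
    return ("process", "")


if __name__ == "__main__":
    for v in ("1.0", "1.1", "2.0"):
        print(v, check_version(v))
```

The point of the sketch is only that the minor-version case degrades gracefully: the document is still parsed, and the report carries the caveat.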
This would have given time for software versions to be deployed with
this policy by now, and it would allow us to mark XML 1.0 documents as
XML 1.1 even if they don't use any XML 1.1 features. This takes one of
the stings out of the tail. Explicit labelling is the answer to almost
all "tricky" markup problems: the version number is good, but it needs
to be accompanied by a policy that actually brings out its utility
rather than making it a stumbling block.
I suggest the XML Core WG should adopt a compromise to have our cake and
eat it too.
1) Make the change now to the versioning policy
2) A coarse-grain sieve for fatal reportable name character errors
* Strict and detailed rules for Unicode < U+0100
* For U+0100 to U+FFFF, include or exclude characters by
allowing or disallowing whole blocks (7-bit ranges, i.e. 128 code points
each), which allows very efficient name checking with a 512-entry table
and a mask. (This is perhaps more coarse-grain than even the 5th Edition!)
* For characters outside the BMP (or surrogates), either allow or
disallow all of them in names indiscriminately
3) A fine-grain sieve for non-fatal optionally-reportable name
character errors:
These are the detailed rules.
4) Deprecate C1 controls
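To make the coarse-grain sieve in item 2 concrete, here is a minimal Python sketch under stated assumptions. The rules below U+0100 follow the XML 1.0 name productions for that range. Everything else is hypothetical: the function names, the choice of which 128-code-point blocks to disallow (only U+2000 to U+207F and the surrogate range, purely as placeholders), and the ALLOW_NON_BMP flag. It is not a proposed table.

```python
NAME_START = 0x1   # bit: allowed as the first character of a name
NAME_CHAR = 0x2    # bit: allowed in the rest of a name


def _latin1_flags(cp):
    """Strict, detailed rules for code points below U+0100."""
    ch = chr(cp)
    is_start = (ch in ":_" or "A" <= ch <= "Z" or "a" <= ch <= "z"
                or 0xC0 <= cp <= 0xD6 or 0xD8 <= cp <= 0xF6
                or 0xF8 <= cp <= 0xFF)
    if is_start:
        return NAME_START | NAME_CHAR
    if ch in "-." or "0" <= ch <= "9" or cp == 0xB7:
        return NAME_CHAR
    return 0


LATIN1 = [_latin1_flags(cp) for cp in range(0x100)]

# 512 entries, one per 128-code-point block of the BMP (index = cp >> 7).
# Entries 0 and 1 are never consulted; LATIN1 covers that range.
BLOCKS = [NAME_START | NAME_CHAR] * 512
BLOCKS[0x2000 >> 7] = 0                      # placeholder: punctuation block
for i in range(0xD800 >> 7, (0xDFFF >> 7) + 1):
    BLOCKS[i] = 0                            # surrogates

ALLOW_NON_BMP = True   # hypothetical policy switch: all or nothing


def coarse_ok(ch, mask):
    cp = ord(ch)
    if cp < 0x100:
        return bool(LATIN1[cp] & mask)
    if cp <= 0xFFFF:
        return bool(BLOCKS[cp >> 7] & mask)  # one shift, one lookup, one mask
    return ALLOW_NON_BMP


def coarse_name_ok(name):
    """True if name passes the coarse (fatal-error) sieve."""
    if not name:
        return False
    return (coarse_ok(name[0], NAME_START)
            and all(coarse_ok(c, NAME_CHAR) for c in name[1:]))
```

A name that fails coarse_name_ok would be a fatal well-formedness error; a name that passes it but fails the detailed rules of item 3 would only be optionally reportable.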
I think we are in a different age than 10 years ago. We now have far
fewer encodings in common use, particularly in Europe: some Linux
distros are still using legacy encodings, but most computing has
switched to Unicode. As long as the critical C1 range is disallowed,
which catches most bad encodings, the naming rules serve the
purpose of helping readability, rather than doing double duty as encoding
problem detectors (which they did very handily for some iso8859-n
encodings).
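A small Python sketch of why the C1 range (U+0080 to U+009F) is useful for this: text saved as Windows-1252 but read as ISO 8859-1 turns its curly quotes and similar characters into C1 controls, so their presence is a strong hint of a mislabelled encoding. The function name find_c1_controls is hypothetical, and the example is just one common case, not a complete detector.

```python
def find_c1_controls(text):
    """Return (index, code point) for every C1 control in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if 0x80 <= ord(ch) <= 0x9F]


if __name__ == "__main__":
    # Curly quotes written as Windows-1252 bytes 0x93 and 0x94 ...
    data = "\u201cquoted\u201d".encode("cp1252")
    # ... then wrongly decoded as ISO 8859-1 become C1 controls.
    wrong = data.decode("iso-8859-1")
    print(find_c1_controls(wrong))
```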
The HTML 5 work shows that Draconian error handling is not always
appropriate. Even though I strongly think that bad encoding detection is
important, and certainly that bogus names should not be used, I can see
the benefit in having a coarse sieve for fatal WF errors and a fine
sieve for reportable non-WF errors.
Cheers
Rick Jelliffe