I see XML 1.1 is out, and it is so crazy that it is funny. My considered
recommendation is we all have a good laugh, and then forget about it.
By allowing any character in names, XML 1.1 lets us have WF documents that
are corrupted with a well-formedness error merely by being opened in a text
editor (even an editor for the document encoding): this happens if people
use characters in names at which automated line-wrapping may split a line. A
markup language in which safe practice is to *never* open an entity in a text
editor? Excellent advance!
I would guess that Issue 18 and Issue 21 (should control characters be
allowed? should 0x00 be allowed?) are just sacrificial lambs, put in to be
removed later and not serious suggestions. A markup language which is
unsafe to store in files or to transmit on serial lines or as text/*?
Should be a winner!
It would be interesting to speculate what principle causes characters to be
considered whitespace: certainly it is not that all visible space should be
whitespace (one sensible rule) or that only ASCII should be space.
Why is just mapping NEL to #xA on input not enough to satisfy the IBM
requirement?
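
A minimal sketch of that input mapping, assuming it is simply added to the
XML 1.0 line-end handling (CR LF and lone CR folded to #xA). The function
name and the exact set of sequences are illustrative assumptions, not text
from the draft:

```python
import re

# Assumption: CR LF and lone CR are folded to LF (#xA) as in XML 1.0;
# the only addition sketched here is NEL (U+0085).
_LINE_ENDS = re.compile("\r\n|\r|\x85")


def normalize_line_ends(text: str) -> str:
    """Replace each recognised line-end sequence with a single LF."""
    return _LINE_ENDS.sub("\n", text)
```

With this done before parsing, the rest of the processor would only ever
see #xA as a line end.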
This gives us a markup language in which all the markup in a WF document
could look, by inspection, as if every character is ASCII, yet the document
could not be serialized out to ASCII because of NEL or LS characters. Not a
common problem, but a hole.
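
To make the hole concrete, here is a rough sketch that lists the characters
that would block an ASCII serialization. The helper name and the sample
string are hypothetical, and whether the sample is well-formed depends on
which whitespace rules the draft ends up with:

```python
def non_ascii_positions(markup: str) -> list[tuple[int, str]]:
    """Return (offset, code point) for every character outside ASCII."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(markup)
        if ord(ch) > 0x7F
    ]


# Hypothetical sample: NEL (U+0085) separates the element name from the
# attribute, so it cannot be replaced by a character reference.
sample = '<doc\x85id="1"/>'
```

Calling `sample.encode("ascii")` on such a string raises
`UnicodeEncodeError`, although most viewers would show nothing but ASCII.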
Another great joke is to "simplify" the naming rules to free a parser from
having to worry about future upgrades to Unicode, but then to require
Normalized data (and suggest that unnormalized data should be an error):
surely this just ties the parser to having to know a particular version of
Unicode to know which normalization rules to use!
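
A small sketch of the dependency, assuming the check is done with Python's
standard `unicodedata` module and that NFC is the form intended (the draft
may mean something else). The answer comes from whatever Unicode tables the
runtime bundles:

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter;
# any normalization check below is tied to these tables.
UNICODE_VERSION = unicodedata.unidata_version


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text
```

A parser that reports unnormalized data as an error has to ship tables like
these and keep them current for newly assigned characters.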
Of course, the real way to get independence from Unicode changes is to
define name rules in terms of Unicode properties. There is a set of Unicode
properties specifically to be used to determine which characters can be used
in identifiers. By allowing more characters in names, the XML WG is not
supporting more of Unicode, but less.
ROFL
Rick Jelliffe