I sent these comments to the public blueberry-comments address,
but thought some of them might usefully be discussed here. If
someone wants to start an argument about one or more of
these, please pull it out and give it a separate subject line.
1. The principle of decoupling the XML spec from successive
revisions of Unicode is the only sensible way forward.
2. If no consensus can be built around the details of this
set of changes, it would be acceptable to declare defeat and
go on with XML 1.0 2nd ed as-is. This would be a regrettable
outcome but not fatal at a deep level.
3. Issue 18: The costs of allowing #x1-#x1F appear to me to
exceed the benefits. Among other things, many of these
ASCII control chars, despite being several decades old, have
little consensus concerning their semantics, e.g. ETX and EOT
(#x3 and #x4). I think from the XML point of view these things
are actively pernicious, because they embody the notion that
semantics are embedded in characters rather than expressed by
markup. The case of "textual content that may contain such
characters (but typically does not)" is not very convincing. In
*many* cases the occurrence of these characters is evidence of
an error.
4. Issue 21: The cost of allowing null bytes in XML content is
very high and the benefits hard to understand.
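For reference, the XML 1.0 Char production is #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], so #x0 and every C0 control other than tab, LF and CR are errors under the current spec. A minimal Python sketch of that check follows; the helper names `is_xml10_char` and `find_illegal_chars` are hypothetical, not part of any library. It shows how a processor can report such characters as the errors they usually are.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


# A stray NUL and ETX (#x3) are reported; tab and newline are fine.
print(find_illegal_chars("ok\tline\nbad\x00here\x03"))
# prints [(11, 'U+0000'), (16, 'U+0003')]
```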
5. I strongly feel that #x85 (NEXT LINE) should not be added to
the S production. The reason is a simple cost-benefit analysis;
the computing installations where this is an issue are a small
and shrinking proportion of the infrastructure. Supporting this
change imposes significant
conversion costs on the rest of the world; the total global
net cost would be significantly less if the mainframe software
infrastructure took the necessary corrective measures to deal
with XML 1.0 as specified.
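One form such corrective measures could take is a preprocessing step on the mainframe side: when text converted from EBCDIC, where the mainframe newline typically maps to #x85, is headed for an XML 1.0 parser, translate NEL into LF first. The function name `nel_to_lf` is hypothetical, and folding a CR immediately followed by NEL into a single LF is an assumption modeled on how XML 1.0 already treats CR LF, not something this message specifies.

```python
def nel_to_lf(text: str) -> str:
    """Rewrite NEXT LINE (#x85) line ends as LF (#xA) for an XML 1.0 parser.

    Assumption: a CR immediately followed by NEL counts as one line end,
    by analogy with XML 1.0's handling of CR LF.
    """
    # Handle the two-character sequence first so it becomes a single LF.
    return text.replace("\r\x85", "\n").replace("\x85", "\n")


record = "<doc>\x85<line>one</line>\r\x85<line>two</line>\x85</doc>"
print(repr(nel_to_lf(record)))
```

Without such a conversion, an XML 1.0 parser sees #x85 as an ordinary Char rather than white space, so it is rejected wherever S is required and otherwise ends up in character data.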
6. I strongly feel, even more so than in the case of #x85,
that #x2028 is inappropriate for inclusion in S. Here are my reasons:
- If LINE SEPARATOR is to be included, why not the many
other Unicode characters with spacing semantics? A
coherent explanation needs to be provided on this
point and I am unconvinced that one exists.
- This would be the only core XML syntax character that
can't fit in a byte. This would complicate several
automaton-driven parser construction strategies. One
of the key design goals of XML is to make programmers'
lives simpler, so this objection should have weight.
- "For completeness" is a really flimsy argument.
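The byte-level objection can be made concrete. In this Python sketch (the table `IS_S_BYTE` and helper `skip_s` are illustrative names only), a scanner for the XML 1.0 S production needs nothing more than a 256-entry lookup table indexed by byte. #x85 still has a code point below 256, but #x2028 does not, and in UTF-8 it arrives as three bytes, none of which a single-byte table can recognize as white space. A real parser handles encodings more carefully; this only shows where the single-byte shortcut stops working.

```python
# XML 1.0: S ::= (#x20 | #x9 | #xD | #xA)+
XML10_S = "\x20\x09\x0D\x0A"
# Additions proposed in the draft under discussion
PROPOSED_S = "\x85\u2028"

# A 256-entry table is enough to classify any single byte as S or not.
IS_S_BYTE = [False] * 256
for ch in XML10_S:
    IS_S_BYTE[ord(ch)] = True


def skip_s(data: bytes, pos: int = 0) -> int:
    """Return the index of the first non-S byte at or after pos."""
    while pos < len(data) and IS_S_BYTE[data[pos]]:
        pos += 1
    return pos


for ch in XML10_S + PROPOSED_S:
    cp = ord(ch)
    print(f"U+{cp:04X}: fits in a byte={cp <= 0xFF}, "
          f"UTF-8 bytes={ch.encode('utf-8').hex(' ')}")

# LINE SEPARATOR is three bytes in UTF-8; the scan stops at its first byte.
print(skip_s(" \n\u2028<a/>".encode("utf-8")))  # prints 2
```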
7. In , #x37a is included, which is a combining
character and shouldn't be in NameStart.
8. In , #xf7 is included (division sign), but the
rest of the mathematical operators (starting at
#x2200) are excluded.
9. The inclusion of a block #x202A-#x218F is kind
of puzzling... it starts in the middle of one of the
punctuation blocks, and the first few chars seem
really unsuitable. What's the intent... wanting to
include the currency symbols? This definitely
needs some explanation.
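Python's standard `unicodedata` module makes it easy to look at what these ranges actually contain. This sketch prints the general category and name of the division sign from item 8, the first mathematical operator at #x2200, the first few characters of the #x202A block, and one currency symbol. Results come from the Unicode version bundled with the local Python, though these particular assignments have long been stable.

```python
import unicodedata

# Code points mentioned in items 8 and 9, plus one currency symbol.
SAMPLES = [0xF7, 0x2200, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E, 0x20AC]

for cp in SAMPLES:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

DIVISION SIGN and FOR ALL share the category Sm, so the properties give no reason to admit one and exclude the other. The characters at the start of the #x202A block are invisible bidirectional formatting controls (category Cf).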
10. There are some problems in the #x2800-#xD7FF block.
Do we really want CJK radicals (#x2e80...), compatibility
Jamo, ideographic description chars, and so on?
11. Should that block end at #xD7AF or #xD7FF?
12. [#xFDE0-#xFFEF] includes the private use area and lots
of compatibility characters which XML 1.0 actually
deprecates for use at all, let alone as names. This
is astounding and needs some defense. If this is OK,
why not throw in all the punctuation?
13. What's wrong with ASCII digits as name start chars, given
that all sorts of other digits are going in?
14. There really needs to be some deep discussion in this
document of why this alternative was chosen. When I
look at some of the wildly unlikely things that are
allowed to appear in names, the obvious question is:
Why not rely on the Unicode properties database? In
particular, the chosen approach allows lots of Name characters that
are not in fact Unicode characters at all and probably
never will be.
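A rough illustration of the property-based alternative: classify name-start characters by Unicode general category instead of by code-point ranges. The category set below (Ll, Lu, Lo, Lt, Nl, plus '_' and ':') is taken from XML 1.0 Appendix B, but the function `name_start_by_property` is a hypothetical sketch, not a proposal from the draft, and it omits Appendix B's exclusion of compatibility characters. The range examined is [#xFDE0-#xFFEF] from item 12. How many code points come out unassigned depends on the Unicode version bundled with Python, but the 16 noncharacters #xFDE0-#xFDEF are permanently unassigned under any version.

```python
import unicodedata

# Categories XML 1.0 Appendix B allows for name-start characters.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}


def name_start_by_property(ch: str) -> bool:
    """Property-based rule (illustrative assumption, not the draft's)."""
    return ch in "_:" or unicodedata.category(ch) in NAME_START_CATEGORIES


START, END = 0xFDE0, 0xFFEF  # inclusive; the range questioned in item 12

# A range-based rule accepts every code point in the range unconditionally.
in_range = [chr(cp) for cp in range(START, END + 1)]
unassigned = [ch for ch in in_range if unicodedata.category(ch) == "Cn"]
rejected = [ch for ch in in_range if not name_start_by_property(ch)]

print(f"range accepts {len(in_range)} code points")
print(f"{len(unassigned)} of them are unassigned (category Cn)")
print(f"a property-based rule would reject {len(rejected)}")
```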
15. Issue 11: I can see both sides of this question. My intuition is
that the computational cost of doing this is unacceptably
high for high-throughput applications of XML, but we need
some research to establish if this is the case. If it can
be done cheaply and compactly, it's probably a good idea.