Forwarded from Rick Jelliffe...
-----Original Message-----
From: Rick Jelliffe [mailto:ricko@allette.com.au]
> For documentation, Derek cites (in his order significant or not):
>
> 1. Allowed characters
Hi! I don't think Derek's stated comments on allowed characters
are correct.
He says that XML only allows letters and digits in names, which
is not so. He says that there have been lots of Asian characters
missed out, which is not so. And he says that XML required
checking for correct surrogate pairs, which is not so.
At least, all those were incorrect for XML 1.0 when it was introduced.
There were no surrogates in Unicode at that time. Subsequently
Unicode has added characters outside the Basic Multilingual Plane,
including some obscure and regional Asian characters, and
XML 1.1 has grown to keep up with them.
XML 1.1 also allows transmission of control characters (except NULL)
and has simpler tables for checking names. So the complaints about...
The only complaints I have heard about XML names (especially
in IDs) are that they don't allow leading digits or arbitrary
strings. On the other hand, that makes them easier to use as tokens
in languages (JavaScript, for example). But there are XML Schemas
at the high end and Schematron at the low end which are perfectly
capable of validating arbitrary keys and references.
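To make the "simpler tables" point concrete, here is a minimal Python sketch of the XML 1.1 Name production (the same NameStartChar/NameChar ranges were later adopted by XML 1.0 Fifth Edition). The function names are hypothetical, it checks only the Name production (not namespace rules about colons), and it relies on Python strings being sequences of code points, so characters outside the BMP arrive as single characters rather than as surrogate pairs.

```python
# XML 1.1 NameStartChar ranges (inclusive code point bounds).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra ranges allowed after the first character (NameChar).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9: allowed, but never as the first character
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_xml11_name(s):
    """True if s matches the XML 1.1 Name production."""
    return bool(s) and is_name_start_char(s[0]) and all(
        is_name_char(c) for c in s[1:]
    )
```

For example, `item42` and `日本語` pass while `42item` fails on its first character, which is exactly the "no starting digits" restriction on IDs.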
At ISO DSDL, we have been sitting around waiting for someone to make
up a small key/reference/integrity schema language that would
allow validation and graph-building without requiring full XML Schemas
validation. The absence of such a thing (e.g. something that
could be used to create an ID list next to a DOM as a SAX insert)
suggests either that it is not a pressing need, or that people
are feckless and lazy, or that XML Schemas and Schematron are
adequate, or that IDs are enough, or that the people that need
extended references are using DBMS which already provide
integrity checks and this is a requirement coming from
people who want to ramp up "native" XML DBMS.
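As an illustration of the kind of lightweight tool meant here, the following Python sketch uses the standard library's SAX interface to build an ID table in a single pass and then report duplicate keys and dangling references. The attribute names `id` and `ref` are hypothetical conventions chosen for the example; a real key/reference schema language would let the schema declare them, and unlike DTD ID/IDREF the keys here may be arbitrary strings.

```python
import xml.sax


class IdRefCollector(xml.sax.ContentHandler):
    """Collects keys and references during one SAX pass.

    Hypothetical convention: keys live in an 'id' attribute and
    references in a 'ref' attribute, on any element.
    """

    def __init__(self):
        super().__init__()
        self.ids = {}          # key value -> name of the element declaring it
        self.duplicates = []   # key values declared more than once
        self.refs = []         # (element name, referenced key)

    def startElement(self, name, attrs):
        key = attrs.get("id")
        if key is not None:
            if key in self.ids:
                self.duplicates.append(key)
            else:
                self.ids[key] = name
        ref = attrs.get("ref")
        if ref is not None:
            self.refs.append((name, ref))

    def dangling(self):
        # Resolved after parsing, so forward references are allowed.
        return [(el, r) for el, r in self.refs if r not in self.ids]


def check_integrity(xml_text):
    handler = IdRefCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler
```

Because the handler only keeps a dictionary and a list, the same idea could sit alongside a DOM builder and hand the ID list over when parsing finishes.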
Of course, people don't like XML 1.1 because it is supposedly
syntactically incompatible with XML 1.0 (for control characters which
some major processors in some major Western encodings would fail on
anyway, and so were not reliably interoperable), though it is
infoset-compatible.
Is it consistent to say "Let's adopt a completely different
syntax!" at the same time as saying "We will not use XML 1.1
because it has a restricted syntax for C1 controls and that
is UNTHINKABLY INCOMPATIBLE"?
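The control-character difference can be stated exactly. This Python sketch (the function name `classify` is hypothetical) encodes the XML 1.0 Char production and the XML 1.1 Char and RestrictedChar productions, and reports whether a code point may appear literally, only as a character reference, or not at all. It deliberately ignores markup-significant characters such as `<` and `&`.

```python
def _in(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


# XML 1.0 Char production.
XML10_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
              (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML 1.1 Char production: everything except NUL, surrogates, U+FFFE/U+FFFF.
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML 1.1 RestrictedChar: legal only when written as a character reference.
XML11_RESTRICTED = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F),
                    (0x7F, 0x84), (0x86, 0x9F)]


def classify(cp):
    """Return (xml10, xml11), each 'literal', 'charref' or 'forbidden'."""
    # In XML 1.0 a character reference must itself match Char, so a code
    # point outside Char is forbidden whether literal or written as &#...;.
    xml10 = "literal" if _in(cp, XML10_CHAR) else "forbidden"
    if not _in(cp, XML11_CHAR):
        xml11 = "forbidden"
    elif _in(cp, XML11_RESTRICTED):
        xml11 = "charref"
    else:
        xml11 = "literal"
    return xml10, xml11
```

For example, U+0080 comes out as literal in 1.0 but reference-only in 1.1, which is the C1 incompatibility at issue, while U+0001 goes from forbidden to allowed as a reference and NULL stays forbidden in both.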
As another matter, Derek mentions spurious whitespace nodes. But if
you use a DTD (and a validating parser), these nodes will not
be generated. So it is not a problem with XML, but a trade-off
you accept when using simple XML. Indeed, even with no-DTD XML, people
can avoid spurious whitespace nodes simply by not autoindenting
their generated XML. XML Schemas and RELAX NG also provide
enough information for a vendor to configure their
XML processors to strip out insignificant whitespace.
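For no-DTD documents, this kind of stripping can be approximated after parsing with Python's `xml.dom.minidom`. This is a sketch under a strong assumption: every whitespace-only text node is treated as insignificant unless it sits inside an element marked `xml:space="preserve"`. That assumption is wrong for mixed content such as `<p>a <b>b</b> <i>c</i></p>`, which is precisely why schema information (or an explicit setting) is needed to do it safely.

```python
from xml.dom import minidom

# XML's S production: space, tab, carriage return, line feed.
XML_WHITESPACE = " \t\r\n"


def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only text nodes (sketch, see caveats)."""
    # Leave any subtree marked xml:space="preserve" untouched.
    if (node.nodeType == node.ELEMENT_NODE
            and node.getAttribute("xml:space") == "preserve"):
        return node
    # Iterate over a copy because we remove children as we go.
    for child in list(node.childNodes):
        if (child.nodeType == child.TEXT_NODE
                and not child.data.strip(XML_WHITESPACE)):
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)
    return node
```

Typical use would be `strip_whitespace_nodes(minidom.parseString(text).documentElement)` on autoindented, element-only data.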
As far as what are XML's top five problems, here's my list:
1) Needs adjusted conformance levels: no-DTD, or DTD+validating.
The current validating/non-validating/standalone levels are confusing and
don't promote guaranteed transmission of information.
2) Needs to reserve ISO standard entity names with ISO meanings,
so that no-DTD processors can be used in the publishing industry.
Without this, people still need DTDs, and so there is not enough
marginal utility in adopting other grammar-based schemas such
as RELAX NG or XML Schemas.
3) Needs namespace-aware DTDs. Even just allowing
@xmlns and @xmlns:* to go undeclared in the DTD
would be a giant step forward. There is no problem from the
ISO side for doing a change like this for DTDs (there is even
a work item for this in ISO DSDL), it just requires some
action from the W3C. (It would be useful for schema adoption, too,
because it would provide a better on-ramp for people to
transition to namespaces, and also could provide a terser
delivery mechanism for validation/ID-IDREF annotation
by converting from schema to DTD)
4) Needs xml:space="strip" for use with no DTD.
5) W3C needs to endorse ISO Schematron. (The final ISO draft will
be available this week, by the way: I will announce a link
hosted at schematron.com, which is a site I am currently
preparing for ISO Schematron implementation news)
6) Whingers who dissipate real opportunities for change.
You know what that old post of mine on "XML 2.0 alpha" was
about? That people were discussing whether we needed
comments when XML Schemas was in the offing. Like that movie
CON-AIR where the little girl is playing innocently as the
child molester approaches. I certainly think it is time
for XML 2.0, but to remove specific problems with the existing
syntax, not to reduce the infoset or adopt some different
syntax or disenfranchise publishing people further.
Cheers
Rick Jelliffe