
RE: XML Blueberry

>From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
>Sent: Friday, 22 June, 2001 12:34
>It's an idea, but probably won't fly. First off many of the most 
>familiar of those high characters are likely to be mathematical and 
>musical symbols. Secondly, some of them will probably be combining 
>characters that really can't serve as a name start character. Thirdly 
>new characters will continue to be added to existing blocks in the 
>basic multilingual plane (BMP) below 65536 to fill in holes in 
>existing blocks so we'll also need to deal with those.

  Is there any particular reason to forbid them in some future version
of the spec (NOT BigBlueberry, but when we have good reason to make a
new one, perhaps to cut out some unused and unusable DTD 'features' and
include namespaces)?  The rec says:
S ::= (#x20 | #x9 | #xD | #xA)+
STag ::= '<' Name (S Attribute)* S? '>'
ETag ::= '</' Name S? '>'

  So as long as it's terminated by S or '>', almost anything *could* be
allowed for Name.  Given that that's how many parsers "cheat" anyway, it
will already pass in the implementations (that's how my quick & dirty
XML parser works...), and it's just a matter of changing the spec to
match reality.  Remember "rough consensus and running code"?  If a spec
demands too much, the implementations are going to ignore it anyway and
do as much as they feel like.

  Actually, you'd have to forbid '=' and '/' as well, to allow
attributes to have names and distinguish STag and ETag.  I think that's
it, though.  Everything else is fair game.  Name ::= [^${S}>/=]+ looks
pretty cool to me.
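
  A minimal sketch of that lenient rule in Python, to show how little
machinery it takes.  The names here (S_CHARS, NAME, ATTR, STAG, ETAG,
parse_tag) are made up for illustration and come from no spec or
parser; the pattern reads Name as one or more characters that are not
S, '>', '/' or '=', and the attribute value pattern is a simplified
assumption, not the rec's AttValue.

```python
import re

# Illustrative names only; nothing here is taken from a real parser.
S_CHARS = r"\x20\x09\x0D\x0A"            # the four characters allowed in S
S = rf"[{S_CHARS}]"
NAME = rf"[^{S_CHARS}>/=]+"              # lenient Name: anything but S, '>', '/', '='
ATTVALUE = r"""(?:"[^"]*"|'[^']*')"""    # simplified quoted value (assumption)
ATTR = rf"{NAME}{S}*={S}*{ATTVALUE}"

# STag ::= '<' Name (S Attribute)* S? '>'
STAG = re.compile(rf"<(?P<name>{NAME})(?:{S}+{ATTR})*{S}*>")
# ETag ::= '</' Name S? '>'
ETAG = re.compile(rf"</(?P<name>{NAME}){S}*>")


def parse_tag(text):
    """Return ('start', name) or ('end', name), or None if text is not a tag."""
    m = ETAG.fullmatch(text)
    if m:
        return ("end", m.group("name"))
    m = STAG.fullmatch(text)
    if m:
        return ("start", m.group("name"))
    return None


# Musical symbols and a leading combining character both pass as names.
print(parse_tag("<\U0001D11E\U0001D122>"))
print(parse_tag("</\u0301x>"))
print(parse_tag('<a b="1">'))
print(parse_tag("<a=b>"))
```

  Because '/' is excluded from Name, an end tag can never be mistaken
for a start tag, and excluding '=' is what lets the attribute name stop
where the value begins.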

  F'rinstance, if I'm writing documents in Solresol, which uses musical
notes instead of letters, then why shouldn't I be able to have those as
element names?  What's particularly wrong with combining characters as
name start chars?  Sure, it's strange and silly, but it simplifies the
spec.  I know there's a vested interest for spec authors to write more
and bigger specs, but real programmers have exactly the opposite
interest.  Just think - this could be the first W3C spec that ever got

  It looks to me like the restricted NameChar is just SGML legacy cruft.
I don't see how it improves the format.

  In my crankier moments, I'd like to just leave NEL out, out of spite
for IBM's and Mr. Cowan's strident attempts to break XML compatibility
for the sake of their obsolete and anti-compatible mainframes, but I'll
be nice - it really *would* be easy enough to add the rest of the
Unicode space characters to S, but *not* just their NEL.  If it's to be
done, it had best be done right, not just to satisfy one company's
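
  As a rough illustration of how many characters are at stake, the
sketch below lists code points that Python treats as whitespace but
that the S production excludes.  It assumes str.isspace() is an
acceptable stand-in for "Unicode space characters"; that method also
counts a few control characters, so the result is an approximation and
not a proposed definition.  XML_S, NEL and extra_unicode_spaces are
hypothetical names.

```python
# The four characters the S production allows today.
XML_S = {"\x20", "\x09", "\x0D", "\x0A"}
NEL = "\x85"  # NEXT LINE, U+0085


def extra_unicode_spaces(limit=0x3001):
    """Code points below `limit` that str.isspace() accepts but S does not.

    Assumption: str.isspace() approximates the Unicode space characters.
    """
    return [ch for ch in map(chr, range(limit))
            if ch.isspace() and ch not in XML_S]


extras = extra_unicode_spaces()
for ch in extras:
    print(f"U+{ord(ch):04X}")
```

  NEL shows up in that list alongside NO-BREAK SPACE, the U+2000 block
spaces, the line and paragraph separators, and IDEOGRAPHIC SPACE.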

  It is certainly not appropriate to do it for just that trivial change;
it would need to be part of XML 2.0 or some such.

-- <a href="http://kuoi.asui.uidaho.edu/~kamikaze/"> Mark Hughes </a>