RE: Blueberry/Unicode/XML
- From: "HUGHES,MARK (Non-HP-FtCollins,ex1)" <mark_hughes@non.hp.com>
- To: xml-dev@lists.xml.org
- Date: Wed, 11 Jul 2001 19:27:47 -0400
>From: Jonathan Borden [mailto:jborden@mediaone.net]
>Perhaps I might paraphrase this by suggesting that we define what is not
>allowed in a name rather than what is. At the very least the set of
>characters not allowed in a name are those needed to prevent ambiguity
>(whitespace,">,)|=*+"). Consider the element:
><O'Hara> shrug </O'Hara>
>Well, all my current documents would remain well-formed so I don't see how
>allowing this would adversely affect me. I could always refuse to read such
>newfangled documents just as I refuse to read HTML email :-)
Right. If XML is to be modified for this, it should be done once
and only once. Allowing all but syntax-specific characters is the
simplest and best way to do that, with the minimum amount of argument
over what is and is not a "reasonable" character to put in a name.
Your set needs to be changed a bit, though. In regexp:
[^ \t\r\n<>?!=/]
Explanation:
[^...]: All but these characters
" \t\r\n": the definition of S (whitespace)
"<>": Start and end angle brackets; IMHO <<>foo</<> is NOT a good idea,
even though it would be unambiguous to a program.
"?!": Needed for processing instructions, comments, doctypes, etc.
"=": Necessary to separate attribute names from values.
"/": Necessary to distinguish end tags.
You don't need to exclude [,)|*+], as far as I can figure...
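For anyone who wants to poke at the proposed set, here is a minimal Python sketch. The character class is copied verbatim from the regexp above; the helper name is_proposed_name is made up for illustration, and the check only looks at characters one at a time, so it says nothing about how a real parser would cope with such names.

```python
import re

# Character class copied from the message above: anything except
# whitespace (S) and the syntax-specific characters < > ? ! = /
NAME_CHAR = r"[^ \t\r\n<>?!=/]"
NAME_RE = re.compile(NAME_CHAR + "+")


def is_proposed_name(candidate: str) -> bool:
    """Hypothetical helper: True if the string is non-empty and every
    character falls outside the excluded set."""
    return NAME_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for candidate in ["O'Hara", "var'aq", "a,b|c*d+e)", "a=b", "foo bar", "</x"]:
        print(repr(candidate), is_proposed_name(candidate))
```

Under this rule O'Hara and var'aq pass, as does a name containing , ) | * +, while anything with whitespace, =, / or an angle bracket is rejected.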
By the way, someone brought up the case of Klingon, which didn't get
its own Unicode script because it has a Latin equivalent... But the
apostrophe is an essential character in Klingon, so you can't currently
write Klingon XML markup. Given that Klingon has more users and
programmers than some of the omitted scripts do (there's even a Klingon
programming language, var'aq), this is obviously a dire flaw in XML.
-- <a href="http://kuoi.asui.uidaho.edu/~kamikaze/"> Mark Hughes </a>