- From: "Rick Jelliffe" <email@example.com>
- To: <firstname.lastname@example.org>
- Date: Fri, 28 Nov 1997 16:27:50 +1100
> From: Peter Murray-Rust <email@example.com>
> I assume there is no short cut...
On the contrary, there *IS* a short cut: the most obvious one!
Just treat the name as a token (i.e. terminated by whitespace or >,
or any other delimiter if you want to be careful). Any valid XML will
work with just that!
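
As an illustration only, the short cut might be sketched as below. The function name read_name and the exact delimiter set are assumptions made for this example; they come neither from the XML draft nor from any existing parser.

```python
# Hypothetical delimiter set: whitespace plus the markup characters that can
# follow a name in well-formed XML. An assumption for this sketch, not a list
# taken from the specification.
NAME_DELIMITERS = set(" \t\r\n>/=")


def read_name(text, start=0):
    """Return (name, end_index) for the token that begins at text[start].

    No character is checked against the naming rules: the name simply runs
    until the first delimiter. On valid XML this yields the right name.
    """
    end = start
    while end < len(text) and text[end] not in NAME_DELIMITERS:
        end += 1
    return text[start:end], end


# Usage: position 1 is the first character after the opening "<".
print(read_name("<para id='x'>", 1))  # ('para', 5)
print(read_name("<名前>", 1))          # ('名前', 3)
```

The scanner needs no character tables, so a name in any script costs nothing extra; the price is that an invalid name is accepted without complaint.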
If you want to completely validate your XML, then the more sophisticated
checks are appropriate. The intent (as I see it) is to let people use
customary words in their language and script, if they want to. It is bad
practice to use crazy symbols and uncommon characters in markup, because
the purpose of markup is to reveal meaning, not hide it. The complexity
of the rules merely encodes that to give guidance in the peripheral cases.
> I applaud the work of the WG on the Internationalisation and I don't want
Yes, they have been exemplary in this, I think. They have taken the issue
very seriously, and kept their eyes on the goal. It is very easy for I18N
to bamboozle people, in that there is always a fuzzy and heaving morass of
quibbling that makes people want to give up. But in the case of XML, we
can have our cake (the fans of strict, codified naming rules can exactly
specify what is allowed) *AND* eat it (bewildered parser-writers can just
use simple tokenizing).
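
For contrast, here is a rough sketch of what the strict side could look like. It is only an approximation: it uses Unicode general categories from Python's unicodedata module as a stand-in for the specification's enumerated character lists, which differ in detail. The function names are chosen for this example.

```python
import unicodedata

# Assumption: general categories approximate the official character tables.
# The real lists are explicit ranges and do not match these categories exactly.
_LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
_EXTRA_NAME_CATEGORIES = {"Mn", "Mc", "Nd"}  # combining marks and digits
_NAME_PUNCTUATION = set(".-_:")


def is_name_start_char(ch):
    """Approximate test for a character that may begin a name."""
    return ch in "_:" or unicodedata.category(ch) in _LETTER_CATEGORIES


def is_name_char(ch):
    """Approximate test for a character that may appear inside a name."""
    if ch in _NAME_PUNCTUATION:
        return True
    category = unicodedata.category(ch)
    return category in _LETTER_CATEGORIES or category in _EXTRA_NAME_CATEGORIES


def is_plausible_name(token):
    """Check a whole token character by character against the rules above."""
    return (
        bool(token)
        and is_name_start_char(token[0])
        and all(is_name_char(c) for c in token[1:])
    )


print(is_plausible_name("para"))  # True
print(is_plausible_name("1abc"))  # False: a digit cannot start a name
print(is_plausible_name("a$b"))   # False: "$" is a currency symbol
```

A checker of this kind belongs in a validating tool; a parser that assumes valid input can skip it entirely and rely on tokenizing.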
> to detract from it. What I would suggest is that because of the extremely
> likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
It is possible that isNameChar() will be adequate. The issue of how complex
the naming rules should be is under last-minute finalization. The important
thing is not to be distracted by how detailed the official list is. If
you do not have a validating XML processor (which means you in fact are
assuming that your documents are valid) then a much simpler tokenizing regime
should work fine. That was an explicit point in the discussions on the
naming system: it must be straightforward to implement a (non-validating)
> (C, C++, Java) so that we know we shall all be working with the same one.
There is a draft ISO technical report on this issue, for future programming
language standards. This technical report has clearly been influenced by
XML and SGML's approaches to the problem. I know that the WG representatives
who are looking after finalizing the naming rules are looking at that
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)