- To: "'xml-dev'" <firstname.lastname@example.org>
- Subject: RE: [xml-dev] Character Entities: An XML Core WG View
- From: "Jelks Cabaniss" <email@example.com>
- Date: Fri, 1 Nov 2002 01:31:50 -0500
- Importance: Normal
- In-reply-to: <3DC206EA.firstname.lastname@example.org>
Tim Bray wrote:
> I see your point, but there are all these people out there
> who keep saying they want a way to give funny characters
> human-readable names and don't want to use elements because
> they think structure and content are different. No matter
> how many times they are told that they shouldn't really need
> the names and that if they did they should use elements,
> they keep refusing to take our word for this, so we're gonna
> have to do something. Sigh.
> The WG's approach does at least have the virtue that it works with
> existing software.
> I despise entities in general more and more with each passing year,
> but it's pretty clearly character entities that are the bit that
> just won't go away; I seem to recall weeping with James Clark over
> this into our 18th or 19th glasses of red wine at the last XML
Because they don't round trip after parsing? Or because of having to
expand the entities before you can use them?
> I know I don't when I'm in rdhead or oweenie mode - &#xBABE; does the
> job fine -
It does, but &#xnnn;'s scattered throughout a document are hard to
proof. That's the only reason people want names (and not as
> but people who want to edit XML by hand really want to be able to use
> &euro; and the like.
Yes. In fifteen or so years, when purely ASCII/ANSI/ISO-* editors are
history, I doubt anyone will care, but I don't see the point in axing
the internal subset right now. I'm not sure I see the point of axing
it in the future either.
> Once again, sigh. I haven't seen a better idea, but one would be
> welcome. Hmm, has anyone suggested
> &#uCYRILLIC-CAPITAL-LETTER-TSE; (aka Ц) or
> &#uPARTIAL-DIFFERENTIAL; (aka ∂)
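For what it's worth, a name-based lookup like that is easy to sketch outside the parser, for instance with Python's `unicodedata` module. The `&#uNAME;` syntax is only the hypothetical one floated above (no XML parser accepts it), `expand_named_refs` is a made-up helper name, and the sketch assumes the hyphens stand in for the spaces in the Unicode character names.

```python
import re
import unicodedata

# Hypothetical syntax from the quoted message: &#uCHARACTER-NAME;
NAMED_REF = re.compile(r"&#u([A-Z0-9-]+);")


def expand_named_refs(text):
    """Replace each &#uNAME; with the character that carries that Unicode name."""

    def resolve(match):
        name = match.group(1)
        # Try the name as written first (some real names contain hyphens,
        # e.g. HYPHEN-MINUS), then with hyphens read as spaces.
        for candidate in (name, name.replace("-", " ")):
            try:
                return unicodedata.lookup(candidate)
            except KeyError:
                continue
        # Unknown name: leave the reference untouched.
        return match.group(0)

    return NAMED_REF.sub(resolve, text)
```

Which names resolve depends on the Unicode version the lookup table was built from (`unicodedata.unidata_version` in this sketch), so the same reference can be valid for one processor and unknown to another.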
Again, why exactly -- except for "round-tripping" -- is a huge built-in
Unicode character reference database (that changes with every rev of
Unicode) better than having the convenience of being able to declare
&Tse; and the few others you might want in the internal subset?
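To make the internal-subset option concrete, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` purely as a convenient stock parser. The `Tse` and `part` entity names and the `<p>` document are illustrations only.

```python
import xml.etree.ElementTree as ET

# A document that declares two character entities in its internal subset.
DOC = """<!DOCTYPE p [
  <!ENTITY Tse  "&#x426;">
  <!ENTITY part "&#x2202;">
]>
<p>&Tse; and &part;</p>"""

root = ET.fromstring(DOC)

# The parser expands the entity references while reading the document.
text = root.text

# Serializing again shows the round-trip cost: the entity names are gone
# and the characters come back as numeric references (ASCII output by default).
serialized = ET.tostring(root)
```

The declarations travel with the document, so any conforming parser can expand them without a built-in name table. The price is the one already mentioned: after parsing, `&Tse;` no longer appears in the serialized output.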