
   RE: [xml-dev] Character Entities: An XML Core WG View

  • To: "'xml-dev'" <xml-dev@lists.xml.org>
  • Subject: RE: [xml-dev] Character Entities: An XML Core WG View
  • From: "Jelks Cabaniss" <jelks@jelks.nu>
  • Date: Fri, 1 Nov 2002 01:31:50 -0500
  • Importance: Normal
  • In-reply-to: <3DC206EA.9050408@textuality.com>

Tim Bray wrote:

> I see your point, but there are all these people out there
> who keep saying they want a way to give funny characters
> human-readable names and don't want to use elements because
> they think structure and content are different.  No matter
> how many times they are told that they shouldn't really need
> the names and that if they did they should use elements,
> they keep refusing to take our word for this, so we're gonna
> have to do something.  Sigh.

> The WG's approach does at least have the virtue that it works with 
> existing software.  

Indeed.

> I despise entities in general more and more with each passing year, 
> but it's pretty clearly character entities that are the bit that 
> just won't go away; I seem to recall weeping with James Clark over 
> this into our 18th or 19th glasses of red wine at the last XML 
> conference.

Because they don't round trip after parsing?  Or because of having to
expand the entities before you can use them?
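To make the round-trip concern concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree. The tool is only my choice for
illustration, and the document and entity declarations are made up. The
parser expands the declared entities, so the names are already gone by
the time the tree is serialized again.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two character entities declared in the internal subset.
SOURCE = """<!DOCTYPE doc [
  <!ENTITY Tse "&#x426;">
  <!ENTITY euro "&#x20AC;">
]>
<doc>&Tse; costs 5 &euro;</doc>"""

root = ET.fromstring(SOURCE)

# The parser has already replaced the entity references with the characters.
parsed_text = root.text

# Serializing to ASCII falls back to numeric character references;
# the names &Tse; and &euro; are not restored, and the DOCTYPE is dropped.
round_tripped = ET.tostring(root, encoding="us-ascii").decode("ascii")
print(round_tripped)
```

Other parsers and serializers may behave differently, but the entity
names generally do not survive unless the tool is written to preserve
them.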

> I know I don't when I'm in rdhead or oweenie mode - &#xbabe; does the 
> job fine - 

It does, but &#xnnn;'s scattered throughout a document are hard to
proof.  That's the only reason people want names (and not as
elements!:).
 
> but people who want to edit XML by hand really want to be able to use 
> &euro; and the like.

Yes.  In fifteen or so years, when purely ASCII/ANSI/ISO-* editors are
history, I doubt if anyone will care, but I don't see the point in axing
the internal subset at this point in time.  I'm not sure I see the point
of axing it in the future either.

> Once again, sigh.  I haven't seen a better idea, but one would be 
> welcome.  Hmm, has anyone suggested
> 
> &#uCYRILLIC-CAPITAL-LETTER-TSE; (aka &#x426;) or
> &#uPARTIAL-DIFFERENTIAL; (aka &#x2202;)

Again, why exactly -- except for "round-tripping" -- is a huge built-in
Unicode character reference database (that changes with every rev of
Unicode) better than having the convenience of being able to declare
&Tse; and the few others you might want in the internal subset?  
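For comparison, a rough sketch of the two approaches in Python, again
only as an illustration. The helper name expand_unicode_name and the
hyphen-to-space mapping are my assumptions: the &#uNAME; syntax is only
a suggestion in this thread, and no parser implements it. A name-based
reference needs the whole Unicode name table, whose contents depend on
the Unicode version the software ships with; the internal-subset
declaration needs only the lines you write yourself.

```python
import unicodedata
import xml.etree.ElementTree as ET


def expand_unicode_name(name: str) -> str:
    """Hypothetical resolver for a name as it might appear in &#uNAME;."""
    # Assumption: hyphens in the reference stand for spaces in the Unicode
    # name. This would break for names that contain real hyphens.
    return unicodedata.lookup(name.replace("-", " "))


# Approach 1: look the character up in the built-in Unicode name database.
by_name = expand_unicode_name("CYRILLIC-CAPITAL-LETTER-TSE")

# Approach 2: declare just the entity you need in the internal subset.
doc = ET.fromstring('<!DOCTYPE d [<!ENTITY Tse "&#x426;">]><d>&Tse;</d>')
by_entity = doc.text

# Both routes yield U+0426.
print(by_name == by_entity == "\u0426")

# The name table is tied to whichever Unicode version this Python ships.
print(unicodedata.unidata_version)
```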


/Jelks





 
