OASIS Mailing List Archives

   RE: [xml-dev] Character Entities: An XML Core WG View

Tim Bray wrote:

> John Cowan wrote:
> > >Why not Unicode.org?  It could create short name "aliases" 
> > >of the long name descriptions.

> > Are you really prepared to create short names 

No -- that's why I suggested Unicode.org.  :)

> > (other than ones involving hex digits) for all 95,156
> > characters in Unicode 3.2?  Or even if we leave out the Han
> > and Hangul characters, the 13,791 characters that are left?
> > It is a biiiiiiiiiiiiiiiiiiiig job.
 
> Yes, but it sure would be nice if it were done.  If this were done, 
> I think that a lot of people would be willing to focus support 
> on this and nothing else.  I wonder how much could be automated?  
> Hmm... -Tim

None of the Latin, Greek, or math names used in today's markup should, IMO,
be generated automatically.  Those should come from the XHTML, DocBook, and
MathML traditions, as "unified" by David C. & Co.

As to the rest, the writing groups are, well, different -- especially as
to case, letters, characters, vowel signs, intent, etc.  A few random
samples from UnicodeData.txt:
	
	BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY
	RECYCLING SYMBOL FOR TYPE-4 PLASTICS
	UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT

	CYRILLIC CAPITAL LETTER GHE WITH UPTURN

	ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH
	ARABIC LIGATURE FEH WITH KHAH WITH MEEM INITIAL FORM

	SINHALA LETTER MAHAAPRAANA PAYANNA
	SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA

	TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
	TIBETAN SUBJOINED LETTER CHA
	TIBETAN VOWEL SIGN REVERSED II
	TIBETAN SIGN NYI ZLA NAA DA

	HANGUL CHOSEONG CEONGCHIEUMSSANGCIEUC
	HANGUL JUNGSEONG SSANGARAEA
	HANGUL LETTER KAPYEOUNSSANGPIEUP
	PARENTHESIZED HANGUL MIEUM A

You might possibly automate *some* of it, group by group, but a lot of
these names don't yield very well to "entification", automatic or
otherwise.  :)  And the alternative underscore trick could cause too many
to end it all with an
&UPWARDS_HARPOON_WITH_BARB_LEFT_BESIDE_DOWNWARDS_HARPOON_WITH_BARB_RIGHT;.
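To make the underscore trick concrete, here is a minimal Python sketch. It assumes the trick simply means taking a character's Unicode name and replacing each space with an underscore. The helper names `entity_name` and `entity_decl` are hypothetical; `unicodedata.name` and `unicodedata.lookup` come from the Python standard library, whose Unicode database is much newer than Unicode 3.2.

```python
import unicodedata


def entity_name(ch: str) -> str:
    """Hypothetical 'underscore trick': derive an entity name from the
    character's Unicode name by replacing each space with an underscore.

    Unicode names use only A-Z, 0-9, space, and hyphen and start with a
    letter, so the result is a legal XML Name."""
    # unicodedata.name raises ValueError for characters with no name
    return unicodedata.name(ch).replace(" ", "_")


def entity_decl(ch: str) -> str:
    """Build a DTD entity declaration mapping the derived name to a
    hexadecimal character reference."""
    return f'<!ENTITY {entity_name(ch)} "&#x{ord(ch):X};">'


if __name__ == "__main__":
    samples = [
        unicodedata.lookup("CYRILLIC CAPITAL LETTER GHE WITH UPTURN"),
        unicodedata.lookup(
            "UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT"
        ),
        "\u4e00",  # a Han ideograph: its name is algorithmic, with hex digits
    ]
    for ch in samples:
        # Print the length of the derived name next to the declaration
        print(len(entity_name(ch)), entity_decl(ch))
```

The harpoon's derived name comes out at 71 characters, which is the point: the mechanical mapping is trivial, but the names it yields are not something anyone wants to type.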

It's probably best to start with a single unified Western set from XHTML,
DocBook, and MathML that people can bring in -- *if they desire* -- and ten
years or so from now, we'll rarely need it (or any other entified Unicode)
anyway.


/Jelks

Copyright 2001 XML.org. This site is hosted by OASIS