   Re: [xml-dev] A heavier-weight proposal for character entity definition

On Wed, 2002-02-06 at 20:36, James Clark wrote:
> Interesting.  Those are compelling use cases but this significantly 
> complicates things.  In particular, automatically using entities on output 
> becomes much more complicated.  Instead of a simple hash table that maps 
> character codes to entities, you have to have a trie.  I also see a 
> slippery slope opening up here:
> 1. single character
> 2. base character + combining character(s)/other Unicode modifier (MathML)
> 3. arbitrary sequence of characters (why limit 2? don't want to check 
> character types)
> 4. arbitrary well-formed content (3 allows arbitrary text, and for I18N 
> arbitrary text needs elements for eg BIDI and ruby)
> Not clear what the right place to draw the line is here.

Drawing the line at (3) seems okay to me - that permits lexical
substitution at any point in the processing.  The tree does become a
problem at some point, but I suspect combining characters and surrogates
will force us there anyway.
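
To make the hash-table-versus-trie point concrete, here is a rough sketch of what output-side substitution might look like once definitions can span more than one character. This is not how Ents works; the function names (build_trie, substitute) and the "eacute-comb" entity name are made up for illustration, and the trie is just nested Python dicts.

```python
# Hypothetical sketch: longest-match entity substitution on output.
# Function names and the "eacute-comb" entity are assumptions for
# illustration only; this is not the Ents implementation.

def build_trie(entities):
    """Build a nested-dict trie from {character sequence: entity name}."""
    root = {}
    for sequence, name in entities.items():
        node = root
        for ch in sequence:
            node = node.setdefault(ch, {})
        node[None] = name  # the None key marks the end of a defined sequence
    return root


def substitute(text, trie):
    """Replace the longest defined sequence at each position with &name;."""
    out = []
    i = 0
    while i < len(text):
        node = trie
        match_name, match_end = None, i
        j = i
        # Walk the trie as far as the text allows, remembering the
        # longest complete sequence seen so far.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match_name, match_end = node[None], j
        if match_name is None:
            out.append(text[i])  # no definition starts here; copy through
            i += 1
        else:
            out.append("&" + match_name + ";")
            i = match_end
    return "".join(out)


entities = {
    "\u00e9": "eacute",        # case 1: a single character
    "e\u0301": "eacute-comb",  # case 2: base character + combining acute
}
trie = build_trie(entities)
result = substitute("caf\u00e9 e\u0301 see", trie)
print(result)
```

With only single-character definitions the inner loop collapses to one dictionary lookup, which is the simple hash table case. The extra bookkeeping (tracking the longest match and backing off when a partial match fails, as with the plain "e" in "see") is the added cost of allowing sequences.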

Ents doesn't presently support trees, though it can (hackishly) support
multiple characters.  Something to work on...
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!

