On Wed, 2002-02-06 at 20:36, James Clark wrote:
> Interesting. Those are compelling use cases but this significantly
> complicates things. In particular, automatically using entities on output
> becomes much more complicated. Instead of a simple hash table that maps
> character codes to entities, you have to have a trie. I also see a
> slippery slope opening up here:
>
> 1. single character
> 2. base character + combining character(s)/other Unicode modifier (MathML)
> 3. arbitrary sequence of characters (why limit 2? don't want to check
> character types)
> 4. arbitrary well-formed content (3 allows arbitrary text, and for I18N
> arbitrary text needs elements for eg BIDI and ruby)
>
> Not clear what the right place to draw the line is here.
Drawing the line at (3) seems okay to me - that permits lexical
substitution at any point in the processing. The tree does become a
problem at some point, but I suspect combining characters and surrogates
will force us there anyway.
Ents doesn't presently support trees, though it can (hackishly) support
multiple characters. Something to work on...
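
To make the hash-table-versus-trie point concrete, here is a rough sketch in Python of what output-side substitution might look like once a single entity can stand for a sequence of characters (case 2 or 3 above). This is not how Ents works; the class `EntityTrie` and its methods are hypothetical names made up for illustration, and the two entity names are only sample data. With single characters a plain dictionary lookup per character is enough, but with sequences the serializer has to find the longest matching sequence at each position, which is what the trie walk does.

```python
class EntityTrie:
    """Hypothetical helper: maps character sequences to entity names."""

    def __init__(self):
        # Each node is a dict from character to child node.
        # The key None marks "a sequence ends here" and holds the entity name.
        self.root = {}

    def add(self, sequence, name):
        node = self.root
        for ch in sequence:
            node = node.setdefault(ch, {})
        node[None] = name

    def longest_match(self, text, start):
        """Return (length, name) of the longest sequence at text[start:], or None."""
        node = self.root
        best = None
        i = start
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if None in node:
                best = (i - start, node[None])
        return best

    def escape(self, text):
        """Replace every known sequence with an entity reference, longest first."""
        out = []
        i = 0
        while i < len(text):
            match = self.longest_match(text, i)
            if match:
                length, name = match
                out.append("&" + name + ";")
                i += length
            else:
                out.append(text[i])
                i += 1
        return "".join(out)


# Sample data only: a base character, and the same base character
# followed by a combining character (U+0338).
trie = EntityTrie()
trie.add("\u2242", "EqualTilde")
trie.add("\u2242\u0338", "NotEqualTilde")
print(trie.escape("a\u2242\u0338b"))
```

The longest-match rule matters here: without it, the base character would be replaced on its own and the combining character would be left stranded after the entity reference.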
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com