Rick Jelliffe writes:
>One good argument against text normalization is that the APIs just
>don't exist. (Putting ICU aside, and waiting like a bride at the altar
>for Java 1.5) However, the normalization APIs don't exist because the
>libraries are based on Unicode circa version 3.0. So saying we cannot
>have text normalization because the libraries don't exist is really
>tying us to obsolescent Unicode versions.
Please pardon my ignorance on this matter, as a brief look at the
Unicode site hasn't helped.
Is there a machine-readable list of character sequences for
normalization that is updated from version to version? I can find
normalization corrections, and the enormous but not very comprehensible
Derived Normalization Properties, but I don't see a single list of
pathways.
Normalization doesn't seem all that different from some of the work I'm
doing in character entities, and it seems like a declarative list of
normalization sequences would make it a lot easier for us to forget
about specific APIs and write normalizers which keep up with Unicode.
Any thoughts on this? I'd be happy to do some of this work, if it isn't
already there.
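
To make the idea concrete, here is a minimal sketch of a table-driven normalizer. It assumes the decomposition mappings and combining classes can be read from lines in the semicolon-separated layout of UnicodeData.txt (field 0 is the code point, field 3 the canonical combining class, field 5 the decomposition mapping). The sample lines are abbreviated by hand, the function names are mine, and it covers canonical decomposition (NFD) only: Hangul syllables are decomposed algorithmically rather than from the table, and composition would also need the composition exclusions list.

```python
# Abbreviated sample in the UnicodeData.txt field layout (assumption: only
# fields 0, 3 and 5 matter here; the old-name field is left blank).
SAMPLE = """\
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;;;;00E5;
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;;;;;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;;;;;
0323;COMBINING DOT BELOW;Mn;220;NSM;;;;;N;;;;;
1E63;LATIN SMALL LETTER S WITH DOT BELOW;Ll;0;L;0073 0323;;;;N;;;1E62;;1E62
1E69;LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE;Ll;0;L;1E63 0307;;;;N;;;1E68;;1E68
212B;ANGSTROM SIGN;Lu;0;L;00C5;;;;N;;;;00E5;
"""


def load_tables(text):
    """Return (canonical decompositions, non-zero combining classes)."""
    decomp, ccc = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        combining_class = int(fields[3])
        if combining_class:
            ccc[cp] = combining_class
        mapping = fields[5]
        # A leading <tag> marks a compatibility mapping; skip those for NFD.
        if mapping and not mapping.startswith("<"):
            decomp[cp] = [int(h, 16) for h in mapping.split()]
    return decomp, ccc


def decompose(cp, decomp):
    """Expand one code point recursively until nothing decomposes further."""
    if cp not in decomp:
        return [cp]
    out = []
    for part in decomp[cp]:
        out.extend(decompose(part, decomp))
    return out


def nfd(s, decomp, ccc):
    """Canonical decomposition followed by canonical ordering."""
    cps = []
    for ch in s:
        cps.extend(decompose(ord(ch), decomp))
    out, run = [], []
    for cp in cps:
        if ccc.get(cp, 0) == 0:
            # A starter ends the current run of combining marks; sorted() is
            # stable, so marks with equal classes keep their relative order.
            out.extend(sorted(run, key=lambda c: ccc[c]))
            run = []
            out.append(cp)
        else:
            run.append(cp)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return "".join(map(chr, out))


decomp, ccc = load_tables(SAMPLE)
print([hex(ord(c)) for c in nfd("\u212b", decomp, ccc)])
```

Fed the full data file for a given Unicode version instead of the sample, the same two tables would be the only part that changes from version to version; the code itself would not.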
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org