OASIS Mailing List Archives





   Re: [xml-dev] ANN: Gorille 0.3


From: "Richard Tobin" <richard@cogsci.ed.ac.uk>

> >The nasty fact is that
> >I suspect many Java application programmers will end up
> >simply blowing off non-BMP text either through ignorance
> >or based on a decision that it's not cost-effective.
> It depends what they want to do with it.  Won't they just end up
> passing it through as pairs of surrogates?
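Richard's point can be illustrated with a small sketch: code that treats text as an opaque string passes a non-BMP character through intact as a surrogate pair, and damage only appears when something cuts between the two halves. The sketch assumes a modern JDK (`Character.toChars` and `StandardCharsets` arrived after this thread, in Java 5 and 7); the class name and the `roundTrip` helper are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePassThrough {
    // U+10400 DESERET CAPITAL LETTER LONG I, a character outside the BMP.
    static final String NON_BMP = new String(Character.toChars(0x10400));

    /** Encodes a string to UTF-8 and decodes it again, never looking at individual chars. */
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(NON_BMP.length());                                 // 2 (a surrogate pair)
        System.out.println(NON_BMP.getBytes(StandardCharsets.UTF_8).length); // 4
        System.out.println(roundTrip(NON_BMP).equals(NON_BMP));              // true: passed through intact

        // Cutting between the two halves leaves a lone surrogate, which the
        // UTF-8 encoder replaces with '?', so the data is silently damaged.
        String half = NON_BMP.substring(0, 1);
        System.out.println(roundTrip(half).equals(half));                    // false
    }
}
```

So "just passing it through" is safe as long as the application never splits strings at arbitrary char offsets.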

And also, do surrogate pairs really introduce any issues that
are not already present in combining character sequences?

I have been working through this recently for our markup editor.
For the first version, we have decided not to barf on combining
character sequences or surrogates, but not to provide real
support for them either, because the "1 Java char = 1 glyph"
assumption makes life very easy.
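A rough sketch of why surrogate pairs and combining sequences look like much the same problem: both make one visible glyph out of two Java chars. Counting code points repairs the surrogate case but not the combining case; only user-character segmentation (here the JDK's `java.text.BreakIterator`) handles both. The example strings and helper names are chosen for illustration, and `codePointCount` assumes Java 5 or later.

```java
import java.text.BreakIterator;

public class CharVsGlyph {
    // "e" + U+0301 COMBINING ACUTE ACCENT: one visible glyph, two chars, two code points.
    static final String COMBINING = "e\u0301";
    // U+1D11E MUSICAL SYMBOL G CLEF: one visible glyph, one code point, two chars.
    static final String SURROGATE_PAIR = "\uD834\uDD1E";

    /** Java chars, i.e. UTF-16 code units. */
    static int chars(String s) {
        return s.length();
    }

    /** Unicode code points; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** User-perceived characters, using the JDK's character BreakIterator. */
    static int userCharacters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        // Each boundary after the start closes one user character.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (String s : new String[] { COMBINING, SURROGATE_PAIR }) {
            System.out.println(chars(s) + " " + codePoints(s) + " " + userCharacters(s));
        }
        // prints "2 2 1" then "2 1 1"
    }
}
```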

Using IBM's Internationalization Classes for Unicode
(bulk kudos to Mark Davis), it is quite straightforward
to add normalization to data import and character
entry in an interactive application. Normalizing to a
composed form means that your application uses
precomposed characters where they are available,
rather than combining character sequences. For most
Western languages written in Latin script, Unicode
provides precomposed characters: enough even to support
Vietnamese, with its stacked accents.
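A minimal sketch of normalizing on import or entry, assuming NFC as the target form. It swaps in the JDK's own `java.text.Normalizer` (Java 6 and later) rather than the IBM classes mentioned above, which offer comparable normalization; `normalizeInput` is a hypothetical helper name.

```java
import java.text.Normalizer;

public class ComposeOnInput {
    /** Hypothetical hook for imported or typed text: compose to NFC where possible. */
    static String normalizeInput(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";        // e + COMBINING ACUTE ACCENT
        String vietnamese = "e\u0323\u0302";  // e + COMBINING DOT BELOW + COMBINING CIRCUMFLEX ACCENT
        String qWithAcute = "q\u0301";        // Unicode has no precomposed q-acute

        System.out.println(normalizeInput(decomposed).equals("\u00E9")); // true (U+00E9)
        System.out.println(normalizeInput(vietnamese).equals("\u1EC7")); // true (U+1EC7)
        System.out.println(normalizeInput(qWithAcute).length());         // 2: stays a combining sequence
    }
}
```

The last case shows the limit of the approach: normalization only removes combining sequences for which a precomposed character exists, so the 1 char = 1 glyph assumption still fails on rarer combinations.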

The other point here is that the 1 Java char = 1 glyph
assumption does not imply that every character has the
same width: if you support proportional-width characters,
you can still support Chinese and Japanese.
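A small sketch of why the assumption holds up for everyday Chinese and Japanese: common ideographs sit in the BMP, one char each, so only their display width differs, and that is a matter for font metrics. Rarer ideographs in CJK Extension B (U+20000 upward) fall outside the BMP and reintroduce surrogates. The class and method names are illustrative, and `Character.isSurrogate` assumes Java 7 or later.

```java
public class CjkInBmp {
    // U+65E5 U+672C U+8A9E, the three ideographs of "Nihongo" (Japanese language).
    static final String JAPANESE = "\u65E5\u672C\u8A9E";

    /** True if no char in s is half of a surrogate pair, i.e. the text is pure BMP. */
    static boolean allBmp(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(JAPANESE.length()); // 3: one char per ideograph
        System.out.println(allBmp(JAPANESE));  // true

        // U+20000, the first CJK Extension B ideograph, needs a surrogate pair.
        String extB = new String(Character.toChars(0x20000));
        System.out.println(allBmp(extB));      // false
    }
}
```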

The W3C I18N WG has published a new version of its
"Character Model for the World Wide Web" at
http://www.w3.org/TR/, and it is looking pretty good.
It is really well written, and anyone who wants to get
a grip on internationalization or character issues
should find it a good place to start.

Rick Jelliffe
Topologi Pty. Ltd.



Copyright 2001 XML.org. This site is hosted by OASIS