
   RE: [xml-dev] ANN: XQEngine 0.61


I'd looked at BreakIterator way back when it was still at Taligent. I can't
recall why I chose not to go with it at the time (efficiency concerns?), but
it looks worth revisiting. Thanks for the suggestion.
Howard

-----Original Message-----
From: J.Pietschmann [mailto:j3322ptm@yahoo.de]
Sent: Sunday, December 07, 2003 2:00 AM
To: Howard Katz; xml-dev@lists.xml.org
Subject: Re: [xml-dev] ANN: XQEngine 0.61


Howard Katz wrote:
> All my word breaking is delegated to a class called (surprise)
> WordBreaker, which implements a very simple algorithm that uses Java's
> Character.isLetterOrDigit() function to determine where words begin and
> end.
> This works well for Western languages. If you want to optimize for a
> non-Western language, you can override WordBreaker and implement word
> breaking in whatever way makes sense for your particular language or
> languages of interest. That's the theory at any rate ...
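
For readers unfamiliar with the approach quoted above, here is a minimal sketch of word breaking driven by Character.isLetterOrDigit(). It is not XQEngine's actual WordBreaker; the class name SimpleWordBreaker and the words() method are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the WordBreaker class shipped with XQEngine.
final class SimpleWordBreaker {

    // Returns every maximal run of letter-or-digit characters in the text.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 if outside a word
        for (int i = 0; i < text.length(); i++) {
            boolean wordChar = Character.isLetterOrDigit(text.charAt(i));
            if (wordChar && start < 0) {
                start = i;                          // a word begins here
            } else if (!wordChar && start >= 0) {
                out.add(text.substring(start, i));  // a word ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(text.substring(start));         // word running to end of text
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [XQEngine, 0, 61, released]
        System.out.println(words("XQEngine 0.61, released!"));
    }
}
```

Note that "0.61" is split into two tokens, and text in scripts that do not separate words with spaces or punctuation would come back as a single long token, which is the kind of case a language-specific override would have to handle.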

Have a look at java.text.BreakIterator, which helps to implement
line and word breaking along the Unicode standards (most notably
UTR14).
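
A rough sketch of what that could look like, assuming the goal is simply to list the words in a string. The class name BreakIteratorWords and its words() method are made up for this example; only the java.text.BreakIterator calls are standard API. Filtering out segments that contain no letter or digit is one possible way to drop whitespace and punctuation, not something BreakIterator does by itself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; only the BreakIterator calls are standard API.
final class BreakIteratorWords {

    // Splits text at the word boundaries reported by BreakIterator and keeps
    // only the segments that contain at least one letter or digit.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Segments between boundaries include spaces and punctuation; skip those.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Hello, world]
        System.out.println(words("Hello, world", Locale.ENGLISH));
    }
}
```

The locale argument is what lets the same code pick up language-specific boundary rules where the runtime provides them.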

J.Pietschmann


-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>





 
