OASIS Mailing List Archives

 



   Re: [xml-dev] ANN: XQEngine 0.61


Howard Katz wrote:
> All my word breaking is delegated to a class called (surprise) WordBreaker,
> which implements a very simple algorithm that uses Java's
> Character.isLetterOrDigit() function to determine where words begin and end.
> This works well for Western languages. If you want to optimize for a
> non-Western language, you can override WordBreaker and implement word
> breaking in whatever way makes sense for your particular language or
> languages of interest. That's the theory at any rate ...
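
For readers who have not seen the code, the approach described above might
look roughly like the sketch below. This is an assumption-laden illustration,
not XQEngine's actual WordBreaker: the class name SimpleWordBreaker and the
method breakWords are hypothetical, and the only thing taken from the
description is the use of Character.isLetterOrDigit() to find word edges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; NOT XQEngine's real WordBreaker.
final class SimpleWordBreaker {

    // Treats every maximal run of letters or digits as one word.
    static List<String> breakWords(String text) {
        List<String> words = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 between words
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                if (start < 0) {
                    start = i; // a new word begins here
                }
            } else if (start >= 0) {
                words.add(text.substring(start, i)); // word ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            words.add(text.substring(start)); // flush a word that runs to the end
        }
        return words;
    }
}
```

A scheme like this depends on separators between words, so it has little to
work with in scripts that are written without spaces, which is presumably
where an override would come in.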

Have a look at java.text.BreakIterator, which helps to implement
line and word breaking according to the Unicode standards (most
notably UTR14 for line breaking).
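
A minimal sketch of word breaking with BreakIterator, in case it helps. The
class name BreakIteratorWords and its methods are made up for this example;
the BreakIterator calls (getWordInstance, setText, first, next, DONE) are the
standard java.text API. How this would plug into XQEngine's WordBreaker is
not shown, since I am only assuming what that class's interface looks like.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; the names are not from XQEngine.
final class BreakIteratorWords {

    // Returns the words in text, using the locale's word-boundary rules.
    static List<String> breakWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // The iterator also reports boundaries around whitespace and
            // punctuation, so keep only pieces with a letter or digit in them.
            if (containsLetterOrDigit(piece)) {
                words.add(piece);
            }
        }
        return words;
    }

    private static boolean containsLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetterOrDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Calling BreakIteratorWords.breakWords("Hello, world 42!", Locale.ENGLISH)
yields Hello, world and 42. Passing a different Locale lets the JDK apply
whatever boundary rules it has for that language, which is the part a plain
isLetterOrDigit() scan cannot do.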

J.Pietschmann





 


Copyright 2001 XML.org. This site is hosted by OASIS