   Re: [xml-dev] ANN: Gorille 0.3

John Cowan 'scripted it'
> > At least in UTF-8 you can just count bytes <0x80 to count characters.
> Make that 0xC0.

No, it's easy, but not quite that easy.

Unicode code-points up to U+007F are represented as single 8-bit bytes with
the same value, so counting bytes <0x80 gives you the number of US-ASCII
characters.  Characters above U+007F are represented with multiple bytes:
the first is >= 0xC0, and the trailing bytes are all in the range 0x80 to
0xBF.
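
A minimal sketch of that byte layout, in Python.  The helper names are
hypothetical, and the input is assumed to be well-formed UTF-8.  Every
character contributes exactly one byte that is not a trailing byte, and
trailing bytes always fall in 0x80..0xBF, so counting the bytes outside
that range counts characters:

```python
def count_ascii_chars(data: bytes) -> int:
    """Count bytes below 0x80, i.e. the US-ASCII characters only."""
    return sum(1 for b in data if b < 0x80)


def count_utf8_chars(data: bytes) -> int:
    """Count characters in well-formed UTF-8 by skipping trailing bytes.

    Trailing bytes are 0x80..0xBF (top two bits are 10), so every other
    byte is either a single-byte character or the first byte of a
    multi-byte one.  Malformed input is not detected (assumption).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# "naive" with i-diaeresis (2 bytes in UTF-8), a space, a euro sign
# (3 bytes), and a digit.
sample = "na\u00efve \u20ac5".encode("utf-8")
ascii_only = count_ascii_chars(sample)
all_chars = count_utf8_chars(sample)
```

For this sample the two counts differ, because counting bytes <0x80
misses the two non-ASCII characters entirely.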

So to count characters, one way is to count all bytes <= 0x7F or >=


Rob Lugt
ElCel Technology



Copyright 2001 XML.org. This site is hosted by OASIS