This is all scaring me a bit... just as I was thinking that dealing with
text was easier than bit twiddling! I may need to scurry back to debugging
DCOM protocols... or perhaps I will just keep using Java chars in blissful
ignorance until some terrible calamity occurs :-)
> Elliotte Rusty Harold wrote:
> > It could be worse, though. You could be using C, and trying to decode
> > UTF-8. :-)
> ?? It's about 10 lines of code, and has been written lots of
> times now. Last time I needed it I couldn't find one with the
> exact buffer interface I needed, so I coded it up from scratch
> in the course of an afternoon, and it worked first time.
> The spec is hardly unclear. And it's a set of shift/mask
> operations that are processor-friendly. You need to use a
> loop iterator rather than a for (i = 0; string[i]; i++) idiom,
> big deal.
> UTF-8 only really causes extra work when you want per-character
> addressing into big strings, because then you need an indirect
> table - the most common case I can think of is maintaining
> on-screen render state.
> But in most apps it's more common to point into text at a
> few places (tags, word-starts, search matches), in which case
> you need that indirect array anyhow.
> Conclusion: somewhat to my surprise, I find that for a lot
> of C tasks, you can keep your text in UTF-8 and work with
> it that way very efficiently.
> Elliotte is right about the irritating fact that a Java
> "char" isn't an XML character. The nasty fact is that
> I suspect many Java application programmers will end up
> simply blowing off non-BMP text either through ignorance
> or based on a decision that it's not cost-effective.
> -Tim
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>