Re: (char)0 handling proposal
- From: Brendan Macmillan <bren@mail.csse.monash.edu.au>
- To: richard@cogsci.ed.ac.uk (Richard Tobin)
- Date: Fri, 17 Aug 2001 22:46:46 +1000 (EST)
Thanks for your comments Richard,
> >The big problem with this approach is how to encode characters which
> >were already in the range 0x7F - 0x9F... it might not happen often,
> >but a bijective mapping (ie reversible) needs to be able to handle
> >all cases!
>
> There are a number of other ranges that might be used.
> The Unicode Private Use Area (codes E000 - F8FF) is an obvious
> choice. There's also the "Control Pictures" area 2400-241F
> which has the remarkable property that given a complete Unicode
> font the control characters would actually be readable!
But you get the same problem - you might need to encode any character
(which in Java is 2-byte Unicode), and so shifting to some other range
means that you can no longer use it to encode that range... I suppose
the argument might be that if someone is using these areas, then
it *really is* binary data, and so one would switch to a binary
rendering.
I think this is a nice and logical solution - do as you suggest, and map
the control characters to the Private Use Area or another range; and if you
encounter character values in that range already, only then switch to binary.
The downside is in performance: you need to pre-parse the String to check
for such unusual values before you can write anything.
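
A minimal Java sketch of the scheme discussed above, under some loud
assumptions: the class and method names are hypothetical, the target range
is taken to be the first 32 codes of the Private Use Area (E000 - E01F), and
the characters to escape are taken to be those below 0x20 other than tab,
LF and CR. canEscape() is the pre-parse step; when it returns false the
caller would fall back to a binary rendering, which is not shown here.

```java
// Hypothetical helper, not an existing API: a sketch of the mapping only.
final class ControlCharEscaper {
    // Assumed target range: the first 32 code points of the Private Use Area.
    private static final char PUA_BASE = '\uE000';

    // Assumption: controls below 0x20 need escaping, except tab, LF and CR.
    public static boolean isForbiddenControl(char c) {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    // True if c already lies in the range reserved for escaped controls.
    public static boolean isReserved(char c) {
        return c >= PUA_BASE && c < PUA_BASE + 0x20;
    }

    // The pre-parse: the mapping is only reversible if the input does not
    // already use the reserved range, so the whole String is checked first.
    public static boolean canEscape(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isReserved(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Shift each forbidden control character up into the reserved range.
    public static String escape(String s) {
        if (!canEscape(s)) {
            throw new IllegalArgumentException(
                "input already uses the reserved range; use a binary rendering");
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(isForbiddenControl(c) ? (char) (PUA_BASE + c) : c);
        }
        return sb.toString();
    }

    // Inverse of escape(): shift reserved-range characters back down.
    public static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(isReserved(c) ? (char) (c - PUA_BASE) : c);
        }
        return sb.toString();
    }
}
```

The extra pass in canEscape() is the performance cost mentioned above: the
String is walked once to check it and once more to write it.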
Cheers,
Brendan
--
e: bren@mail.csse.monash.edu.au v: +61 (3) 9905 1502
Email is checked daily Phone is rarely attended