Re: (char)0 handling proposal

Thanks for your comments Richard,

> >The big problem with this approach is how to encode characters which
> >were already in the range 0x7F - 0x9F... it might not happen often,
> >but a bijective mapping (ie reversible) needs to be able to handle
> >all cases!
> There are a number of other ranges that might be used.
> The Unicode Private Use Area (codes E000 - F8FF) is an obvious
> choice.  There's also the "Control Pictures" area 2400-241F
> which has the remarkable property that given a complete Unicode
> font the control characters would actually be readable!

But you get the same problem: you might need to encode any character
(which in Java is 2-byte Unicode), so shifting the control characters
into some other range means that range can no longer be used to encode
itself...  I suppose the argument might be that if someone is using
these areas, then it *really is* binary data, and so one would then
switch to a binary 

I think this is a nice and logical solution: do as you suggest and map
the control characters to the Private Use Area (or another range), and
only if you encounter character values already in that range switch to
binary.

The downside is in performance: you need to pre-parse the String to check
for such unusual values before you can write anything.
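
To make the idea concrete, here is a rough sketch of the pre-scan and the
mapping in Java. It is only an illustration under assumptions of mine: the
class name ControlCharMapper, the choice of U+E000 as the base of the target
range, and the decision to shift every character below U+0020 are all
hypothetical, and nothing here has been agreed on the list.

```java
import java.util.Optional;

// Hypothetical helper, not part of any agreed proposal.
final class ControlCharMapper {
    // Assumption: control characters U+0000..U+001F are shifted into
    // the Private Use Area at U+E000..U+E01F.
    private static final int PUA_BASE = 0xE000;
    private static final int CONTROL_LIMIT = 0x20;

    private ControlCharMapper() {}

    private static boolean inTargetRange(char c) {
        return c >= PUA_BASE && c < PUA_BASE + CONTROL_LIMIT;
    }

    // Pre-scan: returns false if the string already contains a character
    // in the target range, in which case the mapping is not reversible.
    public static boolean isTextEncodable(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (inTargetRange(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns the mapped string, or an empty Optional to signal that the
    // caller should fall back to a binary encoding instead.
    public static Optional<String> encode(String s) {
        if (!isTextEncodable(s)) {
            return Optional.empty();
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c < CONTROL_LIMIT ? (char) (PUA_BASE + c) : c);
        }
        return Optional.of(out.toString());
    }

    // Inverse of encode: shifts characters in the target range back down.
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(inTargetRange(c) ? (char) (c - PUA_BASE) : c);
        }
        return out.toString();
    }
}
```

The extra pass in isTextEncodable is the performance cost mentioned above:
the whole String is read once before anything can be written. The scan could
be folded into the encoding loop, but the output would then have to be
buffered until the end of the String is reached.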

e:  bren@mail.csse.monash.edu.au                    v:  +61 (3)  9905 1502
Email is checked daily                              Phone is rarely attended