Re: (char)0 handling proposal
- From: Brendan Macmillan <bren@mail.csse.monash.edu.au>
- To: derhoermi@gmx.net
- Date: Sat, 18 Aug 2001 16:03:21 +1000 (EST)
> >"Now is the time for all good men to come to the aid of the party@@@@@@@@@"
> >"@" is the null char - when a String is *mostly* text, it would be nice to
> >render the readable text as human readable...
> Create a new simpleType quotedPrintable, then you can have
>
> "Now is the time for all good men to come to the aid of the
> party=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00"
>
> where the string is converted to UTF-16LE before applying QP. This is as
> human readable as possible. But please note that wouldn't be a very
> interoperable solution and I discourage such multi-level encodings.
I agree about multi-level encodings, but it does seem the only way to cater for
both human consumption and binary data.
> If it's binary don't use XML (directly) or use the mentioned types. Who
> cares about human consumption _and_ uses binary data?
It's because the "char" datatype of Java is ambiguous. It usually contains
Unicode, but it can also be treated as a 16-bit unsigned integer.[*]
More and more I think you are right, that if a String does contain non-text
values (by the XML definition), then it should be treated entirely as binary.
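To make "non-text values (by the XML definition)" concrete, here is a minimal sketch of the range check, assuming the Char production of XML 1.0. The class and method names (XmlTextCheck, isXmlChar, isXmlText) are made up for illustration and are not from any library.

```java
public final class XmlTextCheck {
    private XmlTextCheck() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * True if every code point in s is legal XML text.
     * An unpaired surrogate is returned by codePointAt as itself,
     * falls in D800-DFFF, and therefore fails the check.
     */
    public static boolean isXmlText(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain text passes; a String padded with null chars does not.
        System.out.println(isXmlText("to the aid of the party"));
        System.out.println(isXmlText("party\0\0\0"));
    }
}
```

A serializer could call isXmlText once per String and fall back to a binary rendering for the whole value when it returns false, rather than escaping individual chars.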
Incidentally, a way to serve both the concerns of human consumption and binary
data is to render binary in this strangely familiar format:
<Binary>
0000000: 4e6f 7720 6973 2074 6865 2074 696d 6520  Now is the time 
0000010: 666f 7220 616c 6c20 676f 6f64 206d 656e  for all good men
0000020: 2074 6f20 636f 6d65 2074 6f20 7468 6520   to come to the 
0000030: 6169 6420 6f66 2074 6865 2070 6172 7479  aid of the party
0000040: 0000 0000 0000 0000 000a                 @@@@@@@@@.
</Binary>
This kind of format is *the most* human readable way to present binary data.
It can be edited effectively via the hex representation, and the text
representation is "read-only" (a kind of markup of the real data). The
addresses on the left are non-XML markup, but the same thing could be done in
an XML style, e.g.:
<bin addr="0000000"> 4e6f 7720 6973 2074 6865 2074 696d 6520 </bin>
or
<b a="0000000" t="Now is the time ">4e6f 7720 6973 2074 6865 2074 696d 6520</b>
(Based on an idea by Mark Collette, for using hex to represent binary in XML)
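For what it's worth, here is a rough sketch of producing and reading back the dump format shown above. It assumes 16 bytes per line, a 7-digit hex offset, "@" for null bytes and "." for other unprintable bytes. The class and method names (HexDump, dump, parse) are hypothetical. The parser reads only the hex column, so the text column really is "read-only".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public final class HexDump {
    private HexDump() {}

    /** Renders data as lines of "offset: hex column  text column". */
    public static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            int end = Math.min(off + 16, data.length);
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = off; i < end; i++) {
                int b = data[i] & 0xFF;
                hex.append(String.format("%02x", b));
                // Group bytes in pairs, with no trailing space on the line.
                if ((i - off) % 2 == 1 && i + 1 < end) {
                    hex.append(' ');
                }
                text.append(b == 0 ? '@' : (b >= 0x20 && b < 0x7F) ? (char) b : '.');
            }
            // Pad the hex column to 39 chars (8 groups of 4 plus 7 spaces).
            out.append(String.format("%07x: %-39s  %s", off, hex, text)).append('\n');
        }
        return out.toString();
    }

    /** Recovers the bytes from the hex column only; offset and text are ignored. */
    public static byte[] parse(String dump) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String line : dump.split("\n")) {
            int start = line.indexOf(':') + 2;
            if (start < 2 || start > line.length()) {
                continue; // not a dump line
            }
            String hex = line.substring(start, Math.min(start + 39, line.length()))
                             .replace(" ", "");
            for (int i = 0; i + 1 < hex.length(); i += 2) {
                out.write(Integer.parseInt(hex.substring(i, i + 2), 16));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "to the aid of the party\0\0\0\n".getBytes(StandardCharsets.US_ASCII);
        System.out.print(dump(data));
    }
}
```

Wrapping each line in a <bin addr="..."> element instead of using the bare offset prefix would only change the String.format call in dump and the column arithmetic in parse.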
Cheers!
Brendan
--
e: bren@mail.csse.monash.edu.au v: +61 (3) 9905 1502
Email is checked daily Phone is rarely attended
[*]
As the XML definition of "text" grows more important, it would be nice if
languages had a primitive datatype for "textchar" or "XMLchar". This avoids
the need to check the range of values it contains. But I guess there are many
reasons to use primitives that are a multiple of 8 bits in length (exception:
boolean, but it doesn't require extra validation checks.)