Re: (char)0 handling proposal

> >"Now is the time for all good men to come to the aid of the party@@@@@@@@@"

> >"@" is the null char - when a String is *mostly* text, it would be nice to
> >render the readable text as human readable...

> Create a new simpleType quotedPrintable, then you can have
>   "Now is the time for all good men to come to the aid of the
>    party=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00"
> where the string is converted to UTF-16LE before applying QP. This is as
> human readable as possible. But please note that wouldn't be a very
> interoperable solution and I discourage such multi-level encodings.
I agree about multi-level encodings, but it does seem the only way to cater for
both human consumption and binary data.

> If it's binary don't use XML (directly) or use the mentioned types. Who
> cares about human consumption _and_ uses binary data?
It's because the "char" datatype of Java is ambivalent. It usually contains
Unicode, but it can also be treated as a 16 bit unsigned integer.[*]

More and more I think you are right, that if a String does contain non-text
values (by the XML definition), then it should be treated entirely as binary.
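
As a rough sketch of that "text or binary" decision, assuming the XML 1.0
"Char" production is the definition of text meant here, a check in Java might
look like the following. The class and method names (XmlText, isXmlChar,
isXmlText) are made up for illustration and are not from any library.

```java
// Hypothetical helper: decides whether a String is entirely XML 1.0 text.
final class XmlText {
    private XmlText() {}

    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True only if every code point is an XML Char. A single (char)0, or an
    // unpaired surrogate, makes the whole String "binary" under this rule.
    static boolean isXmlText(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

A String that fails this check would then be serialized with one of the
binary types rather than as element text.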

Incidentally, a way to serve both the concerns of human consumption and binary
data is to render binary in this strangely familiar format:

0000000: 4e6f 7720 6973 2074 6865 2074 696d 6520  Now is the time 
0000010: 666f 7220 616c 6c20 676f 6f64 206d 656e  for all good men
0000020: 2074 6f20 636f 6d65 2074 6f20 7468 6520   to come to the 
0000030: 6169 6420 6f66 2074 6865 2070 6172 7479  aid of the party
0000040: 0000 0000 0000 0000 000a                 @@@@@@@@@.

This kind of format is *the most* human readable way to present binary data.
It can be edited effectively via the hex representation, and the text
representation is "read-only" (a kind of markup of the real data).  The
addresses on the left are a non-XML markup - but this could be done in an XML
style, e.g.:

<bin addr="0000000"> 4e6f 7720 6973 2074 6865 2074 696d 6520 </bin>
<b a="0000000" t="Now is the time ">4e6f 7720 6973 2074 6865 2074 696d 6520</b>

(Based on an idea by Mark Collette, for using hex to represent binary in XML)
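
For what it's worth, here is a minimal sketch of how the plain (non-XML) dump
layout above could be produced in Java. It assumes 16 bytes per line, a
7-digit hex offset, NUL shown as "@" and any other non-printable byte shown as
"."; the class name HexDump is made up for illustration.

```java
// Hypothetical helper: renders bytes as offset, hex pairs, and read-only text.
final class HexDump {
    private HexDump() {}

    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            int end = Math.min(off + 16, data.length);
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = off; i < end; i++) {
                int b = data[i] & 0xFF;              // treat the byte as unsigned
                hex.append(String.format("%02x", b));
                if ((i - off) % 2 == 1) {
                    hex.append(' ');                 // a space after every 2 bytes
                }
                // The text column is only a rendering of the real data.
                text.append(b == 0 ? '@' : (b >= 0x20 && b < 0x7F) ? (char) b : '.');
            }
            // Pad the hex column to 40 characters so the text column lines up.
            out.append(String.format("%07x: %-40s %s\n", off, hex, text));
        }
        return out.toString();
    }
}
```

Wrapping each line in an element such as <bin addr="..."> would be a small
change to the final format string.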

e:  bren@mail.csse.monash.edu.au                    v:  +61 (3)  9905 1502
Email is checked daily                              Phone is rarely attended

[*] As the XML definition of "text" grows more important, it would be nice if
languages had a primitive datatype for "textchar" or "XMLchar".  This would
avoid the need to check the range of values it contains.  But I guess there are
many reasons to use primitives that are a multiple of 8 bits in length
(exception: boolean, but it doesn't require extra validation checks).