
Re: binary base64 definition



On Sat, Jan 06, 2001 at 06:15:24PM -0500, Jerry Johns wrote:
>Following the suggestion of describing exactly what I'm trying to do, here
>it is:
>
<snip />
>
>One method I'm considering is converting the JPEG file to base64, which
>results in a string of text characters, as you know. However, this data
>could also contain special characters, ie: greater-than symbol, less-than
>symbol, etc. 
>
>Assuming this was a good approach, I have two issues remaining: 1) how to
>code and decode the base64 and 2) can I prevent the DOM API from parsing the
>encoded JPEG data and converting greater-than and less-than symbols into
>"lt;" and "gt;" text strings.

You don't mention the language or tools that you're planning on using
for generating either the XML or the BASE64 encodings.  Whether an
encoder/decoder is available (there are several for Java, though none
in the public part of the JDK) depends on the choice of tools.  I would
be very surprised to find that any language or toolkit that has been
used for network-related technologies lacks an encoder and decoder
(though it might well be hidden, like Sun's in the JDK, or extra, like
IBM's Java implementation).

As for escaping special characters, you've been sadly led astray.
BASE64 encodes each group of three eight-bit bytes as four characters,
drawn from an alphabet small enough that each character carries six
bits.  The characters themselves are not six-bit values, though: they
are ordinary seven-bit ASCII characters (nothing 128 and higher, nothing
32 and under).  Strictly, it is an alphabet of 65 characters, because
"=" has the special meaning of padding.  Decoders also ignore any
"white space characters" (CR, LF, tab, space) in the encoded text.
Specifically, the alphabet is A-Z, a-z, 0-9, +, / (and =).  None of
these is problematic either to XML or to most network protocols using
the NVT, so nothing in the encoded data needs escaping.  If your tool
set doesn't contain an encoder/decoder, the specification should be
sufficient to develop one.  And just think ... by design, you even get
support for EBCDIC!  Wow!
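
To make the point about the alphabet concrete, here is a rough sketch
of an encoder and decoder written from the description above.  It is in
Python only for brevity; the function names are made up for this
illustration, and the decoder simply skips anything outside the
alphabet (white space and "=" included) rather than validating input.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_base64(data: bytes) -> str:
    """Encode bytes as BASE64 text (no line breaks added)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24-bit group, zero-padded on the right.
        group = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the group into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final chunk keeps len(chunk) + 1 characters, then "=" padding.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

def decode_base64(text: str) -> bytes:
    """Decode BASE64 text, ignoring white space, line breaks and padding."""
    indices = [ALPHABET.index(c) for c in text if c in ALPHABET]
    out = bytearray()
    for i in range(0, len(indices), 4):
        quad = indices[i:i + 4]
        group = 0
        for idx in quad:
            group = (group << 6) | idx
        # Left-align a short final quad within the 24-bit group.
        group <<= 6 * (4 - len(quad))
        # n characters carry n - 1 complete bytes.
        out += group.to_bytes(3, "big")[:len(quad) - 1]
    return bytes(out)

# Every byte value goes in; no XML-special character comes out.
sample = bytes(range(256))
encoded = encode_base64(sample)
assert not set(encoded) & set("<>&\"'")
assert decode_base64(encoded) == sample
```

Whatever bytes the JPEG contains, the encoded text never includes "<",
">" or "&", so there is nothing for the DOM to escape.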

All of this information is easily available; a net search should soon
lead you to RFC 1521 (section 5.2) (or, for that matter, RFC 1421
section 4.3.2.4, although it doesn't call it "base 64").

Once you've decided to do this, and figured out what your schema is,
you can simply add an 'encoding="BASE64"' attribute, or some such.  You
might want to tell the tool to leave the line breaks alone (and not to
indent), but it shouldn't matter unless you have a tool that can't cope
with the plain old stream (the spec says that lines are no more than 76
characters each).
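
A sketch of what that might look like through a DOM API, again in
Python and using its standard base64 and xml.dom.minidom modules.  The
element name "image" and the helper names are invented for the example,
and the payload is a stand-in for real JPEG data; only the
'encoding="BASE64"' attribute comes from the suggestion above.

```python
import base64
from xml.dom import minidom

def embed(data: bytes) -> str:
    """Return an XML document holding data as BASE64 text."""
    doc = minidom.Document()
    elem = doc.createElement("image")  # hypothetical element name
    elem.setAttribute("encoding", "BASE64")
    encoded = base64.b64encode(data).decode("ascii")
    # Break the encoded text into lines of at most 76 characters.
    lines = [encoded[i:i + 76] for i in range(0, len(encoded), 76)]
    elem.appendChild(doc.createTextNode("\n".join(lines)))
    doc.appendChild(elem)
    # toxml() does not indent, so the line breaks are left alone.
    return doc.toxml()

def extract(xml_text: str) -> bytes:
    """Parse the document and decode the BASE64 payload."""
    elem = minidom.parseString(xml_text).documentElement
    text = "".join(
        node.data for node in elem.childNodes
        if node.nodeType == node.TEXT_NODE
    )
    # b64decode discards the line breaks by default.
    return base64.b64decode(text)

# Stand-in payload: JPEG-like leading bytes followed by arbitrary data.
payload = b"\xff\xd8\xff\xe0" + bytes(range(256)) * 4
xml_text = embed(payload)
assert "&lt;" not in xml_text and "&gt;" not in xml_text
assert extract(xml_text) == payload
```

The serialized document contains no escaped entities, and the payload
survives the trip through the parser unchanged.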

Amy!
-- 
Amelia A. Lewis          alicorn@mindspring.com          amyzing@talsever.com