RE: binary base64 definition

At 06:15 PM 1/6/2001 -0500, Jerry Johns wrote:
>Thanks much for your input. You guys are a tremendous resource.
>Following the suggestion of describing exactly what I'm trying to do, here
>it is:
>I'm trying to implement an interface in XML. The file has to contain some
>dollar amounts, which is straightforward. The file also needs to contain
>several "objects" that pertain to the dollar amounts. These objects include
>an HTML web page and JPEG images. My trading partner will use the dollar
>amounts for business logic and to store in database. The HTML and JPEG
>objects will be loaded into an imaging system.
>I could easily go with the approach of putting the JPEG images as separate
>files and FTP them along with the XML file. However, I was striving to keep
>everything in a single file.

There are some new messaging specs being worked on that would allow you to
use something like multipart MIME to wrap all the pieces together - but
that work is still at a very early stage.

This was also a problem in the SGML days, but it wasn't as big a deal
because pretty much everything lived on the file system with the
documents, and it was only when you were exchanging the information that
you would want to create a single file.

>Obviously, the JPEG file contains special characters when looked at on a
>byte-by-byte basis.
>One method I'm considering is converting the JPEG file to base64, which
>results in a string of text characters, as you know. However, this data
>could also contain special characters, ie: greater-than symbol, less-than
>symbol, etc. 

base64 has always been the recommended way of doing what you want, and it
does guarantee that you won't end up with any troublesome characters: its
output uses only A-Z, a-z, 0-9, '+', '/' and '=', none of which are
special to XML.
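
To make that guarantee concrete, here is a minimal sketch in Python (my choice of language for illustration; the thread itself only mentions Perl). It encodes by hand the way the MIME specification (RFC 2045, section 6.8) describes, three input bytes to four output characters, and then checks the alphabet against the characters XML treats as markup. The function name b64encode_manual is made up for this example; in practice you would call the library routine it is compared against.

```python
import base64

# The 64-character alphabet from RFC 2045; '=' is used only for padding.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# Characters with special meaning in XML content or attribute values.
XML_SPECIAL = set("<>&'\"")


def b64encode_manual(data: bytes) -> str:
    """Encode bytes as base64 by hand (illustrative, not optimized)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 bytes missing from the last group
        n = int.from_bytes(chunk + b"\x00" * pad, "big")  # one 24-bit group
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        quad = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            quad[-pad:] = "=" * pad  # replace the padded positions with '='
        out.append("".join(quad))
    return "".join(out)


# Every possible byte value, including whatever a JPEG might contain.
sample = bytes(range(256))
encoded = b64encode_manual(sample)

# Agrees with the standard library, and decodes back to the original bytes.
print(encoded == base64.b64encode(sample).decode("ascii"))
print(base64.b64decode(encoded) == sample)

# No overlap between the base64 alphabet and XML's special characters.
print(set(ALPHABET + "=") & XML_SPECIAL)
```

Because the encoder can only ever emit characters from that fixed alphabet, no input can produce '<', '>' or '&' in the output.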

>Assuming this was a good approach, I have two issues remaining: 1) how to
>code and decode the base64 and 2) can I prevent the DOM API from parsing the
>encoded JPEG data and converting greater-than and less-than symbols into
>"lt;" and "gt;" text strings.

1) I believe there are some Perl modules for doing this (MIME::Base64, for
one); the description of "how to do" base64 is in the MIME RFCs (RFC 2045,
section 6.8).

2) CDATA sections are another way to go. Inside one, the only thing you
have to worry about is the string ']]>' - the parser is hands off except
for that particular string, and base64 output can never contain it. So this
wouldn't be a problem for the DOM. You might have problems reading it in
and then writing a CDATA section back out, but you could use a standard tag
<base64> and always read and write a CDATA section around its content.
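
A rough sketch of that <base64> convention, using Python's xml.dom.minidom as a stand-in for whatever DOM implementation you actually have. The root element name "invoice" and the helper names build_doc and read_payload are invented for the example; only the <base64> tag comes from the suggestion above.

```python
import base64
from xml.dom import minidom, Node


def build_doc(payload: bytes) -> str:
    """Wrap base64 text in a CDATA section inside a <base64> element."""
    doc = minidom.Document()
    root = doc.createElement("invoice")  # hypothetical root element
    doc.appendChild(root)
    holder = doc.createElement("base64")
    root.appendChild(holder)
    text = base64.b64encode(payload).decode("ascii")
    holder.appendChild(doc.createCDATASection(text))
    return doc.toxml()


def read_payload(xml_text: str) -> bytes:
    """Collect the text and CDATA children of <base64> and decode them."""
    doc = minidom.parseString(xml_text)
    holder = doc.getElementsByTagName("base64")[0]
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    text = "".join(n.data for n in holder.childNodes if n.nodeType in wanted)
    return base64.b64decode(text)


# Stand-in bytes, not a real image; they deliberately include markup
# characters and ']]>' to show that the raw bytes never reach the parser.
fake_jpeg = b"\xff\xd8\xff\xe0<&>]]>\x00\x01"

xml_text = build_doc(fake_jpeg)
print(xml_text)
print(read_payload(xml_text) == fake_jpeg)
```

The reader accepts both text and CDATA nodes on purpose, so it still works if some other tool rewrites the file without preserving the CDATA section.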

>At this point, I have these assumptions: 1) I will have to use an external
>tool to encode/decode because DOM will not do this automatically in some way
>and 2) I can prevent DOM from parsing the encoded data by specifying some
>sort of parameter instructing it not to parse the content.

1) You will have to extend your DOM implementation to recognize the format
and hook some code in to handle it. You won't find this off the shelf, but
I would think it would be relatively straightforward to implement. It is
probably the safest way, and it allows you to maintain one file.

2) It isn't DOM you are fighting, it is XML and its parser. CDATA is the
closest thing to doing this, but it isn't a complete solution. Using an
entity reference and an external file is the safe way from a parser/DOM
standpoint, but you have the problem of multiple files.
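
For the external-file route, the sketch below shows one way an application might find out which entities are declared with a notation and then decode the file itself. It uses Python's expat binding and its EntityDeclHandler callback; the sample document, the "image" attribute, and the helper names unparsed_entities and load_entity are assumptions for illustration. Note that XML declares such an entity with the NDATA keyword and refers to it by name from an attribute rather than with an &name; reference.

```python
import base64
import os
import tempfile
from xml.parsers import expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE invoice [
<!NOTATION base64 SYSTEM "binary base64">
<!ENTITY extfile1 SYSTEM "extfile.b64" NDATA base64>
]>
<invoice image="extfile1"/>
"""


def unparsed_entities(xml_text: str) -> dict:
    """Map entity name -> (system id, notation name) for NDATA entities."""
    found = {}

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # only unparsed entities carry a notation
            found[name] = (system_id, notation)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return found


def load_entity(directory: str, system_id: str, notation: str) -> bytes:
    """Read the external file and decode it according to its notation."""
    with open(os.path.join(directory, system_id), "rb") as fh:
        raw = fh.read()
    if notation == "base64":
        return base64.b64decode(raw)
    return raw


with tempfile.TemporaryDirectory() as tmp:
    # Create the external file that the entity declaration points at.
    with open(os.path.join(tmp, "extfile.b64"), "wb") as fh:
        fh.write(base64.b64encode(b"stand-in image bytes"))
    entities = unparsed_entities(DOCUMENT)
    system_id, notation = entities["extfile1"]
    image = load_entity(tmp, system_id, notation)

print(entities)
print(image)
```

The parser never touches the external file; it only reports the declaration, and the application decides what to do with the notation name.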

>Can you please validate my approach and assumptions? Thanks! Jerry

base64 is usually the first recommendation for doing what you want; the
cost is having to build the tool to do the work. To remove the work you can
use the entity method, but then you're stuck with multiple files. No magic
bullet here in XML, you just get a standard way of addressing the world's
problems - but they're still problems.


>-----Original Message-----
>From: Danny Vint [mailto:dvint@slip.net]
>Sent: Saturday, January 06, 2001 4:28 PM
>To: Jerry Johns; 'Ian Graham'
>Cc: 'xml-dev@lists.xml.org'
>Subject: RE: binary base64 definition
>Notations and other formats have always been application dependent since
>SGML days. In that arena we were primarily trying to use graphics and
>usually just display them. So the SGML editors provided a mechanism to map
>a tool to a format. Seems like you might be able to do something with OLE
>on windows for the same sort of functionality. I'm not sure what was being
>conveyed about the DOM support, but it would seem like there would be a way
>to hook into knowing what the NOTATION type was (base64 in this case) and
>launching some other application to deal with it.
>Maybe if you describe what the actual need for the base64 is we might be
>able to offer suggestions along other lines.
>At 03:31 PM 1/6/2001 -0500, Jerry Johns wrote:
>>What if I ditched DOM and used another tool for managing the XML file;
>>could I then insert the base64 content and still be within the XML
>>standards? Is this a limitation of DOM? Thanks. Jerry
>>-----Original Message-----
>>From: Ian Graham [mailto:igraham@ic-unix.ic.utoronto.ca]
>>Sent: Saturday, January 06, 2001 10:50 AM
>>To: Dan Vint
>>Cc: Jerry Johns; 'xml-dev@lists.xml.org'
>>Subject: Re: binary base64 definition
>>The DOM supports access to notation nodes, but can enforce no statement
>>about the proper encoding of a referenced external entity (which makes
>>sense, as it is external to the document).
>>Base64 encoding of content inside a document would require custom code for
>>doing the encoding/decoding, and some attribute-based mechanism for
>>labeling the 'type' content of the node containing the data. That is
>>certainly possible, but as far as I can see is outside the scope of the
>>On Fri, 5 Jan 2001, Dan Vint wrote:
>>> You can't use elements this way, but an alternative would be to create a 
>>> NOTATION type and then an external entity of this type - you would copy
>>> the contents of whatever should be base 64 into this external file, it
>>> would be part of the XML document but it would be outside. Not sure if
>>> DOM has been set up to understand NOTATIONS, but in an SGML world you
>>> would be able to associate a "processor" with that notation and have it
>>> called
>>> whenever you needed to read or write that format. 
>>> Your DTD might look like the following:
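>>> <!DOCTYPE .... [
>>> <!NOTATION base64 SYSTEM "binary base64">
>>> <!ENTITY extfile1 SYSTEM "extfile.b64" NDATA base64>
>>> <!-- "object" and "src" are placeholder names for illustration -->
>>> <!ATTLIST object src ENTITY #REQUIRED>
>>> ]>
>>> ....
>>> <object src="extfile1"/>
>>> ...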
>>> The syntax is probably not exact but you can look up the details.
>>> ..dan
>>> > 
>>> > In the DTD, can I specify a type of "binary base64" for an element so
>>> > when I write to the XML file using DOM, DOM will automatically encode
>>> > binary data for me without parsing for control characters? If so, I
>>> > assume it will do the reverse when I read that element.
>>> > 
>>> > Can anyone validate my assumption about the DTD data type? Has anyone
>>> > got an example DTD definition with this in it?
>>> > 
>>> > Thanks much.
>>> > Jerry
>>> > 
>Danny Vint
>Author: "SGML at Work"  
> mailto:dvint@usa.net

Danny Vint

Author: "SGML at Work"