RE: binary base64 definition
- From: Ian Graham <igraham@ic-unix.ic.utoronto.ca>
- To: Danny Vint <dvint@slip.net>
- Date: Sat, 06 Jan 2001 19:13:59 -0500 (EST)
On Sat, 6 Jan 2001, Danny Vint wrote:
> At 06:15 PM 1/6/2001 -0500, Jerry Johns wrote:
> >Thanks much for your input. You guys are a tremendous resource.
> >
> >Following the suggestion of describing exactly what I'm trying to do, here
> >it is:
> >
> >I'm trying to implement an interface in XML. The file has to contain some
> >dollar amounts, which is straightforward. The file also needs to contain
> >several "objects" that pertain to the dollar amounts. These objects include
> >an HTML web page and JPEG images. My trading partner will use the dollar
> >amounts for business logic and to store in database. The HTML and JPEG
> >objects will be loaded into an imaging system.
> >
> >I could easily go with the approach of putting the JPEG images as separate
> >files and FTP them along with the XML file. However, I was striving to keep
> >everything in a single file.
>
>
> There are some new messaging specs being worked on that would allow you to
> use something like multipart MIME to wrap all the pieces together - but
> this is real early work going on there.
>
> This was a problem in the SGML days as well, but it wasn't as big a deal
> because pretty much everything was on the file system with the documents,
> and it was only when you were exchanging the information that you would
> want to create a single file.
One option would be to grab some code that already handles MIME
multipart/form-data -- then you could send the messages as an XML document
plus non-XML MIME 'attachments'. I imagine that there is Apache server
codebase that does this for you. There is even a URI specification for
referencing the 'parts' of the message. Kinda messy though.
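
For illustration, here is a minimal sketch of the MIME packaging idea using
Python's standard email package. It swaps in multipart/related rather than
multipart/form-data, since that subtype is meant for a root document plus
the parts it refers to. The function names and the Content-ID values are
hypothetical, and this is only one way the wrapping could be done.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart


def build_package(xml_bytes, jpeg_bytes):
    """Wrap an XML document and one JPEG in a single MIME message."""
    package = MIMEMultipart("related")

    xml_part = MIMEApplication(xml_bytes, "xml")
    xml_part.add_header("Content-ID", "<data.xml>")      # hypothetical ID
    package.attach(xml_part)

    # Subtype is given explicitly so the library does not have to sniff it.
    image_part = MIMEImage(jpeg_bytes, "jpeg")
    image_part.add_header("Content-ID", "<photo1.jpg>")  # hypothetical ID
    package.attach(image_part)

    return package.as_bytes()


def unpack(raw):
    """Return {Content-ID: decoded bytes} for every non-container part."""
    message = message_from_bytes(raw)
    parts = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        parts[part["Content-ID"]] = part.get_payload(decode=True)
    return parts
```

The XML part could then point at the image with a "cid:" style reference
built from the Content-ID, which is the kind of part-referencing URI
mentioned above. Note that the library still base64-encodes the binary
parts for transport, so this changes the packaging, not the encoding.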
> >
> >Obviously, the JPEG file contains special characters when looked at on a
> >byte-by-byte basis.
> >
> >One method I'm considering is converting the JPEG file to base64, which
> >results in a string of text characters, as you know. However, this data
> >could also contain special characters, e.g. greater-than symbol, less-than
> >symbol, etc.
>
> base64 has always been recommended for doing what you want and from what I
> understand it actually guarantees that you won't end up with any
> troublesome characters.
Indeed -- base64 encoding eliminates the less-than, greater-than and
ampersand characters, leaving safe ASCII.
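
A quick way to convince yourself of this: the base64 alphabet is A-Z, a-z,
0-9, "+", "/" and the "=" padding character, so none of the XML markup
characters can appear in the output. A small Python sketch (the helper name
is made up for this example):

```python
import base64

# Byte values of the characters that need escaping in XML content.
XML_SPECIAL = set(b"<>&'\"")


def is_xml_safe(data: bytes) -> bool:
    """True if none of the XML markup characters occur in data."""
    return XML_SPECIAL.isdisjoint(data)


# Every possible byte value, as a stand-in for arbitrary JPEG content.
every_byte = bytes(range(256))
encoded = base64.b64encode(every_byte)

print(is_xml_safe(every_byte))  # raw binary contains the special characters
print(is_xml_safe(encoded))     # the encoded form does not
```

Because "]" and ">" are not in the alphabet either, the encoded text can
never contain the ']]>' sequence.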
> >
> >Assuming this was a good approach, I have two issues remaining: 1) how to
> >code and decode the base64 and 2) can I prevent the DOM API from parsing the
> >encoded JPEG data and converting greater-than and less-than symbols into
> >"&lt;" and "&gt;" text strings.
You'd also have to escape the ampersands ... I think base64-ing the whole
thing is likely the easiest to do. As Danny notes, there are lots of
base64 encoders/decoders out there - I am sure a search would find you
a Perl or Java package to do this.
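
As a rough sketch of what the encode/decode step might look like around a
DOM (shown in Python with its bundled minidom rather than Perl or Java):
the element names "payment", "amount" and "image", and the "encoding"
attribute used to label the content, are all invented for this example and
are not part of any standard.

```python
import base64
from xml.dom import minidom


def build_document(amount, jpeg_bytes):
    """Return XML text holding a dollar amount and a base64-encoded image."""
    doc = minidom.Document()
    root = doc.createElement("payment")            # hypothetical element
    doc.appendChild(root)

    amount_el = doc.createElement("amount")        # hypothetical element
    amount_el.appendChild(doc.createTextNode(amount))
    root.appendChild(amount_el)

    image_el = doc.createElement("image")          # hypothetical element
    image_el.setAttribute("encoding", "base64")    # hypothetical label
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    image_el.appendChild(doc.createTextNode(encoded))
    root.appendChild(image_el)

    return doc.toxml()


def extract_image(xml_text):
    """Parse the document and decode the image back to raw bytes."""
    doc = minidom.parseString(xml_text)
    image_el = doc.getElementsByTagName("image")[0]
    # Join all character data in case the parser split it into chunks.
    text = "".join(
        node.data
        for node in image_el.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )
    return base64.b64decode(text)
```

The DOM never sees anything that needs escaping, so the application only
has to encode before writing and decode after reading.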
> 1) I believe there are some Perl modules for doing this, the description of
> "how to do" base64 is in one of the RFCs which I might be able to find.
>
> 2) CDATA sections are another way to go; in these all you have to worry
> about is a string of ']]>' - not sure if that helps any. But with a CDATA
> section the parser is hands off except for that particular string. So this
> wouldn't be a problem for the DOM. You might have problems reading it in
> and then writing a CDATA section back out, but you could use a standard tag
> <base64> and always read and write a CDATA section around its content.
>
> 1) You will have to extend your DOM implementation to be able to recognize
> the format and hook some code in to handle it - you won't find this off the
> shelf but I would think it would be relatively straightforward to
> implement. Probably the safest way and allows you to maintain one file.
... but you can probably find base64 encoders/decoders to make things
easier for you.
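
Danny's idea of a standard <base64> tag that always carries a CDATA section
could be sketched like this in Python with minidom. The function names are
made up, and since base64 output cannot contain ']]>' the CDATA wrapper is
safe here, though strictly it is optional for base64 text.

```python
import base64
from xml.dom import minidom


def wrap_in_cdata(jpeg_bytes):
    """Return XML text with the encoded image inside a CDATA section."""
    doc = minidom.Document()
    holder = doc.createElement("base64")   # tag name taken from the thread
    doc.appendChild(holder)
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    holder.appendChild(doc.createCDATASection(encoded))
    return doc.toxml()


def read_back(xml_text):
    """Parse the document and decode the content of the <base64> element."""
    holder = minidom.parseString(xml_text).documentElement
    # Text and CDATA nodes both expose their characters through .data.
    text = "".join(
        node.data
        for node in holder.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )
    return base64.b64decode(text)
```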
> 2) It isn't DOM you are fighting; it is XML and its parser. CDATA is the
> closest thing to doing this but it isn't a complete solution. Using an
> entity reference and an external file is the safe way from a parser/DOM
> standpoint, but you have the problem of multiple files.
Yes ... the MIME mechanism I mention is a whole other layer of messiness
that you avoid by encoding stuff and putting it inside the XML.
> >
> >Can you please validate my approach and assumptions? Thanks! Jerry
>
> base64 is usually the first recommendation for doing what you want; the
> cost is having to build the tool to do the work. To remove the work you can
> use the entity method, but then you're stuck with multiple files. No magic
> bullet here in XML, you just get a standard way of addressing the world's
> problems - but they're still problems.
I agree -- going the base64 route is I think the easiest approach, but
there is no magic bullet that makes it trivial.
Ian
> ..dan
>
> >
> >-----Original Message-----
> >From: Danny Vint [mailto:dvint@slip.net]
> >Sent: Saturday, January 06, 2001 4:28 PM
> >To: Jerry Johns; 'Ian Graham'
> >Cc: 'xml-dev@lists.xml.org'
> >Subject: RE: binary base64 definition
> >
> >
> >Notations and other formats have always been application dependent since
> >SGML days. In that arena we were primarily trying to use graphics and
> >usually just display them. So the SGML editors provided a mechanism to map
> >a tool to a format. Seems like you might be able to do something with OLE
> >on windows for the same sort of functionality. I'm not sure what was being
> >conveyed about the DOM support, but it would seem like there would be a way
> >to hook into knowing what the NOTATION type was (base64 in this case) and
> >launching some other application to deal with it.
> >
> >Maybe if you describe what the actual need for the base64 is we might be
> >able to offer suggestions along other lines.
> >
> >..dan
> >
> >
> >At 03:31 PM 1/6/2001 -0500, Jerry Johns wrote:
> >>What if I ditched DOM and used another tool for managing the XML file; could
> >>I then insert the base64 content and still be within the XML standards? Is
> >>this a limitation of DOM? Thanks. Jerry
> >>
> >>-----Original Message-----
> >>From: Ian Graham [mailto:igraham@ic-unix.ic.utoronto.ca]
> >>Sent: Saturday, January 06, 2001 10:50 AM
> >>To: Dan Vint
> >>Cc: Jerry Johns; 'xml-dev@lists.xml.org'
> >>Subject: Re: binary base64 definition
> >>
> >>
> >>
> >>The DOM supports access to notation nodes, but can enforce no statement
> >>about the proper encoding of a referenced external entity (which makes
> >>sense, as it is external to the document).
> >>
> >>Base64 encoding of content inside a document would require custom code for
> >>doing the encoding/decoding, and some attribute-based mechanism for
> >>labeling the 'type' content of the node containing the data. That is
> >>certainly possible, but as far as I can see is outside the scope of the
> >>DOM.
> >>
> >>Ian
> >>
> >>
> >>On Fri, 5 Jan 2001, Dan Vint wrote:
> >>
> >>> You can't use elements this way, but an alternative would be to create a
> >>> NOTATION type and then an external entity of this type - you would copy all
> >>> the contents of whatever should be base 64 into this external file; it
> >>> would be part of the XML document but it would be outside. Not sure if
> >>> DOM has been set up to understand NOTATIONS, but in an SGML world you would
> >>> be able to associate a "processor" with that notation and have it called
> >>> whenever you needed to read or write that format.
> >>>
> >>> Your DTD might look like the following:
> >>>
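> >>> <!DOCTYPE .... [
> >>>
> >>> <!NOTATION base64 SYSTEM "binary base64">
> >>> <!ENTITY extfile1 SYSTEM "extfile.b64" NDATA base64>
> >>> <!-- "image" is a hypothetical element; an unparsed entity must be
> >>>      named in an ENTITY-typed attribute, not referenced with &...; -->
> >>> <!ATTLIST image src ENTITY #REQUIRED>
> >>>
> >>> ]>
> >>> ....
> >>> <image src="extfile1"/>
> >>> ...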
> >>>
> >>> The syntax is probably not exact but you can look up the details.
> >>>
> >>> ..dan
> >>>
> >>> >
> >>> > In the DTD, can I specify a type of "binary base64" for an element so that
> >>> > when I write to the XML file using DOM, DOM will automatically encode the
> >>> > binary data for me without parsing for control characters? If so, I assume
> >>> > it will do the reverse when I read that element.
> >>> >
> >>> > Can anyone validate my assumption about the DTD data type? Has anyone seen
> >>> > an example DTD definition with this in it?
> >>> >
> >>> > Thanks much.
> >>> > Jerry
> >>> >
> >>>
> >>>
> >
> >---------------------------------------------------------------------------
> >Danny Vint
> >http://www.dvint.com
> >
> >Author: "SGML at Work"
> >http://www.slip.net/~dvint/pubs/sgmlatwork.shtml
> > mailto:dvint@usa.net
> >
>