
   Re: Binary Data in XML : Turning back the clock

  • From: Tyler Baker <tyler@infinet.com>
  • To: Paul Prescod <papresco@technologist.com>
  • Date: Wed, 30 Sep 1998 10:37:01 -0400

Paul Prescod wrote:

> "Samuel R. Blackburn" wrote:
> >
> > A couple of weeks ago on this list, there was a thread that was
> > lamenting the slow adoption of XML in the web community.
> >
> > It seems to me that one of the first problems programmers
> > encounter is XML's inability to handle "binary" data. Once they
> > hit that wall, they drop XML and move on to something else
> > (usually a custom format).
>
> First, binary data is not a wall. It's at most a gate. There are several
> ways to handle it, none of them particularly onerous. My favourite is
> "tar".
>
> Second, recall that binary junk is what we are running away from.
> Consider:
>
> <ms:word xml:length="10000 bytes"></ms:word>
>
> Yuck! I will rue the day I crash "vi" or "more" by looking at an XML
> document.
>
> I think that it is a much better practice to have the XML document contain
> only human-readable, human-editable text and LINKS to necessarily
> non-readable stuff. I suppose I would make an exception for streaming
> processes that want to interleave tags and data: base64 handles this fine.

Base64 only increases the size of the data transmission by around 33%.  You could
in fact use an encoding other than Base64 with a conversion ratio of 8/7 (about
14% overhead) instead of 8/6.  This should not be a major penalty considering that
networks these days are generally fast and costs are low.  Furthermore, if
efficiency is ever a question, you can either compress your binary data before
applying base64 encoding to it, or else compress the entire XML stream in the
first place (what I would recommend).
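
To put rough numbers on that, here is a minimal Python sketch (not taken from
any particular XML tool).  The sample payload and the <data> wrapper element are
made up for illustration, and zlib merely stands in for whatever compressor you
prefer:

```python
import base64
import zlib


def base64_size(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical sample payload: 10240 bytes of repetitive "binary" data.
payload = bytes(range(256)) * 40

# Plain base64: every 3 input bytes become 4 output characters (the 8/6 ratio).
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload)

# Option 1: compress the binary data first, then base64-encode the result.
compressed_then_encoded = base64.b64encode(zlib.compress(payload))

# Option 2: base64-encode, embed in the XML stream, then compress the whole
# stream. The <data> element name is an assumption for this example.
xml_stream = b"<data>" + encoded + b"</data>"
compressed_stream = zlib.compress(xml_stream)

print("raw bytes:               ", len(payload))
print("base64 characters:       ", len(encoded))
print("base64 expansion ratio:  ", round(overhead, 4))
print("compress then base64:    ", len(compressed_then_encoded))
print("whole stream compressed: ", len(compressed_stream))
```

How much either option saves depends entirely on how compressible the payload
is; data that is already compressed (JPEG, ZIP and so on) will gain little.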

One major pain in the you know what with EDI is that for BIN segments you have to
deal with the stream in 8-bit binary format while the rest of the stream could be
transmitted in 7-bit ASCII.  Also, for languages that have a notion of multi-byte
character sets, I/O can be a pain (as well as inefficient) if you cannot assume
that you are working with just a character stream, since you will in essence
have to process everything one byte at a time (are we in a character stream or
a binary stream?).

For the XML Framework that I hope to release in the next week or so depending on
how long it takes to finish up on the documentation, base64 is handled natively
in both the parser and the formatter.  Someone please correct me if I am wrong,
but I think this is one service SAXON provides as well.
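
As a rough illustration of what handling base64 on both the formatting and the
parsing side can look like, here is a sketch using Python's standard
xml.etree.ElementTree.  It is not the framework mentioned above; the helper
names, the "attachment" element and the "encoding" attribute are all
hypothetical:

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(tag: str, data: bytes) -> str:
    """Formatter side: wrap raw bytes in an element as base64 text."""
    elem = ET.Element(tag, {"encoding": "base64"})  # attribute name is an assumption
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Parser side: read the element back and decode its base64 content."""
    elem = ET.fromstring(xml_text)
    if elem.get("encoding") != "base64":
        raise ValueError("element is not marked as base64")
    return base64.b64decode(elem.text or "")


# Bytes that would be awkward as raw XML content: NUL, 0xFF, '<' (60), '&' (38).
raw = bytes([0, 1, 2, 255, 254, 60, 38])
doc = embed_binary("attachment", raw)

print(doc)
print(extract_binary(doc) == raw)
```

The point is that the document stays a pure character stream the whole way
through, so the reader never has to switch between character and byte
processing.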

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
