OASIS Mailing List Archives

   Re: Binary Data in XML

  • From: Tyler Baker <tyler@infinet.com>
  • To: Tim Bray <tbray@textuality.com>
  • Date: Wed, 30 Sep 1998 12:43:14 -0400

Tim Bray wrote:

> Suppose I wrote up a NOTE, should occupy less than one page, proposing
> a reserved attribute xml:packed with, for the moment, only two
> allowed values, "none" and "base64".  The default value is "none".
> If an element has xml:packed="base64" this means that
>
> (a) the content of the element to which this is attached must be
>     pure #PCDATA, no child elements and no references, and
> (b) the content is encoded in base64, leading and trailing spaces allowed
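Here is a minimal sketch of how an application might honour the proposed attribute. Some assumptions apply. xml:packed is only the proposal quoted above and is not part of XML 1.0. `element_bytes` and `XML_NS` are hypothetical names. ElementTree expands character and entity references before the application sees the text, so the sketch can enforce the "no child elements" half of rule (a) but cannot detect references.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

# ElementTree reports attributes in the reserved xml: namespace under this URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def element_bytes(elem: ET.Element) -> bytes:
    """Return an element's payload under the proposed xml:packed rules (hypothetical helper)."""
    packed = elem.get(XML_NS + "packed", "none")  # the proposal's default is "none"
    if packed == "none":
        # Unpacked content: return the character data as UTF-8 bytes.
        return "".join(elem.itertext()).encode("utf-8")
    if packed != "base64":
        raise ValueError(f"unsupported xml:packed value: {packed!r}")
    # Rule (a): pure #PCDATA, so child elements are rejected.
    if len(elem) > 0:
        raise ValueError("xml:packed='base64' content must not contain child elements")
    # Rule (b): leading and trailing whitespace is allowed, so strip it;
    # validate=True rejects any other non-Base64 character, including inner spaces.
    try:
        return base64.b64decode((elem.text or "").strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 content: {exc}") from exc
```

For example, `element_bytes(ET.fromstring('<data xml:packed="base64"> SGVsbG8= </data>'))` returns `b'Hello'`.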

Why would the content have to exclude child elements and references?  I can see
how this constraint would simplify things for the parser and avoid some of the
obvious confusion with mixed content models.

> This obviously couldn't retroactively become part of XML 1.0, but
> if it went through a process and became a W3C recommendation, I bet
> every parser author in the world would support it in about 15 minutes.
>
> Base64 (a 4-for-3 encoding) wastes 33%, so I thought about perhaps
> inventing Base128 (8-for-7) or maybe even a higher level to cut down
> wastage, but Base64 has the advantage that it avoids UTF-8/ISO-8859
> confusion and I bet Mr. LZW will eat that 33% anyhow...
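The size arithmetic behind the "4-for-3" and "8-for-7" figures can be checked with a short sketch. Base128 here is only the hypothetical encoding floated above, and `encoded_length` and `overhead` are illustrative names. Printable ASCII has only 95 characters. A 128-symbol alphabet would therefore need non-ASCII characters, whose bytes differ between UTF-8 and ISO-8859-1. That is the confusion Base64 sidesteps.

```python
import base64
import math


def encoded_length(n_bytes: int, bytes_per_group: int, chars_per_group: int) -> int:
    """Characters produced by a fixed-group encoding that pads the final group."""
    return math.ceil(n_bytes / bytes_per_group) * chars_per_group


def overhead(bytes_per_group: int, chars_per_group: int) -> float:
    """Asymptotic size increase, e.g. 4/3 - 1 for Base64."""
    return chars_per_group / bytes_per_group - 1


# Base64 maps 3 bytes to 4 characters; the hypothetical Base128 maps 7 bytes to 8.
BASE64 = (3, 4)
BASE128 = (7, 8)

if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10,240 bytes of sample binary data
    # Cross-check the formula against the standard library's Base64 encoder.
    assert len(base64.b64encode(payload)) == encoded_length(len(payload), *BASE64)
    for name, group in (("Base64", BASE64), ("Base128", BASE128)):
        print(f"{name}: {encoded_length(len(payload), *group)} chars, "
              f"{overhead(*group):.1%} overhead")
```

On the 10,240-byte sample, this gives 13656 characters (33.3% overhead) for Base64 and 11704 characters (14.3% overhead) for the hypothetical Base128.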

I still wonder whether Unisys is playing patent-rights games with LZW.

> I also thought about xml:encoding=, but that conflicts with
> encoding= in the XML declaration in a confusing way.

You could have something like xml:type="base64", which opens the door for the
XML parser to process data primitives more efficiently before presenting them to
the application.  So you could have something like:

xml:type="int"

It would then be up to the parser to marshal the data into big-endian,
little-endian, or whatever byte order the host platform uses.  Right now, most
parsers and parser interfaces provide attribute values and character content only
in String form.  If Mathematica were to do anything useful with XML through a Java
interface (with a lower-level native kernel, of course), it would first have to
receive the data as a String and then parse it into a number.  This is
inefficient, since Mathematica only wants the data as a number in the first
place.  If the character content cannot be parsed into a number, the parser
should throw an exception or return an error code.
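This Python sketch shows the kind of typed accessor described above. The message itself has a Java interface in mind. Some assumptions apply. xml:type is only a proposal, and `typed_content`, `to_native_bytes` and the `CONVERTERS` table are hypothetical names. ElementTree still hands over the text as a string first, so the sketch shows the shape of the interface, not the efficiency gain of converting inside the parser.

```python
import struct
import sys
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical registry of xml:type converters; only "int" comes from the message.
CONVERTERS = {"int": int}


def typed_content(elem: ET.Element):
    """Return elem's character content converted according to its xml:type.

    Without xml:type the content is returned as a string, as parsers do today.
    Content that does not match the declared type raises ValueError.
    """
    kind = elem.get(XML_NS + "type")
    text = (elem.text or "").strip()
    if kind is None:
        return text
    convert = CONVERTERS.get(kind)
    if convert is None:
        raise ValueError(f"unsupported xml:type value: {kind!r}")
    return convert(text)  # e.g. int("forty-two") raises ValueError


def to_native_bytes(value: int) -> bytes:
    """Marshal a 32-bit signed int in the host's native byte order ('=' in struct)."""
    return struct.pack("=i", value)  # raises struct.error if out of 32-bit range


if __name__ == "__main__":
    n = typed_content(ET.fromstring('<count xml:type="int"> 42 </count>'))
    print(n, to_native_bytes(n).hex(), sys.byteorder)
```

Raising `ValueError` corresponds to the "throw an exception" option. A C-style native interface would more likely return an error code instead.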

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 


Copyright 2001 XML.org. This site is hosted by OASIS