Re: "Binary XML" proposals
- From: "Anders W. Tell" <opensource@toolsmiths.se>
- To: Murali Mani <mani@CS.UCLA.EDU>
- Date: Wed, 11 Apr 2001 11:15:17 +0200
I think there are several more projects out there, including one of mine, "GREN",
which is a BinaryML based on groves, which are basically XML Information
items.
A few points on speed:
* Most BinML should be faster than any TextML since there are usually fewer
parsing steps involved.
Examples: StringToInt conversion is not necessary, string sizes are known.
* In Write Once-Read Many use cases it is possible to further increase speed
and reduce space requirements with multipass compression techniques.
* Parsers written in C or C++ are even faster due to fast memory access.
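
To illustrate the "fewer parsing steps" point, here is a minimal Python sketch. It is not GREN's actual format: the record layout (2-byte length prefix, UTF-8 name, 4-byte big-endian integer) and the function names are assumptions made up for this example. The binary reader knows every size up front and never converts digits; the text reader has to scan for delimiters and then run a string-to-int conversion.

```python
import struct
from typing import Tuple

def encode_record(name: str, value: int) -> bytes:
    # Hypothetical layout: 2-byte name length, UTF-8 name, 4-byte signed int.
    raw = name.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw + struct.pack(">i", value)

def decode_record(buf: bytes) -> Tuple[str, int]:
    # The length prefix tells us exactly where the name ends: no scanning.
    (name_len,) = struct.unpack_from(">H", buf, 0)
    name = buf[2:2 + name_len].decode("utf-8")
    # The integer is read directly from its 4 bytes: no StringToInt step.
    (value,) = struct.unpack_from(">i", buf, 2 + name_len)
    return name, value

def decode_text(fragment: str) -> Tuple[str, int]:
    # Text form of the same record, e.g. "<qty>12345</qty>".
    # We must search for each delimiter, then convert the digits.
    start = fragment.index("<") + 1
    end = fragment.index(">")
    name = fragment[start:end]
    close = fragment.index("</", end)
    return name, int(fragment[end + 1:close])

if __name__ == "__main__":
    print(decode_record(encode_record("qty", 12345)))
    print(decode_text("<qty>12345</qty>"))
```

Both readers return the same (name, value) pair; the difference is only in how much work is needed to find it.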
Space:
* In most cases BinML is smaller, but not always:
- the text representation of an array of doubles is smaller when the array
contains small integers
* I have used several compression techniques, and in use cases where the DTD is
"well-known" to both the writer and the reader the size can be significantly
reduced.
* There exist many classes of BinML parsers that are significantly smaller
than the corresponding TextML ones.
* Large documents with many different tags are usually smaller when
using BinML.
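
Two of the space points above can be checked with a small Python sketch. Everything here is an assumption for illustration and not taken from GREN: the binary array layout (4-byte count followed by 8-byte IEEE 754 doubles), the space-separated text form, and the TAGS table standing in for a DTD known to both writer and reader.

```python
import struct
from typing import List

def binary_size(values: List[float]) -> int:
    # Assumed layout: 4-byte count, then one 8-byte double per value.
    data = struct.pack(">I", len(values)) + struct.pack(">%dd" % len(values), *values)
    return len(data)

def text_size(values: List[float]) -> int:
    # Assumed text form: space-separated; whole numbers written without ".0",
    # everything else written with repr() so the value round-trips exactly.
    parts = [str(int(v)) if v.is_integer() else repr(v) for v in values]
    return len(" ".join(parts))

# Hypothetical tag table derived from a DTD that both sides already have.
TAGS = ["order", "item", "qty"]

def encode_tags(names: List[str]) -> bytes:
    # Each tag name becomes a single byte index into the shared table.
    return bytes(TAGS.index(n) for n in names)

def decode_tags(data: bytes) -> List[str]:
    return [TAGS[b] for b in data]

if __name__ == "__main__":
    small = [1.0, 2.0, 3.0, 4.0]
    print(text_size(small), binary_size(small))      # text wins here
    precise = [3.141592653589793, 2.718281828459045]
    print(text_size(precise), binary_size(precise))  # binary wins here
    print(len(encode_tags(["order", "item", "qty", "item"])))
```

With small integers the text form is shorter than the fixed 8 bytes per double; with full-precision values the binary form is shorter. The tag table shows why a shared DTD helps: the names themselves never need to be written.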
Currently I'm working on version 2 and implementing a prototype (when I have
spare time) with the goal of creating a BinML version of JAR files (Java
.class files).
I've just discovered that it is possible to save 3-5MB (out of 13MB) by
removing redundant information, so it should be possible to "add markup and
reduce size"!
Hopefully this remains true when the prototype is done :)
/anders
Anders W. Tell
Financial Toolsmiths AB
Murali Mani wrote:
> I am aware of a few research projects that used a binary representation of
> XML. I think the main things with binary XML are compressed data exchange
> and storage and I think they reported faster processing than ordinary DOM.
> I shall list the 3 that I have heard of. The last I heard of them was
> almost a year ago.
>
> 1. PDOM - Persistent DOM - They do not do any extra compression other than
> the compression due to binarization. The main point here is the DOM is not
> fully memory resident - it does partial loading.
> 2. Millau - They separate structure and content and do compression. Mainly
> it is for exchange, and I think they also got to a very primitive DOM
> support. This was in WWW9, and was a research project at IBM.
> 3. XMill - This is from AT&T and University of Pennsylvania. I think
> this achieved the most compression. Idea was similar to Millau's but in
> addition, for values they used something called "column-based compression"
> -- I think compression and decompression overhead as well as header
> overhead is larger here.
>
> I also think the space and time savings by the different projects were
> something like:
> Millau - claimed storage decreased 5 times. Also claimed something like
> 20% savings in processing the DOM. (recollections).
> XMill - claimed storage decreased only for large documents - documents
> should be at least 64 kB (recollections).
> But I am sure Millau claimed storage and processing got better.
>
> regards - murali.
>
> On Tue, 10 Apr 2001, Al Snell wrote:
>
> > On Tue, 10 Apr 2001, Tim Bray wrote:
> >
> > > So, Sean may have used strong language, but in point of fact
> > > he was correct, so it's forgivable. Get some data on how
> > > much space and time a binary representation will save, then
> > > you'll be able to make intelligent quantitative decisions
> > > on where it's worthwhile deploying it.
>