xml-dev - Re: [xml-dev] Microsoft FUD on binary XML...

Subject: Re: [xml-dev] Microsoft FUD on binary XML...
From: Rick Jelliffe <ricko@allette.com.au>
Date: Fri, 21 Nov 2003 17:25:30 +1100
Cc: xml-dev@lists.xml.org
In-reply-to: <004201c3af99$e9944010$650aa8c0@BOBDEV>
References: <004201c3af99$e9944010$650aa8c0@BOBDEV>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030428

I don't know why people don't also factor in SGML or microparsing as potentially
playing a part. For example:

<dataset/27,12/

15 bytes. SGML users commonly ended up with less than one character per
tag this way, without leaving the text world. In other words, there are other options
(with XML bindings) more terse than XML before you get to binary.
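
To make the microparsing idea concrete, here is a minimal sketch that expands the 15-byte short form above into ordinary XML. It is not an SGML parser: the regular expression, the function names and the mapping of the comma-separated values onto fields named x and y are all assumptions made for illustration.

```python
import re

# Hypothetical micro-syntax modelled on the short form above:
#   <name/value1,value2,.../
_SHORT_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9._-]*)/([^/<]*)/")


def parse_short_form(text, fields=("x", "y")):
    """Split the short form into an element name and named field values.

    The field names are an assumption; the short form itself carries
    only positional values.
    """
    match = _SHORT_FORM.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised short form: {text!r}")
    name, body = match.groups()
    values = body.split(",")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return name, dict(zip(fields, values))


def to_xml(name, record):
    """Expand the parsed record into plain XML without pretty-printing."""
    children = "".join(f"<{key}>{value}</{key}>" for key, value in record.items())
    return f"<{name}>{children}</{name}>"


short = "<dataset/27,12/"
name, record = parse_short_form(short)
xml = to_xml(name, record)
print(len(short.encode("ascii")), "bytes in the short form")
print(xml)
```

The point of the sketch is only that the terse form stays text and can be mapped back to an XML view by a very small amount of code.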

Also, it would be interesting to see binary people use Chinese (Japanese or Korean) text
and markup for their test data. Compressing or packing ASCII is quite different to
compressing or packing UTF-16 Chinese, which has a more random-seeming distribution
of byte values. It is not dishonest to make the case for binary using data that
is most compressible; but businesses that are looking at compression strategies
for world-wide use need to factor CJK compressibility into their evaluations.
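
One way to run that kind of evaluation is sketched below, using Python's zlib as a stand-in for whatever compressor or packer is under test. The helper names and the two sample strings are placeholders, not test data from this thread; strings this short are dominated by compressor overhead, so substitute real documents before drawing any conclusion.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is more compressible)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(zlib.compress(data, 9)) / len(data)


def report(label: str, text: str, encoding: str) -> dict:
    """Encode the text and collect size, byte entropy and deflate ratio."""
    data = text.encode(encoding)
    return {
        "label": label,
        "bytes": len(data),
        "entropy": byte_entropy(data),
        "ratio": deflate_ratio(data),
    }


# Placeholder samples only; replace with real documents and markup.
samples = [
    ("English markup, UTF-8", "<item>binary XML versus text compression</item>", "utf-8"),
    ("Chinese markup, UTF-16", "<项目>二进制与文本压缩的比较</项目>", "utf-16-le"),
]

for label, text, encoding in samples:
    row = report(label, text, encoding)
    print(f"{row['label']}: {row['bytes']} bytes, "
          f"{row['entropy']:.2f} bits/byte, ratio {row['ratio']:.2f}")
```

Byte entropy is a rough proxy for the "random-seeming distribution of byte values"; the deflate ratio shows what a general-purpose compressor makes of the same bytes.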

Cheers
Rick Jelliffe

Bob Wyman wrote:

Jeff Lowery wrote:

What can be achieved by binary XML that can't 
be similarly achieved using well-known text 
compression algorithms?

 
	Let me give a really trivial example that shows some of the
benefits.
	Let's say that you had the following XML: (warning,
instructive but highly contrived example follows...)

<dataset>
 <x>27</x>
 <y>12</y>
</dataset>

There are about 45 bytes there if you count CRLF and the
pretty-printing spaces. Now, a binary encoding that was schema driven
would look at this and might replace the tags with index numbers which
would be stored as bytes. Thus, dataset becomes "1", x becomes "2" and
y becomes "3". The encoder, if it knew that the values of x and y were
supposed to be integers, might also replace the two-character strings
with one-byte integers. If the encoding was of the "tag-length-value"
family, it would first write a tag number to the stream, then a length
and then the value. This would give you the following sequence of 6
bytes for the XML above:
1	Dataset Tag
4	length of data included in Dataset
2	"x" tag
27	value of x
3	"y" tag
12	value of y
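
As a rough sketch of the 6-byte layout listed above, the following Python encodes and decodes the example. The tag numbers come from the text; the function names are made up for illustration, and the layout follows the listing as given (a length on the outer element only), so it is a simplification rather than a full tag-length-value scheme.

```python
# Tag numbers from the example: dataset = 1, x = 2, y = 3.
TAG_DATASET, TAG_X, TAG_Y = 1, 2, 3
_NAMES = {TAG_X: "x", TAG_Y: "y"}


def encode_tlv(x: int, y: int) -> bytes:
    """Write: dataset tag, body length, then (tag, one-byte value) pairs."""
    for value in (x, y):
        if not 0 <= value <= 255:
            raise ValueError("this sketch only handles one-byte integers")
    body = bytes([TAG_X, x, TAG_Y, y])
    return bytes([TAG_DATASET, len(body)]) + body


def decode_tlv(data: bytes) -> dict:
    """Reverse encode_tlv, checking the outer tag and the declared length."""
    if len(data) < 2 or data[0] != TAG_DATASET:
        raise ValueError("missing dataset tag")
    length = data[1]
    body = data[2:2 + length]
    if len(body) != length or length % 2:
        raise ValueError("truncated or malformed body")
    record = {}
    for i in range(0, length, 2):
        tag, value = body[i], body[i + 1]
        if tag not in _NAMES:
            raise ValueError(f"unknown tag {tag}")
        record[_NAMES[tag]] = value
    return record


# The pretty-printed XML from the example, with CRLF line ends
# and no trailing line break.
xml_text = "<dataset>\r\n <x>27</x>\r\n <y>12</y>\r\n</dataset>"
encoded = encode_tlv(27, 12)
print(len(xml_text.encode("ascii")), "bytes as XML,", len(encoded), "bytes encoded")
print(list(encoded), decode_tlv(encoded))
```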
 
So, you could encode dataset in 3 bytes:
1	Dataset Tag
27	value of x
12	value of y
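
The quoted text does not spell out how the 3-byte form is reached. One reading, assumed here, is that the schema fixes the order and type of x and y, so the inner tags and the length carry no information and can be dropped. A minimal sketch under that assumption (the layout and function names are hypothetical):

```python
import struct

TAG_DATASET = 1

# Assumed schema: dataset = (x: unsigned byte, y: unsigned byte), in that order.
_LAYOUT = struct.Struct("BBB")


def encode_fixed(x: int, y: int) -> bytes:
    """Write the dataset tag followed by x and y as single bytes."""
    return _LAYOUT.pack(TAG_DATASET, x, y)


def decode_fixed(data: bytes) -> dict:
    """Read the fixed layout back; field names come from the schema, not the bytes."""
    tag, x, y = _LAYOUT.unpack(data)
    if tag != TAG_DATASET:
        raise ValueError(f"unexpected tag {tag}")
    return {"x": x, "y": y}


packed = encode_fixed(27, 12)
print(list(packed), decode_fixed(packed))
```

The trade-off is that these bytes cannot be interpreted without the schema, whereas the tagged form at least identifies each field.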
