RE: Request: Techniques for reducing the size of XML instances
- From: Al Snell <email@example.com>
- To: "HUGHES,MARK (Non-HP-FtCollins,ex1)" <firstname.lastname@example.org>
- Date: Wed, 01 Aug 2001 00:47:27 +0100 (BST)
On Mon, 30 Jul 2001, HUGHES,MARK (Non-HP-FtCollins,ex1) wrote:
> gzip is simple, because it's included as a standard library in almost
> every language and platform these days. In Python or Java, for instance,
> you can open a gzip stream and treat it like any other stream. Nobody
> normally reimplements it from scratch, though any programmer worth the
> name should be able to do it from the spec; it's not rocket science, just
> a data structures and pattern matching problem.
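For reference, the "open a gzip stream and treat it like any other stream" claim can be sketched in Python roughly as follows. This is only an illustration using the standard `gzip` and `io` modules; the sample document `xml_text` is made up for the example and does not come from the thread.

```python
import gzip
import io

# Made-up sample document, repeated so there is some redundancy to compress.
xml_text = "<order><item>widget</item><item>gadget</item></order>" * 50

# Write text through a gzip stream into an in-memory buffer.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", encoding="utf-8") as out:
    out.write(xml_text)
compressed = buffer.getvalue()

# Read it back through a gzip stream, again like any other text stream.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as inp:
    round_tripped = inp.read()

print(len(xml_text.encode("utf-8")), "bytes raw,", len(compressed), "bytes gzipped")
```

This shows only the convenience on a desktop platform; it says nothing about the code size, RAM or power cost on a constrained device.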
On an embedded device - an important market for network-enabled device
software standards - having to implement both gzip and an XML parser can
prohibitively raise the cost and bug potential of a device.
Did the WML people use gzipped XML? No! They used a binary tokenising
system. They wanted speed and simplicity with some bandwidth reduction.
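To make "binary tokenising" concrete, here is a minimal sketch of the general idea, assuming a token table agreed in advance by both ends. It is NOT the actual WML binary format: the table `TAG_TOKENS`, the `END` and `STR` byte values, and the one-byte length prefix are all hypothetical choices made for this illustration. Attributes and character escaping are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical token table shared by encoder and decoder (not from any spec).
TAG_TOKENS = {"card": 0x05, "p": 0x06, "b": 0x07}
END = 0x01  # hypothetical "close current element" token
STR = 0x03  # hypothetical "inline string follows" token


def emit_text(text, out):
    """Append STR, a one-byte length, then the UTF-8 bytes (under 256 bytes)."""
    data = text.encode("utf-8")
    out.append(STR)
    out.append(len(data))  # raises ValueError if the string is too long
    out.extend(data)


def encode(elem, out):
    """Depth-first walk: one byte per start tag, one byte per end tag."""
    out.append(TAG_TOKENS[elem.tag])
    if elem.text:
        emit_text(elem.text, out)
    for child in elem:
        encode(child, out)
        if child.tail:
            emit_text(child.tail, out)
    out.append(END)


def decode(data):
    """Rebuild textual markup from the token stream (no escaping handled)."""
    names = {token: name for name, token in TAG_TOKENS.items()}
    stack, parts, i = [], [], 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte == END:
            parts.append("</%s>" % stack.pop())
        elif byte == STR:
            length = data[i]
            i += 1
            parts.append(data[i:i + length].decode("utf-8"))
            i += length
        else:
            stack.append(names[byte])
            parts.append("<%s>" % names[byte])
    return "".join(parts)


source = "<card><p>Hello <b>world</b>!</p></card>"
out = bytearray()
encode(ET.fromstring(source), out)
encoded = bytes(out)
print(len(source), "bytes as text,", len(encoded), "bytes tokenised")
```

The decoder is a single loop over bytes with no character-level scanning for `<` and `>`, which is the kind of simplicity a small device would be after.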
> I did not state that it was cheap on memory; it's not, but it's also
> not a significant factor these days. RAM is dirt cheap by the MB now.
> These are not the '80s any more. Maybe it's time to upgrade your Apple
RAM is not cheap in terms of power consumption, however.
> "deflate is great at compressing the text of content" - BUT MARKUP IS
> TEXT! ROTFL!
The textual encoding of markup is text, yes. A textual encoding of an SQL
database is text, but the database isn't - encoding the markup in a
compact binary format is what I'm talking about.
> I've done that very experiment, and got marginally better results out
> of gzip than I did from pre-encoding the XML.
I then went on to explain that gzipping a binary file was not what I was
talking about :-)
> Binary XML is a dead end, obsolete before it was even started.
> Please stop wasting peoples' time on it.
Excuse me, but who died and made you God? You do *not* speak for
everybody. And you are certainly not speaking for the battery powered or
embedded or high performance computing communities, all of whom are being
unfairly locked out of access to XML infrastructure without sensible
encodings of XML information, and are all inventing their own
non-interoperable binary formats right now.
If you think binary XML is pointless, then please explain why so many
companies are working on proprietary solutions:
- XML databases are almost all binary XML systems, IIRC.
- The MPEG7 group produced a binary XML format.
- The WML group produced a binary XML format.
On the open standards side:
- The ISO and ITU-T are producing a binary XML format.
- I run an announcements and discussion list for people working on binary
XML formats to exchange notes and keep abreast of each other's work
- Various open source applications that deal with XML but are required to
exhibit high performance have produced their own encodings and have
implicitly released the spec or interface libraries
> Again, this is an area where you're looking back to Elder Days...
> Most new embedded systems now have pretty decent CPUs, certainly fast
> enough to run gzip if they have to trade off a little speed and RAM to
> make up for a slow network pipe.
BUT THERE ARE THOSE THAT HAVE DIFFERENT REQUIREMENTS!
Not just embedded systems. High performance systems as well.
Listen, if I was asking for a non-backwards-compatible change to the core
XML specs for a minority system, then sure, you could laugh at me. But I'm
not! All I'm working towards is an interoperable format that meets certain
specialist needs! It won't hurt anybody if it pans out, will it? Why are
you so hostile to binary XML? Why do you think that people who find XML
parsing a restrictive overhead must be denied this? What is so wrong with
ways of representing an XML document other than as a string of characters
with < and > in? I take it you're not against such representations as SAX
event streams and DOM trees, or do you think all XML processing software
should work directly with strings of characters?
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software