OASIS Mailing List Archives


RE: Request: Techniques for reducing the size of XML instances

>  That's my fundamental question about binary XML, and IMO the answer
>is almost never.  Obviously you disagree, but I think the whole idea is
>solving yesterday's problems, instead of focusing on tomorrow's.

Discussions of binary XML often conflate the different reasons that
people might desire a binary representation of XML.  In my mind there
are two major differing motivations:

1) Compress wire-level traffic.  This is normally important in
situations where systems are communicating through a low-bandwidth or
high-latency link, such as a satellite connection.  It is also a common
issue brought up by people on fast networks who believe that they will
nevertheless be hurt by XML's verbosity when shipping data around.

2) Have a small in-memory footprint.  This is normally brought up in
the context of the explosion of "smart" devices.  For example, if
I want to use my Casio watch to send a SOAP message to my refrigerator
(everyone knows that refrigerators don't have much RAM), I want to
conserve memory while parsing the message.
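
To make the two motivations concrete, here is a rough Python sketch.
The sample document, the tokenize() helper and its token layout are
all invented for illustration and do not correspond to any proposed
binary XML format.  The first half compresses the whole document with
gzip (motivation 1); the second half replaces each tag name with a
small integer code so the stream can be walked without a decompression
step (motivation 2).  The toy tokenizer only handles attribute-free
tags.

```python
import gzip
import re

# Hypothetical sample: a repetitive XML document, invented for illustration.
record = "<item><name>widget</name><price>9.99</price></item>"
xml_text = "<catalog>" + record * 200 + "</catalog>"
xml_bytes = xml_text.encode("utf-8")

# Motivation 1: shrink the bytes on the wire. Both endpoints pay CPU and
# RAM to compress and decompress the whole document.
compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)


def tokenize(text):
    """Toy tokenizer (an assumption, not a standard): map each distinct
    tag name to a small integer code. Handles attribute-free tags only."""
    table = {}   # tag name -> integer code
    tokens = []  # ("open", code), ("close", code) or ("text", string)
    pattern = r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)"
    for match in re.finditer(pattern, text):
        closing, name, chars = match.groups()
        if name is not None:
            code = table.setdefault(name, len(table))
            tokens.append(("close" if closing else "open", code))
        else:
            tokens.append(("text", chars))
    return table, tokens


# Motivation 2: a reader can step through `tokens` directly, without
# first inflating a compressed blob back into text.
table, tokens = tokenize(xml_text)
```

Comparing len(compressed) with len(xml_bytes) shows the wire-level
saving, while table and tokens show the kind of structure a
memory-constrained parser could consume directly.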

I think it is important to consider these as different problem spaces.  

1) A solution to #2 probably needs to be parseable directly from its
stored form, rather than requiring expensive (in terms of RAM and CPU)
decompression.  This implies some sort of "tokenization", whereas
scenario #1 can accept a higher CPU and RAM cost at the endpoints
(trading CPU and RAM for bandwidth).

2) Scenario #2 doesn't really need to have a standard defined, since it
is not about interop.  People can tokenize however they want and it has
little impact on the rest of the world.  On the other hand, scenario #1
kind of implies that you want a standard so you can interop.  In fact,
one of the main reasons for using XML as a wire format is to get interop
-- if you tokenize in some non-standard format, you might as well be
using DCOM or RMI -- you just blew one of the main advantages of XML
wire transfer.  Furthermore, gzip is almost a standard (used optionally
for HTML markup in HTTP 1.1), so a wire compression scheme for XML would
have an uphill battle against gzip.  Of course people may still wish to
compress wire-level XML for "absolute maximum performance" situations,
but I am guessing it will usually not be in cases where interop is
paramount, and therefore may hint at fewer incentives for people to
adhere to any standards that might be created for XML wire-level