   RE: standard compressed XML format?

  • From: "Schneider,John C." <jcs@mitre.org>
  • To: "'Simon St.Laurent'" <simonstl@simonstl.com>, "'XML-Dev Mailing list'" <xml-dev@xml.org>
  • Date: Thu, 16 Mar 2000 19:24:36 -0500

Hi Simon,

	I share your concerns about XML's perceived verbosity.  There are many
communities (web, messaging, software components, databases) moving to XML
for sharing data for the obvious benefits, but many of them are giving up
some level of efficiency due to XML's size.  We're currently leading a
large, international community from a proprietary data sharing format toward
XML.  The original format was developed in the 70s when terseness and
bandwidth conservation were primary concerns.  Needless to say, XML
represents more than a 10-fold increase in size, making verbosity and
bandwidth issues the most commonly cited obstacles to the move.

	Prior to XML (80s and early 90s), we devised and prototyped several
technologies to support the deployment of the aforementioned proprietary
format.  Many of the technologies we developed have existing or emerging
parallels in the XML community (e.g., an object model, validating parsers, a
schema language, query languages, validating editors, etc.).  [If you're
interested, the position paper I submitted to the first W3C Query workshop
has slightly more detail about our work
(http://www.w3.org/TandS/QL/QL98/pp/mitre.html).]

	One of the concepts we devised fits the description you give below and,
with sufficient tweaking, could form the basis of an efficient XML encoding
scheme.  The algorithm does not rely on character redundancy and, as such,
works equally well for small information objects that tend to get larger
using algorithms like zip.  In addition, its design permits it to be
read/written directly from an appropriately modified DOM implementation
instead of incurring the cost of a separate compression/decompression step.
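
	The point about small objects growing under zip-style compression can be
checked with a minimal sketch.  It assumes Python's standard gzip module as a
stand-in for "algorithms like zip"; the sample documents and the gzip_size
helper are invented for illustration.  It says nothing about the encoding
concept itself, which is not described here.

```python
import gzip

# Hypothetical sample documents, invented for this illustration.
small_doc = b"<pos><lat>38.9</lat><lon>-77.2</lon></pos>"
large_doc = b"<track>" + small_doc * 500 + b"</track>"


def gzip_size(data: bytes) -> int:
    """Return the size in bytes of data after gzip compression."""
    return len(gzip.compress(data))


# Compare raw and compressed sizes for a tiny and a highly repetitive document.
for name, doc in (("small", small_doc), ("large", large_doc)):
    print(f"{name}: {len(doc)} bytes raw, {gzip_size(doc)} bytes gzipped")
```

The small document comes out larger than it went in, because the fixed gzip
header and trailer outweigh what little redundancy there is to remove; the
large, repetitive document shrinks substantially.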

	If widely implemented (e.g., as part of DOM implementations), a more
efficient XML encoding has the potential to enable faster, more efficient
sites, services, components, messaging, etc.  In addition, it could enable
more seamless access to information services for the ever increasing world
of wireless, bandwidth constrained devices.  On a more selfish note, it
would also give me an answer for the multitudes I've heard complaining about
XML's size.  ;u)

	I work for an independent not-for-profit government lab and am very
interested in placing the encoding concept we devised into the public domain
as one input to an XML standard.  This would accelerate the development of a
capability my customers need and result in a less costly, higher quality
product than if they contracted someone to develop it as a proprietary
capability.  As an open standard, it would also increase the likelihood of
it integrating well with the other tools they already use.  If a W3C WG
existed to address this issue with the relevant IPR disclaimers in place,
I'd love to share the concept.  I'm hesitant to share it in this forum for
fear that a vendor would attempt to gain proprietary control of it.  [IMHO,
the concepts are actually quite simple and I'm surprised I haven't seen some
incarnation of them yet].

	This has become somewhat of a pet topic of mine over the last 2 years, but
I've been waiting for the right time to put energy into it.  It didn't
really seem to make sense until the community felt it important enough to
warrant a working group.  My [sometimes cloudy] crystal ball indicates that
time is coming sooner rather than later.  As more communities start to see XML
working, I expect to see a natural shift in emphasis from functionality to
efficiency.  The fact that the wireless community is charging ahead is also
a motivating factor.  With these thoughts in mind, I finished the first
draft of an IR&D proposal last week proposing we begin work on it soon.
(Your e-mail was timely).  If and when we see an activity stood up to
address this area, I'd love to pitch in.

	Take care!

	John Schneider

	Principal Systems Engineer
	The MITRE Corporation
	D540 - Information Interoperability





-----Original Message-----
From: owner-xml-dev@xml.org [mailto:owner-xml-dev@xml.org]On Behalf Of
Simon St.Laurent
Sent: Thursday, March 16, 2000 12:34 PM
To: XML-Dev Mailing list
Subject: standard compressed XML format?


Is anyone doing any work on a standard compression format for XML documents?

I'm starting to get concerned about the volume of complaints I'm getting
from readers and folks in Web development forums who are starting to argue
that XML's verbosity is a problem, especially for things like transmitting
vector graphics information.  There are a lot of wasted bits in XML
documents - and of course in HTML and other text documents as well.

I'm not happy about the prospect of sending documents to browsers as .zip
or some other compressed format and making users go through multiple steps
to decompress and view the content.  I'd like to think that we could come
up with a compression/decompression algorithm for markup (maybe just XML,
maybe all text) that we can use transparently.  Ideally, it would be an
algorithm explicitly placed in the public domain, avoiding licensing and
legal battles.

Some folks have argued that this belongs in transfer protocols, while
others have argued that it should be a 3rd party function, like .zip and
.sit are today.  I'm not convinced by the first because so many competing
formats (gif, jpeg, flash, etc.) already include compression, and I'm not
convinced by the second because I don't think users are willing to
micromanage such a process.
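
For comparison, the "transfer protocol" option could look roughly like the
sketch below, which assumes HTTP-style Accept-Encoding / Content-Encoding
negotiation with gzip.  The encode_body function and the sample payload are
hypothetical, and the header parsing is deliberately simplified (it ignores
q-values such as "gzip;q=0").

```python
import gzip


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body only if the client offered gzip and it actually helps."""
    headers = {"Content-Type": "text/xml"}
    # Simplified parsing: strip any ";q=..." parameter and ignore its value.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        compressed = gzip.compress(body)
        # Small bodies can grow under gzip, so fall back to the raw bytes.
        if len(compressed) < len(body):
            headers["Content-Encoding"] = "gzip"
            return compressed, headers
    return body, headers


# Hypothetical payload: a repetitive XML document.
xml = b"<svg>" + b'<line x1="0" y1="0" x2="10" y2="10"/>' * 200 + b"</svg>"
payload, headers = encode_body(xml, "gzip, deflate")
print(len(xml), len(payload), headers)
```

The user never sees a decompression step here, but the document is still
parsed as ordinary text XML once it arrives.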

It also has an impact on some of the discussions on the IETF-XML-MIME
list (see http://www.imc.org/ietf-xml-mime/ for archives and
information) because we're already discussing how best to mark information
as XML for possible generic processing.  If a compression standard emerged,
it might well have an impact on MIME types - and I'd like to see that
discussion start before we settle the MIME types for XML debate.

Any thoughts?  I like the fact that XML is verbose when I'm editing and
processing, but it's not so good in transmission.  I'd like to think that
there's a good _general_ solution that will let us have the best of both
worlds.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************



Copyright 2001 XML.org. This site is hosted by OASIS