   RE: [xml-dev] XML Binary and Compression


Yes, it is true that insignificant whitespace is lost with the schema-based
encoder/decoder. This behavior is similar to the way an XSLT processor
canonicalizes whitespace.
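
To make the whitespace point concrete, here is a minimal sketch in Python. It
assumes the encoder applies the XML Schema "collapse" whitespace rule to a
typed value before encoding it; the function name collapse_whitespace is
hypothetical and is not taken from any particular encoder.

```python
import re


def collapse_whitespace(lexical: str) -> str:
    """Hypothetical helper: XML Schema 'collapse' whitespace handling.

    Tabs, line feeds and carriage returns become spaces, runs of spaces
    shrink to one space, and leading/trailing spaces are dropped.
    """
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    return re.sub(r" +", " ", replaced).strip(" ")


# Element content as it might appear in a pretty-printed document.
original = "  42\n  "
decoded = collapse_whitespace(original)

# The typed value survives, but the surrounding whitespace does not.
print(repr(original), "->", repr(decoded))
```

Whether this counts as lossy depends on whether the application treats that
whitespace as part of the content.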

Also, the lexical format of content may not be preserved when using a
schema-based encoding approach (e.g. "100" versus "1.0E2").
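
A small Python sketch of why this happens, assuming the schema declares the
content as a double and the encoder stores it as an 8-byte IEEE 754 value. The
function names encode_double and decode_double are hypothetical, and the
decoder's choice of output form is only one of many possible.

```python
import struct


def encode_double(lexical: str) -> bytes:
    """Hypothetical encoder: parse the text and keep only the binary value."""
    return struct.pack(">d", float(lexical))


def decode_double(encoded: bytes) -> str:
    """Hypothetical decoder: emit some valid lexical form for the value."""
    (value,) = struct.unpack(">d", encoded)
    return repr(value)


a = encode_double("100")
b = encode_double("1.0E2")

# Both lexical forms map to the same bytes, so the decoder cannot know
# which spelling the original document used.
print(a == b)
print(decode_double(a), decode_double(b))
```

The value is preserved exactly; only the author's choice of spelling is gone.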

I'm not suggesting that the schema-based approach is one-size-fits-all. In
fact, as shown in our experiments, it is suboptimal for large datasets. But
it has value, especially for smallish schema-valid files, and could be part
of a broader solution to XML size optimization. By its very nature, no
optimization solution will be optimal for everyone's requirements. But a
combination of solutions may meet 80% of the cases.

- Dan

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Thursday, March 13, 2003 10:39 AM
> To: winkowski@mitre.org; xml-dev@lists.xml.org
> Cc: winkowski@mitre.org; msc@mitre.org
> Subject: RE: [xml-dev] XML Binary and Compression
>
> At 9:18 AM -0500 3/13/03, winkowski@mitre.org wrote:
> >Hmm, I'm sorry you don't think schema-based encoding is fair. I find it
> >odd that you regard schema-based (encoding) compression as lossy. This
> >term is normally associated with a permanent loss of information.
> >Neither ASN.1 nor MPEG-7 results in the loss of XML content (the
> >original content did not of course contain the XML schema). The
> >deployment of the schema upon which encoding/decoding is based is a
> >management issue. There is no need to transmit it as part of the
> >encoded content.
> >
> I suppose it depends on the schema based encoding. The ones I've seen 
> do things like throw away white space they don't consider to be 
> significant based on data type. That's lossy. They also normally 
> require the same schema to be present on the receiving end for 
> decompression. I couldn't tell from skimming your paper whether that 
> happened in your data or not. At first I thought it didn't, but what 
> you posted here later indicated that maybe it did. Can you clarify?
> -- 
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |           Processing XML with Java (Addison-Wesley, 2002)          |
> |              http://www.cafeconleche.org/books/xmljava             |
> | http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA  |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
> +----------------------------------+---------------------------------+


