At 11:54 PM -0500 3/10/03, winkowski@mitre.org wrote:
>On reflection, I don't think that the conclusions reached are all that
>surprising. Redundancy based compression achieves better results as the file
>size, and consequently the amount of redundancy, increases. CODECS that take
>advantage of schema knowledge achieve efficient localized encodings and also
>need not transmit metadata since this information can be derived at decoding
>time.
I may have missed something in your paper then, because I didn't
realize you were doing this. If you're assuming that the same schema
is available for both compression and decompression, then you're
doing a lossy compression. The compressed forms of your documents
have less information in them than the uncompressed forms. I don't
consider that to be a fair or useful comparison with raw XML with
metadata present.
Then again, maybe that's not what you meant? If you're somehow
embedding a schema in the document you transmit, then it's really
just another way of compressing losslessly and that's OK, though I
would still require that the schema used for compression be derived
from the instance documents rather than applied pre facto under the
assumption of document validity. Hmmm, that's not quite right. What I
really mean is that given a certain schema it must be possible to
losslessly encode both valid and invalid documents.
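
To make that last requirement concrete, here is a minimal sketch of one way a schema-aware codec could stay lossless on invalid input: names the shared schema knows are sent as short indexes, and anything else is escaped and sent literally. This is a toy under stated assumptions, not the codec from the paper; SCHEMA_NAMES, encode and decode are hypothetical names, and a flat list of element names stands in for a real document.

```python
# Toy sketch only: SCHEMA_NAMES, encode and decode are hypothetical names,
# and a flat list of element names stands in for a real XML document.
SCHEMA_NAMES = ["order", "item", "price"]  # names both sides are assumed to share


def encode(names, schema=SCHEMA_NAMES):
    """Encode names as schema indexes, escaping any name the schema lacks."""
    index = {name: i for i, name in enumerate(schema)}
    tokens = []
    for name in names:
        if name in index:
            tokens.append(("idx", index[name]))  # compact, schema-derived code
        else:
            tokens.append(("lit", name))         # escape: literal name, nothing lost
    return tokens


def decode(tokens, schema=SCHEMA_NAMES):
    """Rebuild the original names from indexes and escaped literals."""
    return [schema[value] if kind == "idx" else value for kind, value in tokens]


valid = ["order", "item", "price"]
invalid = ["order", "discount", "item"]  # "discount" is not in the schema

# Both documents must survive the round trip unchanged.
assert decode(encode(valid)) == valid
assert decode(encode(invalid)) == invalid
```

The valid document gets the full benefit of the schema, while the invalid one costs more bytes but still decodes to exactly what was sent. A codec without the escape path would have to reject or silently alter the second document, which is the lossy case described above.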
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| Processing XML with Java (Addison-Wesley, 2002) |
| http://www.cafeconleche.org/books/xmljava |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+