At 10:05 AM -0500 3/13/03, msc@mitre.org wrote:
>I was thinking about this as well.
>
>If preferred, you could include some representation of the schema within the
>compressed format. This would provide all the information needed for
>producing the original instance, assuming you know the proper algorithm.
>This would make things more like the traditional gzip scenario, where the
>replacement dictionary is included in the compressed file, and it's assumed
>that you know the decoding algorithm.
There are some XML compression schemes that do this. I consider those
non-lossy.
>The difference is that the schema based approach uses one set of information
>to encode/decode an entire class of instance documents. Whereas in
>something like gzip, you need a custom-made replacement dictionary for each
>file you compress. So in a schema based approach, you really only need to
>deploy the schema once, rather than every time you send the compressed
>information.
This I don't believe. The argument rests on two flawed premises:
1. Documents of interest have schemas.
2. Documents of interest reliably satisfy their advertised schemas.
These premises underlie a huge amount of XML theory, but neither of
them matches XML practice. Any satisfactory compression scheme
must be prepared to handle invalid documents losslessly.
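A minimal sketch of the point, using Python's standard gzip module (the
document names and content here are made up for illustration): a generic
byte-level compressor round-trips any input exactly, whether or not it is
well-formed or schema-valid XML.

```python
import gzip

# An intentionally broken XML document: undeclared entity, unclosed tag.
doc = b"<root><item>&undeclared; <unclosed></root>"

# gzip operates on bytes and knows nothing about XML, so it restores
# the input exactly -- no schema required, validity irrelevant.
compressed = gzip.compress(doc)
restored = gzip.decompress(compressed)
assert restored == doc  # lossless even for invalid XML
```

A schema-aware encoder, by contrast, has to decide what to do when the
input doesn't match the schema it was built for.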
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| Processing XML with Java (Addison-Wesley, 2002) |
| http://www.cafeconleche.org/books/xmljava |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+