Re: [xml-dev] Packaging (was Re: [xml-dev] Interoperability)
On Fri, 16 Nov 2001, Daniel Veillard wrote:
> The reason why ZIP was chosen over, say, zipped tar is that at least
> once it's available locally you can seek in the ZIP to the entry you're
> interested in. Getting a package format which would be both streamable
> and seekable might be an interesting challenge. Being able to seek is obviously
> very important if you want to process large collections of packages
> but are interested only in some kind of data (like a text indexer).
Ok, there are a few options:
Stream readable but requires random access to write: create the
compressed archive file, then go back and insert an index at the front.
Readers read the index then either ignore the stream or seek onwards.
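To make the first option concrete, here is a minimal sketch in Python. The layout (a 4-byte big-endian index length, a JSON index of offsets and lengths, then the individually zlib-compressed files) and the names write_indexed and read_one are my own assumptions for illustration, not an existing format. For brevity it builds the compressed data in memory and emits the index first, instead of going back to patch a file on disk.

```python
import io
import json
import struct
import zlib


def write_indexed(files):
    """Return archive bytes: 4-byte index length, JSON index, compressed blobs.

    Hypothetical layout, for illustration only.
    """
    blobs = {name: zlib.compress(data) for name, data in files.items()}
    index, offset = {}, 0
    for name, blob in blobs.items():
        # Offsets are relative to the first byte after the index.
        index[name] = [offset, len(blob)]
        offset += len(blob)
    raw_index = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(raw_index)) + raw_index + b"".join(blobs.values())


def read_one(archive, wanted):
    """Read the index from the front, then seek straight to one entry."""
    (index_len,) = struct.unpack(">I", archive.read(4))
    index = json.loads(archive.read(index_len))
    offset, length = index[wanted]
    archive.seek(offset, io.SEEK_CUR)  # skip every entry we do not need
    return zlib.decompress(archive.read(length))
```

A reader on a pure stream would use the same index but read and discard the unwanted bytes instead of seeking.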
Stream read/write, but requires a bit of seeking to read: use the opposite
of .tar.gz, that is .gz.tar: an uncompressed tar file (which is both
seekable and streamable) in which each individual file is gzipped.
...or a mix of the two; the difference is where the index information is
stored. All that's required to enable selective decompression is that the
files are individually compressed rather than the whole archive
compressed, so decompression can start at any file in the archive.
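As a rough illustration of the .gz.tar idea, the following Python sketch uses the standard tarfile and gzip modules. The function names build_gz_tar and extract_one, and the convention of appending ".gz" to member names, are assumptions of mine. The reader hops from header to header to find the entry, then decompresses only that entry.

```python
import gzip
import io
import tarfile


def build_gz_tar(files):
    """Return an uncompressed tar whose members are individually gzipped."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")  # naming is an assumption
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def extract_one(archive_bytes, wanted):
    """Locate one member by walking the tar headers, then gunzip only it."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        member = tar.getmember(wanted + ".gz")
        return gzip.decompress(tar.extractfile(member).read())
```

Because no compression state is shared between members, nothing before the wanted entry has to be decompressed.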
In order to permit streaming, all that's needed is that the metadata for a
file (meaning, in this case, all information required to decide whether it
needs to be extracted or not) must come before the file itself, so the
reader does not need to buffer file data until it reaches the file it
needs. The two options I give above are extreme ends of the solution space
for this - all the metadata in one chunk at the start, or metadata spread
through the archive, always just before the file it refers to.
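The streaming end of that range can be sketched with Python's tarfile module in its "r|" mode, which only ever reads forwards. ForwardOnly, make_archive and stream_filter are hypothetical names for this example, and ForwardOnly merely imitates a pipe or socket by offering read() and nothing else. Each tar header arrives before its data, so the reader decides per entry and never buffers files it does not want.

```python
import gzip
import io
import tarfile


class ForwardOnly:
    """Expose read() only, imitating a non-seekable pipe or socket."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)


def make_archive(files):
    """Build a tar of individually gzipped members (test data for the sketch)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def stream_filter(stream, want):
    """Decompress only the members whose name satisfies want(name)."""
    found = {}
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:  # the header is seen before the member's data
            if want(member.name):
                found[member.name] = gzip.decompress(tar.extractfile(member).read())
            # Unwanted members are read past and discarded, never buffered.
    return found
```

For example, stream_filter(ForwardOnly(data), lambda n: n.endswith(".xml.gz")) would pick out only the XML entries from a stream.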
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software