I'm reading this thread about packaging over a month late, but I'd like to
throw out an idea on packaging. While zip/jar/xar is certainly a proven
and well-understood technique, it would be nice to have a text-based,
human-readable and human-editable standard that is simple enough to be
implemented as an extension to an XML processor with negligible impact on
its code, memory, and processing footprint.
Other than suggestions to use MIME packaging I'm not aware of any
discussions of a format that meets the above requirements, so it might be
useful to share these thoughts on what that could look like. The basic
idea is to wrap file-level content in a stripped-down, XML-like syntax that
has just one type of "element", which contains only CDATA (well, not even
CDATA, just bytes). Metadata is associated with the content via this
element's attributes, which may match the semantics of the equivalent MIME
or HTTP headers. Here's an example:
<?xpf http://www.somestandard.org/xpf/1/0 ?>
<file:boundary content-location="file1.xml" content-type='text/xml'
meta:another-attribute='foo'>
<?xml version="1.0" ?>
<?xml-stylesheet href="styles/style.css" type="text/css"?>
<doc>
sample doc
</doc>
</file:boundary>
<file content-location="styles/style.css" content-type='text/css'
content-length='35' meta:another-attribute='foo'>
BODY { BACKGROUND-IMAGE: url(image/background.bmp) }
</file>
<file content-location='image/background.bmp' content-encoding='gzip'>
[binary data, I wonder how this will show in email]
y OS4&Ng%bэ8Rg 4@OKxd
LY?a2wX"`A'rבi'|!=c$0rWɻς֮X\.
0wJ 3 w.қ)?~le)6lƌ ?IFu@W
The first line contains the header that identifies what kind of content
this bag of bits is. The "xpf" stands for extensible packaging format (any
better ideas for the name?), while the URI that follows specifies the
particular version of this format.
Next are the file elements. This example has three <file> elements that
correspond to three interrelated files, each showing one of three
different ways to define an element's boundaries. If the element has an
(arbitrary) string appended to its name (e.g. file:boundary), the element
data ends where the processor encounters the matching end tag (in this
case "</file:boundary>"). If no boundary string is specified in the element
name, the end of the element data is found by skipping ahead by the number
of bytes specified by the content-length attribute. If neither is
specified, it is assumed that the rest of the package file is this
element's data (as shown by the last element).
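To make the three boundary rules concrete, here is a minimal Python sketch
of a reader for a package held in memory. None of this is specified above,
so treat the details as assumptions: the function name parse_xpf, the exact
tag and attribute grammar in the regular expressions, the rule that one
line break after a start tag is not part of the content, and the rule that
whitespace may sit between the content and a plain </file> end tag.

```python
import re

# Assumed grammar: <file ...> or <file:BOUNDARY ...>, attribute values quoted
# with ' or ", and no '>' inside the start tag itself.
START_TAG = re.compile(rb"<file(?::([\w.-]+))?((?:\s[^>]*)?)>\r?\n?")
ATTR = re.compile(rb"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")
WS = re.compile(rb"\s*")


def parse_xpf(data: bytes):
    """Return a list of (attributes, content_bytes), one per file element."""
    pos = 0
    if data.startswith(b"<?xpf"):
        pos = data.index(b"?>") + 2  # skip the format header
    files = []
    while True:
        pos = WS.match(data, pos).end()  # skip whitespace between elements
        if pos >= len(data):
            break
        tag = START_TAG.match(data, pos)
        if tag is None:
            raise ValueError(f"expected a file element at byte {pos}")
        boundary, raw_attrs = tag.group(1), tag.group(2)
        attrs = {}
        for m in ATTR.finditer(raw_attrs):
            value = m.group(2) if m.group(2) is not None else m.group(3)
            attrs[m.group(1).decode()] = value.decode()
        start = tag.end()
        if boundary is not None:
            # Rule 1: content runs up to the matching </file:BOUNDARY> tag.
            end_tag = b"</file:" + boundary + b">"
            end = data.find(end_tag, start)
            if end < 0:
                raise ValueError("missing end tag " + end_tag.decode())
            pos = end + len(end_tag)
        elif "content-length" in attrs:
            # Rule 2: skip ahead by content-length bytes, then expect </file>.
            end = start + int(attrs["content-length"])
            after = WS.match(data, end).end()
            if not data.startswith(b"</file>", after):
                raise ValueError(f"expected </file> near byte {end}")
            pos = after + len(b"</file>")
        else:
            # Rule 3: the rest of the package is this element's content.
            end = pos = len(data)
        files.append((attrs, data[start:end]))
    return files
```

Calling parse_xpf on the bytes of a package returns the attribute
dictionary and raw content of each element in order. Note that only the
boundary form ever scans the content itself; the other two forms never look
inside it, which is what lets them carry arbitrary binary data.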
The rest of the example should be fairly obvious -- metadata about each
file appears as attributes of its file element, and an appropriate set of
MIME headers is spelled as attributes that retain their meanings as
specified in the various RFCs. The files in the package can be referenced
through the content-location MIME header as specified in the MHTML RFC.
This specification would be orthogonal to any manifest or catalog (such as
XPackage) -- these would just be stored as another file in the
package. But there should be an attribute (e.g. xpf:manifest) to indicate
that the file can be treated as a manifest for this package.
This format can be easily streamed and provides random access as long as
the content-length attribute is specified for each file. If the processor
supports gzip content-encoding it offers compression comparable to zip. One
limitation of the format as currently described is the lack of an index of
the files in the package. Keeping the manifest orthogonal makes validation
of file references and efficient random access difficult, so it would make
sense to define optional index elements that contain the minimum
information about each file (content-location and content-length).
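As a rough illustration of the random-access point, the sketch below builds
such an index in Python by hopping from start tag to start tag over a
seekable stream, never reading the content itself. It is only a sketch
under assumptions of mine: the name build_index is hypothetical, every
element must carry content-length and content-location, a start tag
contains no '>' in its attribute values, and one line break after the
start tag is not counted as content.

```python
import io
import re

ATTR = re.compile(rb"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def build_index(stream):
    """Map content-location -> (offset, length) without reading any content."""
    index = {}
    while True:
        # Read up to the next '>': the header, a start tag, or an end tag.
        tag = bytearray()
        while not tag.endswith(b">"):
            byte = stream.read(1)
            if not byte:
                return index  # end of package
            tag += byte
        tag = bytes(tag).strip()
        if tag.startswith(b"<?") or tag.startswith(b"</"):
            continue  # format header or an end tag: nothing to record
        attrs = {}
        for m in ATTR.finditer(tag):
            value = m.group(2) if m.group(2) is not None else m.group(3)
            attrs[m.group(1).decode()] = value.decode()
        if "content-length" not in attrs or "content-location" not in attrs:
            raise ValueError("indexing needs content-length and content-location")
        # Assumption: a single line break after the start tag is not content.
        nxt = stream.read(1)
        if nxt not in (b"\n", b""):
            stream.seek(-1, io.SEEK_CUR)
        length = int(attrs["content-length"])
        index[attrs["content-location"]] = (stream.tell(), length)
        stream.seek(length, io.SEEK_CUR)  # hop over the content
```

With the index in hand, fetching one file is a seek to its offset followed
by a read of its length. An optional index element stored in the package
would save even this one pass over the start tags.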
Well, that's a brief description; I could go into more details about my
thoughts about its syntax, encoding issues, etc. but this email is long
enough. Actually this evolved from an idea for a standard header format
for embedding metadata that occurred to me while trying to piece together
data files from a scrambled hard drive -- if anyone's interested I could
describe that related idea.
-- adam