




   Re: [xml-dev] Packaging (was Re: [xml-dev] Interoperability)


I'm reading this thread about packaging over a month late, but I'd like to 
throw out an idea on packaging.  While zip/jar/xar is certainly a proven 
and well-understood technique, it would be nice to have a text-based, human 
readable/editable standard that is simple enough to be implemented as an 
extension to an XML processor with negligible impact on its code, memory, 
and processing footprint.

Other than suggestions to use MIME packaging, I'm not aware of any 
discussions of a format that meets the above requirements, so it might be 
useful to share some thoughts on what one could look like.  The basic 
idea is to wrap file-level content in a stripped-down, XML-like syntax that 
has just one type of "element", which contains only CDATA (well, not even 
CDATA, just bytes).  Metadata is associated with the content via this 
element's attributes, which may match the semantics of the equivalent MIME 
or HTTP headers. Here's an example:

<?xpf http://www.somestandard.org/xpf/1/0 ?>
<file:boundary content-location="file1.xml" content-type='text/xml'>
     <?xml version="1.0" ?>
     <?xml-stylesheet href="styles/style.css" type="text/css"?>
     sample doc
</file:boundary>
<file content-location="styles/style.css" content-type='text/css' 
content-length='35' meta:another-attribute='foo'>
BODY { BACKGROUND-IMAGE: url(image/background.bmp) }
<file content-location='image/background.bmp' content-encoding='gzip'>
[binary data, I wonder how this will show in email]

The first line contains the header that identifies what kind of content 
this bag of bits is. The "xpf" stands for extensible packaging format (any 
better ideas for the name?), while the URI that follows specifies the 
particular version of this format.

Next are the file elements. This example has three <file> elements that 
correspond to three interrelated files, each showing a different way to 
define the element's boundaries.  If the element has an (arbitrary) string 
appended to its name (e.g. file:boundary), the element data ends when the 
matching end tag is encountered (in this case "</file:boundary>").  If no 
boundary string is specified in the element name, the end of the element 
data is found by skipping ahead by the number of bytes specified in the 
content-length attribute.  If neither is specified, the rest of the package 
is assumed to be this element's data (as shown by the last element).
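To make the three boundary rules concrete, here is a minimal reader sketch. It is not part of any spec; the exact grammar is an assumption: a start tag is "<file" or "<file:NAME" followed by single- or double-quoted attributes (possibly spanning lines) and ">", one newline right after ">" is not counted as data, and whitespace between elements is ignored. The names parse_package and Part are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for this sketch: "<file" or "<file:NAME", then
# quoted attributes (which may span lines), then ">".
START_TAG = re.compile(
    rb"<file(?::([\w.-]+))?((?:\s+[\w:.-]+\s*=\s*(?:\"[^\"]*\"|'[^']*'))*)\s*>"
)
ATTR = re.compile(rb"([\w:.-]+)\s*=\s*(?:\"([^\"]*)\"|'([^']*)')")
HEADER = re.compile(rb"<\?xpf\s+(\S+)\s*\?>")


@dataclass
class Part:
    attrs: dict  # MIME-style headers, e.g. content-location, content-type
    data: bytes  # raw bytes, never parsed as XML


def parse_attrs(raw: bytes) -> dict:
    out = {}
    for m in ATTR.finditer(raw):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        out[m.group(1).decode()] = value.decode()
    return out


def parse_package(buf: bytes):
    header = HEADER.match(buf)
    if header is None:
        raise ValueError("missing <?xpf ...?> header")
    version, pos, parts = header.group(1).decode(), header.end(), []
    while True:
        while pos < len(buf) and buf[pos:pos + 1].isspace():
            pos += 1  # whitespace between elements is ignored
        if pos == len(buf):
            return version, parts
        tag = START_TAG.match(buf, pos)
        if tag is None:
            raise ValueError(f"expected a <file> start tag at byte {pos}")
        boundary, attrs = tag.group(1), parse_attrs(tag.group(2))
        start = tag.end()
        if buf.startswith(b"\n", start):
            start += 1  # assumption: one newline after '>' is not data
        if boundary is not None:
            # Rule 1: data runs until the matching </file:NAME> end tag.
            end_tag = b"</file:" + boundary + b">"
            end = buf.find(end_tag, start)
            if end < 0:
                raise ValueError(f"missing {end_tag.decode()}")
            parts.append(Part(attrs, buf[start:end]))
            pos = end + len(end_tag)
        elif "content-length" in attrs:
            # Rule 2: skip ahead exactly content-length bytes.
            end = start + int(attrs["content-length"])
            if end > len(buf):
                raise ValueError("content-length runs past end of package")
            parts.append(Part(attrs, buf[start:end]))
            pos = end
        else:
            # Rule 3: the element's data is the rest of the package.
            parts.append(Part(attrs, buf[start:]))
            return version, parts
```

Note that only rule 2 lets a reader skip data without looking at it; rule 1 requires scanning the bytes for the end tag, and rule 3 can only be used for the last element.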

The rest of the example should be fairly obvious -- metadata about each 
file appears as attributes of the file element, and an appropriate set of 
MIME headers is spelled as attributes that retain their meanings as 
specified in the various RFCs.  The files in the package can be referenced 
through the content-location MIME header, as specified in the MHTML RFC.
This specification would be orthogonal to any manifest or catalog (such as 
XPackage) -- a manifest would just be stored as another file in the 
package. But there should be an attribute (e.g. xpf:manifest) to indicate 
that the file can be treated as a manifest for this package.
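On the writing side, a packager only has to turn each file's MIME headers into attributes and pick a boundary rule. The sketch below is hypothetical: write_package, the value 'true' for the xpf:manifest attribute, the "<?xpf URI ?>" header line, and the newline after each start tag and after each body are assumptions, and since the email does not define an escaping syntax, values containing a quote or '>' are simply rejected.

```python
import secrets


def _attr(name: str, value: str) -> str:
    # Escaping rules are unspecified, so this sketch rejects values it
    # cannot quote safely instead of guessing an escape syntax.
    if "'" in value or ">" in value:
        raise ValueError(f"cannot quote {name}={value!r}")
    return f" {name}='{value}'"


def write_package(files, version_uri, manifest=None, use_boundary=False) -> bytes:
    """files: iterable of (content-location, content-type, data bytes)."""
    out = [f"<?xpf {version_uri} ?>\n".encode()]
    for location, content_type, data in files:
        attrs = _attr("content-location", location) + _attr("content-type", content_type)
        if location == manifest:
            attrs += _attr("xpf:manifest", "true")  # hypothetical flag value
        if use_boundary:
            # Pick a random boundary whose end tag does not occur in the data.
            while True:
                boundary = "b" + secrets.token_hex(4)
                end_tag = f"</file:{boundary}>".encode()
                if end_tag not in data:
                    break
            out.append(f"<file:{boundary}{attrs}>\n".encode() + data + end_tag + b"\n")
        else:
            # content-length mode keeps the package seekable.
            attrs += _attr("content-length", str(len(data)))
            out.append(f"<file{attrs}>\n".encode() + data + b"\n")
    return b"".join(out)
```

The random boundary mirrors how MIME multipart writers choose a boundary that cannot collide with the body; content-length mode avoids the check entirely at the cost of knowing each file's size up front.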

This format can easily be streamed, and it provides random access as long 
as the content-length attribute is specified for each file. If the 
processor supports gzip content-encoding, it offers compression comparable 
to zip. One limitation of the format as currently described is the lack of 
an index of the files in the package.  Keeping the manifest orthogonal 
makes validation of file references and efficient random access difficult, 
so it would make sense to define optional index elements that contain the 
minimum information about each file (content-location and content-length).
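Here is a rough sketch of what "random access as long as content-length is specified" can mean in practice: a reader that reads only the start tags, seeks over each body, and records offsets, then fetches one file and un-gzips it if content-encoding='gzip'. build_index and read_file are hypothetical names; the sketch assumes attribute values never contain '>', that content-length counts the encoded (gzipped) bytes as in HTTP, and that at most one newline follows each start tag.

```python
import gzip
import io
import re

ATTR = re.compile(rb"([\w:.-]+)\s*=\s*(?:\"([^\"]*)\"|'([^']*)')")


def _read_through(f, stop: bytes) -> bytes:
    """Read one byte at a time up to and including `stop`."""
    out = bytearray()
    while True:
        b = f.read(1)
        if not b:
            raise ValueError("unexpected end of package")
        out += b
        if b == stop:
            return bytes(out)


def _attrs(tag: bytes) -> dict:
    out = {}
    for m in ATTR.finditer(tag):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        out[m.group(1).decode()] = value.decode()
    return out


def build_index(f) -> dict:
    """Map content-location -> (offset, length, attrs) without reading file data."""
    _read_through(f, b">")  # skip the <?xpf ...?> header
    index = {}
    while True:
        c = f.read(1)
        while c and c.isspace():  # whitespace between elements
            c = f.read(1)
        if not c:
            return index
        tag = c + _read_through(f, b">")  # assumes no '>' inside attribute values
        if not tag.startswith(b"<file") or tag.startswith(b"<file:"):
            raise ValueError("only content-length <file> elements can be indexed")
        attrs = _attrs(tag)
        if "content-length" not in attrs:
            raise ValueError(f"no content-length on {attrs.get('content-location')}")
        nl = f.read(1)
        if nl and nl != b"\n":
            f.seek(-1, io.SEEK_CUR)  # no newline after '>': that byte is data
        length = int(attrs["content-length"])
        index[attrs["content-location"]] = (f.tell(), length, attrs)
        f.seek(length, io.SEEK_CUR)  # hop over the data instead of reading it


def read_file(f, index: dict, location: str) -> bytes:
    offset, length, attrs = index[location]
    f.seek(offset)
    data = f.read(length)
    if attrs.get("content-encoding") == "gzip":
        data = gzip.decompress(data)
    return data
```

The index elements proposed above would let a reader skip even this start-tag scan, but turning (content-location, content-length) pairs into byte offsets would still require knowing the size of each start tag, so an index might also need to carry offsets.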

Well, that's a brief description; I could go into more detail about my 
thoughts on its syntax, encoding issues, etc., but this email is long 
enough.  Actually, this evolved from an idea for a standard header format 
for embedding metadata that occurred to me while trying to piece together 
data files from a scrambled hard drive -- if anyone's interested, I could 
describe that related idea.

-- adam



Copyright 2001 XML.org. This site is hosted by OASIS