OASIS Mailing List Archives



   binary xml for large collections of structured data?


(rambling some, oh well).

how about xml for xml-like data, but a lot of it?...
eg: GBs of xml, representing things like arbitrary types of serialized 
objects, collections, ...

thoughts come to mind, eg:
programming language object stores;
fairly detailed 3d worlds and resource sets;
possibly compound video streams (I am imagining lots of small video streams 
tied together into a bigger one using xml-like data);

so, there could be 2 varieties:
a flat serialized version, which could be simpler and used for read or write 
only access;
a dynamic random-access version, which could be more complex, but would 
allow read/write access, and possibly transactional stuff (the log likely 
being kept as a separate file). something like a b-tree could make sense.

similar could be a cell-based heap-like system (dunno a better term). this 
could gain simplicity at the cost of size and flexibility. eg:
the data files are divided into cells of, say, 16 bytes each. 
the cells are managed with a bitmap, and objects are allocated as runs of 
one or more cells. the cells are grouped into chunks, each with a fixed 
number of cells and its own bitmap (eg: some power of 2 size in MB).
this approach is fairly similar to how my current memory manager/garbage 
collector works, so it might make sense. my mm/gc used 1MB chunks, but this 
might be small. I guess I could allow bigger chunks as well (like my mm).
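
the cell/bitmap idea above can be sketched roughly as follows. this is only a minimal illustration, not my actual mm/gc: the names (`Chunk`, `alloc`, `free`) are made up, the search is a plain first-fit scan, and it assumes the caller remembers each object's size (how object lengths get recorded is not decided above).

```python
CELL_SIZE = 16                    # bytes per cell, as suggested above
CHUNK_SIZE = 1 << 20              # 1MB chunk, as in the mm/gc mentioned above
CELLS_PER_CHUNK = CHUNK_SIZE // CELL_SIZE


def cells_needed(nbytes):
    """round a byte count up to a whole number of cells (at least one)."""
    return max(1, -(-nbytes // CELL_SIZE))


class Chunk:
    """one chunk: a data area plus a bitmap with one bit per cell."""

    def __init__(self, ncells=CELLS_PER_CHUNK):
        self.ncells = ncells
        self.bitmap = bytearray((ncells + 7) // 8)   # bit 0 = free, 1 = used
        self.data = bytearray(ncells * CELL_SIZE)

    def _used(self, i):
        return (self.bitmap[i >> 3] >> (i & 7)) & 1

    def _mark(self, i, used):
        if used:
            self.bitmap[i >> 3] |= 1 << (i & 7)
        else:
            self.bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def alloc(self, nbytes):
        """first-fit scan; returns the index of the first cell, or None."""
        need = cells_needed(nbytes)
        run = 0
        for i in range(self.ncells):
            run = 0 if self._used(i) else run + 1
            if run == need:
                first = i - need + 1
                for j in range(first, i + 1):
                    self._mark(j, True)
                return first
        return None

    def free(self, first, nbytes):
        """assumption: the caller passes back the size used at alloc time."""
        for j in range(first, first + cells_needed(nbytes)):
            self._mark(j, False)
```

a cell index times `CELL_SIZE` gives the byte offset inside the chunk's data area, so an object reference could be stored as a (chunk, cell) pair.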

(ok, I will have to estimate the amount of inflation, but I think it might 
be significant...). presumably, as much data as is reasonable would need to 
be packed into single runs.
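
a rough way to estimate that inflation, assuming 16-byte cells and one bitmap bit per cell: each object is rounded up to whole cells, and the bitmap adds 1 bit per 128 bits of data. the function names here are made up for the sketch, and any object sizes fed to it would have to come from real data.

```python
CELL_SIZE = 16             # bytes per cell (the example figure from above)
BITMAP_BITS_PER_CELL = 1   # assumption: a plain used/free bit per cell


def stored_size(nbytes):
    """bytes actually occupied once rounded up to whole cells."""
    cells = max(1, -(-nbytes // CELL_SIZE))
    return cells * CELL_SIZE


def inflation(sizes):
    """ratio of (cells + bitmap) bytes to raw payload bytes."""
    raw = sum(sizes)
    stored = sum(stored_size(n) for n in sizes)
    bitmap_bytes = stored // CELL_SIZE * BITMAP_BITS_PER_CELL / 8
    return (stored + bitmap_bytes) / raw
```

the rounding hurts most for small objects: a 5-byte item still takes a full 16-byte cell, which is why packing data into longer runs matters.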

there could possibly be other file types (traditionally, file types like 
this are distinguished by magic numbers, which would make sense here).
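
a sketch of telling the varieties apart by magic, assuming a tiny fixed header. the magic values (`BXS0`, `BXR0`) and the 16-bit version field are invented for illustration and do not come from any spec.

```python
import struct

# made-up magic values, one per file variety; not from any real spec
MAGICS = {
    b"BXS0": "serial",
    b"BXR0": "random-access",
}
HEADER = struct.Struct("<4sH")   # 4-byte magic, then an assumed 16-bit version


def write_header(magic, version=1):
    """build the header bytes for a new file."""
    return HEADER.pack(magic, version)


def identify(buf):
    """return (variety, version), or None if the magic is not recognized."""
    if len(buf) < HEADER.size:
        return None
    magic, version = HEADER.unpack_from(buf)
    kind = MAGICS.get(magic)
    return (kind, version) if kind else None
```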

I once stored images in a similar format (actually, just a swizzled 
dump of the heap). I remember I had reserved the first chunk for stuff like 
a file header, and crap like names for symbols and built-in functions.
now, I am more conventionally serializing data, but then I have to write 
code to serialize/deserialize all the types...

now, the serial variety could likely be further divided: fine and coarse.
fine would emphasize packing density, at the cost of performance (imo, this 
is about what I came up with before).
coarse would emphasize access performance, and would likely waste extra 
space (eg: fixed-sized numbers and string-table only tags).
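
to make the fine/coarse tradeoff concrete for numbers: the sketch below uses a common 7-bits-per-byte variable-length integer as a stand-in for the "fine" style (it is not the encoding I came up with before), and a fixed 32-bit little-endian integer for the "coarse" style. the function names are made up.

```python
import struct


def encode_fine(n):
    """variable-length: 7 bits per byte, high bit set if more bytes follow."""
    if n < 0:
        raise ValueError("unsigned only in this sketch")
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def decode_fine(buf, pos=0):
    """returns (value, next position); has to walk byte by byte."""
    n = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return n, pos


def encode_coarse(n):
    """fixed-size: always 4 bytes, so offsets are predictable."""
    return struct.pack("<I", n)


def decode_coarse(buf, pos=0):
    return struct.unpack_from("<I", buf, pos)[0], pos + 4
```

small values take 1 byte in the fine form versus 4 in the coarse form, but the coarse form can be read at a known offset without scanning.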

this should roughly cover this domain (though it could be argued that this 
is too far outside xml's domain anyways...). but then again: what domain am 
I in really?...

I would think the riff domain, as riff is typically used here, but riff 
falls short and is often a bad fit for my use patterns.

I may spec some.
I guess I have already come up with the serial-fine variety.



Copyright 2001 XML.org. This site is hosted by OASIS