(rambling some, oh well).
how about xml for xml-like data, but a lot of it?...
eg: GBs of xml, representing things like arbitrary types of serialized
objects, collections, ...
thoughts come to mind, eg:
programming language object stores;
databases;
fairly detailed 3d worlds and resource sets;
possibly compound video streams (I am imagining lots of small video streams
tied together into a bigger one using xml-like data);
..
so, there could be 2 varieties:
a flat serialized version, which could be simpler and used for read or write
only access;
a dynamic random-access version, which could be more complex, but would
allow read/write access, and possibly transactional stuff (the log likely
being kept as a separate file). something like a b-tree could make sense.
a similar option could be a cell-based heap-like system (dunno a better term). this
could gain simplicity at the cost of size and flexibility. eg:
the data files are divided into a number of cells of, say, 16 bytes each.
the cells are managed with a bitmap, and allocated as objects consisting of
one or more cells. the cells are grouped into chunks, each with a certain number
of cells and its own bitmap (eg: some power of 2 size in MB).
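a minimal sketch of the cell/bitmap idea above, in python for brevity. this is only one way to read the description, not the actual mm/gc code: the names `Chunk`, `alloc` and `free` are made up for illustration, the bitmap is assumed to hold 1 bit per cell, and allocation is assumed to be a plain first-fit scan for a free run.

```python
CELL_SIZE = 16  # bytes per cell, the example figure from the text


class Chunk:
    """One chunk: a fixed number of cells plus its own allocation bitmap."""

    def __init__(self, num_cells):
        self.num_cells = num_cells
        self.bitmap = bytearray((num_cells + 7) // 8)  # assumed: 1 bit per cell
        self.data = bytearray(num_cells * CELL_SIZE)

    def _used(self, i):
        return (self.bitmap[i >> 3] >> (i & 7)) & 1

    def _mark(self, i, used):
        if used:
            self.bitmap[i >> 3] |= 1 << (i & 7)
        else:
            self.bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def alloc(self, nbytes):
        """First-fit: return the first cell index of a free run, or None."""
        need = max(1, (nbytes + CELL_SIZE - 1) // CELL_SIZE)
        run = 0
        for i in range(self.num_cells):
            run = 0 if self._used(i) else run + 1
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self._mark(j, True)
                return start
        return None

    def free(self, start, nbytes):
        """Release the cells of an object; the caller must supply its size."""
        need = max(1, (nbytes + CELL_SIZE - 1) // CELL_SIZE)
        for j in range(start, start + need):
            self._mark(j, False)
```

with only 1 bit per cell the bitmap cannot tell where one object ends and the next begins, which is why `free` takes the size here. a second bit per cell (or a length field in the first cell) would be the obvious alternative.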
this approach is fairly similar to how my current memory manager/garbage
collector works, so it might make sense. my mm/gc used 1MB chunks, but this
might be small. I guess I could allow bigger chunks as well (like my mm).
(ok, I will have to estimate the amount of inflation, but I think it might
be significant...). presumably, as much data would need to be packed into
single runs as is reasonable.
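a rough way to put a number on that inflation, as a sketch only. the function `inflation` is hypothetical, and it assumes 1 bitmap bit per cell and ignores chunk headers and any per-object metadata, so real figures would be somewhat worse.

```python
def inflation(sizes, cell_size=16):
    """Ratio of stored bytes (padded cells + 1 bitmap bit per cell) to payload bytes."""
    # Each object is rounded up to a whole number of cells (at least one).
    cells = sum(max(1, -(-s // cell_size)) for s in sizes)
    stored = cells * cell_size + cells / 8  # cells / 8 = bitmap bytes
    return stored / sum(sizes)
```

under these assumptions a 1-byte object costs a full cell (about 16x), while an object that is an exact multiple of the cell size pays under 1% for the bitmap. so the inflation depends almost entirely on how many small objects there are, which fits the point about packing data into single runs.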
there could possibly be other file types (traditionally, file types like
this are distinguished with file magics, which would make sense here).
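a small sketch of telling the varieties apart by file magic. the magic values, the 8-byte header layout and the function names are all invented for illustration; nothing here is a spec.

```python
import struct

# Hypothetical 4-byte magics, one per file variety; the values are made up.
MAGICS = {
    b"XSF0": "serial, fine",
    b"XSC0": "serial, coarse",
    b"XRA0": "random-access",
}

HEADER = struct.Struct("<4sI")  # assumed layout: magic, then a u32 version


def write_header(f, magic, version=1):
    f.write(HEADER.pack(magic, version))


def read_header(f):
    raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("truncated header")
    magic, version = HEADER.unpack(raw)
    if magic not in MAGICS:
        raise ValueError("unknown file magic: %r" % magic)
    return MAGICS[magic], version
```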
I once did store images with a similar format (actually, just a swizzled
dump of the heap). I remember I had reserved the first chunk for stuff like
a file header, and crap like names for symbols and built-in functions.
now, I am more conventionally serializing data, but then I have to write
code to serialize/deserialize all the types...
now, the serial variety could likely be further divided: fine and coarse.
fine would emphasize packing density, at the cost of performance (imo, this
is about what I came up with before).
coarse would emphasize access performance, and would likely waste extra
space (eg: fixed-sized numbers and string-table only tags).
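to make the coarse variety concrete, a sketch assuming flat records of (tag, integer value). the 12-byte record layout and the names `encode` and `read_record` are assumptions for illustration only; a real format would also need nesting and other value types.

```python
import struct

# Assumed record layout: u32 index into the string table, then an i64 value.
RECORD = struct.Struct("<Iq")


def encode(records):
    """records: list of (tag_name, int_value). Returns (string_table, blob)."""
    table, index = [], {}
    out = bytearray()
    for tag, value in records:
        if tag not in index:  # each tag name is stored once, in the table
            index[tag] = len(table)
            table.append(tag)
        out += RECORD.pack(index[tag], value)
    return table, bytes(out)


def read_record(table, blob, i):
    """Fetch record i directly; fixed-size records make the offset i * size."""
    tag_idx, value = RECORD.unpack_from(blob, i * RECORD.size)
    return table[tag_idx], value
```

every record costs 12 bytes even when the value would fit in one, which is the wasted space. in exchange, record `i` can be read without scanning anything before it.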
this should roughly cover this domain (though it could be argued that this
is too far outside xml's domain anyways...). but then again: what domain am
I in really?...
I would think the riff domain, as riff is typically used here, but riff
falls short and is often a bad fit for my use patterns.
I may spec some.
I guess I have already come up with the serial-fine variety.