Lists Home |
Date Index |
Bob Foster wrote:
> Recent criticisms of some Eclipse-based XML editors (including mine) (in
> part) because they use a lot of memory relative to file size underline
> the fairly obvious fact that XML files are often much larger than
> programming language files. When the techniques used successfully for
> programming languages are applied to XML, they can break down.
> The first person I ever saw address this issue directly was Bryan Ford,
> in his packrat parsing paper
> (http://www.brynosaurus.com/pub/lang/packrat-icfp02.pdf). Packrat
> parsing requires an O(n), where n is the document size, data structure
> with a rather large constant factor. Ford observes "For example, for
> parsing XML streams, which have a fairly simple structure but often
> encode large amounts of relatively flat, machine-generated data, the
> power and flexibility of packrat parsing is not needed and its storage
> cost would not be justified."
This has reminded me of the Judy array, which can function as a list or
a dictionary, and is supposed to have very good performance and low
memory load whether sparsely or densely populated. This is achieved by
havin a lrage nuber of specialized substructure types, put into play
depending on the local data characteristics. Quoting from an
introduction to Judy arrays -
"Judy adapts efficiently to a wide range of populations and data set
densities. Since the Judy data structure is a tree of trees, each
sub-tree is a static expanse that is optimized to match the "character"
or density of the keys it contains. To support this flexibility, in
32-bit Judy there are approximately 25 major data structures and
a similar number of minor structures."
Perhaps a really top XML editor needs a similarly large number of
specialized tree structures (glad it's not me writing them!).