aslom@cs.indiana.edu:
> you could have some database abstraction layer to make this to happen
> automatic when you change any part of document (this layer could also
> transparently handle slicing XML input into fragments that are stored
> and modified independently to minimize need to re-indexing).
I don't think that would be possible in general. In particular, it
wouldn't be possible in my case, as the indexing was to speed up
page-based access to an Ebook; the addition of a word to a paragraph
at the start of the document could alter the page boundaries for the
entire document.
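
To make the page-boundary point concrete, here is a toy sketch in
Python (not the Limbo used in the rest of this post). It assumes a
deliberately crude layout rule, a fixed number of words per page, and
the names page_starts and first_words are made up for illustration.
Real layout is more involved, but the effect is the same: one word
added at the front moves every later boundary.

```python
def page_starts(words, words_per_page):
    """Return the index of the first word on each page (toy layout rule)."""
    return list(range(0, len(words), words_per_page))


def first_words(words, words_per_page):
    """Return the word that opens each page."""
    return [words[i] for i in page_starts(words, words_per_page)]


doc = "a b c d e f g h i".split()

# Page openers for the original text, three words to a page.
before = first_words(doc, 3)

# The same text with a single word inserted at the very start.
after = first_words(["new"] + doc, 3)

print(before)  # ['a', 'd', 'g']
print(after)   # ['new', 'c', 'f', 'i']
```

Every entry of a page index built from the first list is stale after
the edit, so re-indexing only the modified fragment would not be
enough.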
In general, I found processing XML in bounded space quite hard. I
only managed to bound the space used in certain common cases, and even
that bound could easily be defeated unintentionally by the way a
document happened to be structured.
In my case, for example, I was trying to bound the space requirement
to that required by the elements displayed on the current page.
Problem is, given code like:
    e_head(p: ref Parser, i: ref Item.Tag)
    {
        p.down();
        # call functions to add layout to display page
        # depending on the XML they encounter
        while ((t0 := nexttag(p)) != nil) {
            case t0.name {
            "title" =>
                e_title(p, t0);
            "link" =>
                e_link(p, t0);
            "style" =>
                e_style(p, t0);
            }
        }
        p.up();
    }
there is always the possibility that one of the functions we're
calling fills up the current page, in which case we must stop
processing immediately and return.
It's then difficult to go back to where we were before: we have marked
the XML parser's state, but we can't "freeze" the state of the e_head
function for future resuscitation, unless the implementing language
provides a feature that allows a process to externally store a
continuation of itself (mine doesn't).
This leaves only one option: implement the entire thing as a push-down
automaton... which loses all the "nice-looking-code" advantages we
just gained!
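
For what it's worth, the automaton version might look roughly like the
following sketch. It is in Python rather than Limbo, and everything in
it is an assumption made for illustration: the event tuples stand in
for whatever the parser delivers, "capacity" stands in for a real
page-full test, and render_page is a made-up name. The point is only
that the nesting lives in an explicit stack rather than in the call
stack, so the state is plain data that can be stored and picked up
again later.

```python
# Events as a pull parser might deliver them: (kind, value).
EVENTS = [
    ("start", "head"),
    ("start", "title"), ("text", "An Ebook"), ("end", "title"),
    ("start", "style"), ("text", "p { margin: 0 }"), ("end", "style"),
    ("end", "head"),
]


def render_page(events, state, capacity):
    """Consume events until the page holds `capacity` items.

    `state` is (position, stack of open element names). It is plain
    data, so it could be written to an index and used later to resume
    exactly where rendering stopped.
    """
    pos, stack = state
    stack = list(stack)
    page = []
    while pos < len(events) and len(page) < capacity:
        kind, value = events[pos]
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            stack.pop()
        elif kind == "text":
            # Dispatch on the enclosing element, not on the call stack.
            page.append((stack[-1], value))
        pos += 1
    return page, (pos, tuple(stack))


# Render one item per page, carrying the saved state between calls.
page1, state1 = render_page(EVENTS, (0, ()), 1)
page2, state2 = render_page(EVENTS, state1, 1)
page3, state3 = render_page(EVENTS, state2, 1)
```

The cost is visible even at this size: the per-element functions have
collapsed into one loop with a dispatch in the middle, which is the
loss of readability complained about above.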
In the end, I indexed only elements that appeared at the top nesting
level, which did cover all the documents I found, but would have been
thwarted by someone putting <div>...</div> around the whole thing.
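
A small sketch of that weakness, again in Python rather than Limbo. It
assumes "top nesting level" means the direct children of the root
element, and it uses the standard xml.etree.ElementTree module, which
loads the whole document into memory (so it illustrates the indexing
rule only, not the bounded-space part). The name top_level_index and
the sample documents are made up.

```python
import xml.etree.ElementTree as ET


def top_level_index(xml_text):
    """Return the tag of each direct child of the root, in document order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


flat = "<book><p>one</p><p>two</p><p>three</p></book>"
wrapped = "<book><div><p>one</p><p>two</p><p>three</p></div></book>"

print(top_level_index(flat))     # ['p', 'p', 'p']
print(top_level_index(wrapped))  # ['div']
```

With the wrapper in place the index has a single entry covering the
whole text, so resuming at any page means rendering from the start
again.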
Maybe someone else on the list has encountered these kinds of problems
before and has a nice solution?
cheers,
rog.