Arjun Ray writes:
> A push API shouldn't be too difficult. By in-memory do you mean some
> analogue of DOM, where all the tokens are held in a structure of some
> sort (like a parse tree)?
Right now I'm thinking of something more like a list of tokens from
start to finish than a parse tree. For example, given the document:
<hello test="zip">tart<zork /></hello>
I might want to have a list of nodes like:
elementStart:hello
attributeStart:test
text:zip
attributeEnd:test
text:tart
elementStart:zork
elementEnd:zork
elementEnd:hello
With a list like that (modulo some issues on whether I want to represent
starts and ends of tags), I can do things like search for all text nodes
containing "tart" quite easily and then build a tree out of the list
components if I feel it appropriate. There's no need for tree-walking
or the many issues that it creates, though there may well be a need to
combine adjacent nodes according to a relatively simple set of rules.
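
As a rough illustration of the list idea, here is a minimal Python sketch. The (kind, value) tuple shape and the helper names find_text and merge_adjacent_text are assumptions made for this example, not part of any existing API, and the merging rule shown (join neighbouring text tokens) is only one possible rule.

```python
from typing import List, Tuple

# One token per entry: (kind, value). The kinds mirror the listing above.
Token = Tuple[str, str]

tokens: List[Token] = [
    ("elementStart", "hello"),
    ("attributeStart", "test"),
    ("text", "zip"),
    ("attributeEnd", "test"),
    ("text", "tart"),
    ("elementStart", "zork"),
    ("elementEnd", "zork"),
    ("elementEnd", "hello"),
]


def find_text(tokens: List[Token], needle: str) -> List[int]:
    """Return the positions of text tokens whose value contains needle."""
    return [
        i
        for i, (kind, value) in enumerate(tokens)
        if kind == "text" and needle in value
    ]


def merge_adjacent_text(tokens: List[Token]) -> List[Token]:
    """Combine each run of neighbouring text tokens into a single token."""
    merged: List[Token] = []
    for kind, value in tokens:
        if kind == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged


print(find_text(tokens, "tart"))
```

The search is a single pass over the list, with no recursion and no parent or child bookkeeping. Note that this sketch treats attribute values as text tokens too, so a search for "zip" would match the attribute content.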
SAX events, because they are reported as sets of strings (or sets of
strings with attribute structures attached) or characters, aren't easily
stored in such alternative structures. They're deliberately fleeting
creatures, passing by rapidly with no easy means of storage - except
insofar as we do things like convert their information into DOMs or
other objects.
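
One way to make the fleeting events stick is to have a handler copy each callback into a list as it goes by. The sketch below uses Python's standard xml.sax module; the TokenRecorder class, the record function, and the token names are assumptions for illustration. SAX reports attributes as part of the start-element event, so the handler expands them into separate tokens itself.

```python
import xml.sax


class TokenRecorder(xml.sax.ContentHandler):
    """Copy each SAX callback into a list of (kind, value) tuples."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        self.tokens.append(("elementStart", name))
        # SAX hands over attributes with the start tag; split them into
        # their own start/text/end tokens so they sit in the same list.
        for attr_name, attr_value in attrs.items():
            self.tokens.append(("attributeStart", attr_name))
            self.tokens.append(("text", attr_value))
            self.tokens.append(("attributeEnd", attr_name))

    def endElement(self, name):
        self.tokens.append(("elementEnd", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks.
        self.tokens.append(("text", content))


def record(xml_bytes):
    """Parse a document and return the recorded token list."""
    recorder = TokenRecorder()
    xml.sax.parseString(xml_bytes, recorder)
    return recorder.tokens


for token in record(b'<hello test="zip">tart<zork /></hello>'):
    print("%s:%s" % token)
```

Because the parser is free to split character data across several callbacks, a real recorder would probably want to combine adjacent text tokens afterwards.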
(MOE is one effort to create tangible events that can be kept around for
longer, possibly but not necessarily as trees, and which can be broken
down into somewhat finer granules than SAX provides, and I'll see what I
can do to support these kinds of options.)
I'd like to be able to play with those events using other styles of
processing. The list approach above looks promising for some kinds of
problems, especially for querying on content, and it's conveniently
tolerant of things like well-formedness failures. Moving flexibly from
document to events to tree or list to tree or list to events again to
document again sounds interesting.
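
The last leg of that round trip, from list back to document, might be sketched as follows. The to_document function and the token shape are hypothetical; escaping uses escape and quoteattr from Python's xml.sax.saxutils. This version writes empty elements as a start and end tag pair rather than restoring the original `<zork />` form, and it makes no attempt to check that the tokens are balanced.

```python
from xml.sax.saxutils import escape, quoteattr

example = [
    ("elementStart", "hello"),
    ("attributeStart", "test"),
    ("text", "zip"),
    ("attributeEnd", "test"),
    ("text", "tart"),
    ("elementStart", "zork"),
    ("elementEnd", "zork"),
    ("elementEnd", "hello"),
]


def to_document(tokens):
    """Serialize a flat token list back into markup."""
    out = []
    tag_open = False   # True while a start tag still lacks its ">"
    attr_name = None   # name of the attribute currently being collected
    attr_parts = []
    for kind, value in tokens:
        if kind == "attributeStart":
            attr_name, attr_parts = value, []
        elif kind == "attributeEnd":
            out.append(" %s=%s" % (attr_name, quoteattr("".join(attr_parts))))
            attr_name = None
        elif kind == "text" and attr_name is not None:
            attr_parts.append(value)
        else:
            # Anything else ends the pending start tag, if there is one.
            if tag_open:
                out.append(">")
                tag_open = False
            if kind == "elementStart":
                out.append("<" + value)
                tag_open = True
            elif kind == "elementEnd":
                out.append("</%s>" % value)
            elif kind == "text":
                out.append(escape(value))
    if tag_open:
        out.append(">")
    return "".join(out)


print(to_document(example))
```

Since the writer never consults a tree, it will serialize an unbalanced list just as readily as a balanced one, which matches the tolerance for well-formedness failures described above.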
-------------
Simon St.Laurent - SSL is my TLA
http://simonstl.com may be my URI
http://monasticxml.org may be my ascetic URI
urn:oid:1.3.6.1.4.1.6320 is another possibility altogether