Aleksander Slominski <aslom@cs.indiana.edu> wrote:
| Arjun Ray wrote:
| i am not sure how many functions are needed when processing XML?
| what comes to mind is tokenize XML, produce XML events and
| process them doing _something_ ...
What "XML events" are to be produced, though? ;-)
The granddaddy of all "parser event models" in this line of work is ESIS.
You can adopt it, elaborate it, or simplify it. That's taking the view of
"what can we get out of an XML document?". OTOH, for applications, the
view is "what do we want from an XML document (assuming it can be had)?"
That's where frameworks come in. The mistake is to try to make the parser
event model directly "useful" to applications. It really need not be so.
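To make "parser event model" concrete, here is a minimal sketch of a simplified, ESIS-flavoured event stream built on Python's standard Expat binding. It is loosely modelled on nsgmls-style output: an `A` line per attribute before the start tag, `(gi` for a start tag, `-text` for character data, and `)gi` for an end tag. The function name `esis_events` and the exact line format are illustrative assumptions, not real ESIS, which carries more detail such as attribute types. The point is that the event layer reports only what can be got out of the document and says nothing about what any application wants from it.

```python
import xml.parsers.expat

def esis_events(xml_text):
    """Return simplified ESIS-style event lines for xml_text (illustrative only)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # As in ESIS, attributes are reported before the start-tag event.
        for key in sorted(attrs):
            events.append("A%s %s" % (key, attrs[key]))
        events.append("(" + name)

    def end(name):
        events.append(")" + name)

    def chars(data):
        # Expat may deliver one text run in several pieces; merge adjacent ones.
        if events and events[-1].startswith("-"):
            events[-1] += data
        else:
            events.append("-" + data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return events

print(esis_events('<a href="x">hi<b/></a>'))
```

Anything application-shaped (a table model, an object tree) would be a framework layered on top of a stream like this, not a property of the stream itself.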
|> http://pobox.com/~oleg/ftp/papers/XML-parsing.ps.gz
|>
|> Passing "seeds" up and down a tree is similar to the patterns I'm trying
|> to develop.
|
| i remember this paper. it has a questionable comparison of expat
| that uses reading input char-by-char (instead of buffered stream)
Expat doesn't read input, so buffered stream is irrelevant. Expat gets
its input pushed to it (i.e. the app repeatedly calls expat with chunks of
input.) Oleg's modification was to pass Expat its input one character at a
time, to simulate a similar input system in SSAX. If you
say that's all really artificial, I agree (the real question would be why
SSAX can't accept larger input chunks!), but it was pretty clear that he
was trying to avoid a nonsensical benchmark. All he got was an irrelevant
one. ;-)
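A minimal sketch of the push model described above, using Python's standard `xml.parsers.expat` binding: the application owns the input and decides how large each chunk passed to `Parse` is. Feeding the whole document in one call and feeding it one character per call (the setup in Oleg's benchmark) yield the same element events; only the number of calls differs. The helper name `collect_events` is hypothetical, and character data is deliberately not recorded because Expat may split text differently depending on chunk boundaries.

```python
import xml.parsers.expat

def collect_events(chunks):
    """Push the given chunks into one Expat parser and record element events."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    # The application, not Expat, decides how much input each call receives.
    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse("", True)  # signal end of input
    return events

doc = "<table><tr><td>1</td></tr></table>"
whole = collect_events([doc])         # one large chunk
one_char = collect_events(list(doc))  # one character per call
print(whole == one_char)
```

Since the chunk size is the caller's choice, a one-character-per-call comparison measures call overhead rather than anything intrinsic to Expat's parsing.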
http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
| one thing i did not get: isn't "seed" global variable that is shared by
| all handlers in SSAX:make-parser/foldts?
No. Scheme and Haskell are lexically scoped, and a global would be silly
anyway. If you're thinking of the example that has
(let ((result
       ((SSAX:make-parser
          NEW-LEVEL-SEED
          (lambda (elem-gi attributes namespaces expected-content seed)
            seed)
          ...
That was only an example, and the paper has two more. The lambda
expression is meant to be supplied by the particular application.
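To show why no global is involved, here is a minimal Python sketch of a seed-threading fold over Expat events. It is loosely modelled on SSAX's NEW-LEVEL-SEED, FINISH-ELEMENT and CHAR-DATA-HANDLER, but the Python names `fold_xml`, `new_level_seed`, `finish_element` and `char_data` and their reduced signatures (no namespaces or expected-content) are hypothetical simplifications, not SSAX's API. Each handler receives a seed and returns a new one; the driver keeps the parent's seed on a stack and hands it back to `finish_element` when the element closes.

```python
import xml.parsers.expat

def fold_xml(xml_text, new_level_seed, finish_element, char_data, seed):
    """Fold over xml_text, threading a seed through the three handlers."""
    stack = []        # (attributes, parent seed) for each open element
    current = seed    # local to this call, so separate parses never share state
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        nonlocal current
        stack.append((attrs, current))
        current = new_level_seed(name, attrs, current)

    def end(name):
        nonlocal current
        attrs, parent_seed = stack.pop()
        current = finish_element(name, attrs, parent_seed, current)

    def chars(text):
        nonlocal current
        current = char_data(text, current)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return current

# Application-supplied handlers: build a nested-list (SXML-like) tree.
tree = fold_xml(
    "<p>a<b>c</b></p>",
    new_level_seed=lambda name, attrs, seed: [],         # fresh seed for children
    finish_element=lambda name, attrs, parent, seed: parent + [[name] + seed],
    char_data=lambda text, seed: seed + [text],
    seed=[],
)
print(tree)
```

The handlers are pure functions of their arguments; swapping in different lambdas gives a different application without touching the driver, which is the role the SSAX paper assigns to application-provided lambdas.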
| also how handling of dispatching decisions is done, for example if
| <table> may contain both <th> and <tr> in any order ...
That's what the seed functions are all about.
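For the <table> case, a minimal sketch under the same assumptions as a generic seed fold (hypothetical Python names, simplified signatures, not SSAX's actual API): `finish_element` dispatches on the name of the element that just closed and folds its result into the parent's seed. Because each child is accumulated into the correct slot when it finishes, `<th>` and `<tr>` can appear in any order. The dict-shaped seed is an illustrative choice; unknown elements are simply ignored.

```python
import xml.parsers.expat

def fold_xml(xml_text, new_level_seed, finish_element, char_data, seed):
    """Thread a seed through element/text handlers (simplified SSAX-like fold)."""
    stack, current = [], seed
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        nonlocal current
        stack.append((attrs, current))
        current = new_level_seed(name, attrs, current)

    def end(name):
        nonlocal current
        attrs, parent_seed = stack.pop()
        current = finish_element(name, attrs, parent_seed, current)

    def chars(text):
        nonlocal current
        current = char_data(text, current)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return current

def new_level_seed(name, attrs, seed):
    return {"text": "", "th": [], "tr": [], "td": []}

def char_data(text, seed):
    return {**seed, "text": seed["text"] + text}

def finish_element(name, attrs, parent, seed):
    # Dispatch on the finished child's name; sibling order does not matter.
    if name in ("th", "td"):
        return {**parent, name: parent[name] + [seed["text"]]}
    if name == "tr":
        return {**parent, "tr": parent["tr"] + [seed["td"]]}
    if name == "table":
        return {"headers": seed["th"], "rows": seed["tr"]}
    return parent  # ignore elements this application does not care about

doc = ("<table><tr><td>1</td><td>2</td></tr><th>A</th>"
       "<tr><td>3</td></tr><th>B</th></table>")
print(fold_xml(doc, new_level_seed, finish_element, char_data, {}))
```

The dispatch decision lives in the application's handlers, not in the parser, which keeps the event layer generic.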
[Note, btw: I'm not *endorsing* SSAX, I'm just saying that it has some
interesting ideas behind it. The thread *is* about parser models, right?]
| so i think i will need to wait and see an example where Element/Content
| framework works to see its full potential ...
Fairing out the project hasn't reached top-of-stack status yet. ;-)