firstname.lastname@example.org (Oleg Dulin) writes:
>Thank you for your response.
>I've experimented with your Ripper a bit. It appears to handle what we
>need. I did notice a bug, though: it appears to stop parsing when it
>encounters a PI without any data. For instance:
><?foo ?> breaks Ripper, while <?foo bar?> is ok
Eek. I'll fix that - should be something simple in an if statement.
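
For illustration only, here is a minimal sketch of the kind of check involved. It is not Ripper's actual code; the function name split_pi is hypothetical, and the sketch simply assumes the parser has already isolated the full "<?...?>" text. The point is that the data part after the target may be empty, so the split must not assume a second piece exists.

```python
def split_pi(raw):
    """Split a processing instruction such as '<?foo bar?>' into (target, data).

    Hypothetical helper, not part of Ripper: it only shows how an empty
    data section ('<?foo ?>') can be handled without stopping the parse.
    """
    if not (raw.startswith("<?") and raw.endswith("?>")):
        raise ValueError("not a processing instruction: %r" % raw)

    # Strip the '<?' and '?>' delimiters.
    body = raw[2:-2]

    # Split on the first run of whitespace; leading/trailing-only
    # whitespace yields a single piece (the target).
    parts = body.split(None, 1)
    if not parts:
        raise ValueError("processing instruction has no target")

    target = parts[0]
    # The 'if' that matters: data is optional, so default to "".
    data = parts[1] if len(parts) > 1 else ""
    return target, data
```

With this approach both split_pi("<?foo ?>") and split_pi("<?foo bar?>") return a target of "foo"; only the data differs (empty in the first case).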
>Do you know of any other outstanding issues?
If you count not processing the DTD as an outstanding issue, yes, though
that'll take more time to resolve.
>Ideally, what we need is a parser like Ripper that can capture the
>events into a tree-like structure. I looked at MOE but it appears a lot
>older than Ripper itself. Is there any active work being done on MOE?
I'm working on integrating the two, but haven't had time. MOE's
foundations were built with a Ripper-like parser in mind, but of course
the details have shifted. I don't think that's active enough to solve
your problems at the moment, and I'm pretty much hoping to get to it in
>There appears to be another XML parsing technique that preserves a lot
>more information than SAX -- it is XNI in Xerces. Of
>course, it is not nearly as complete as Ripper but it is more detailed
>than SAX and is actively used by Xerces. Have you ever evaluated XNI
>API for the purpose of "half-parsing"? What is your opinion?
I bounced off the XNI interface pretty hard - it seemed to be doing
things very differently from how I expected them to work, and I think it
simply has different priorities. That said, it's been over a year since
I looked at XNI, and Xerces is certainly widely used. Depending on the
level of detail you really need, XNI could well be a perfect fit for