- To: rog@vitanuova.com
- Subject: Re: [xml-dev] XML too hard for programmers?
- From: Aleksander Slominski <aslom@cs.indiana.edu>
- Date: Mon, 24 Mar 2003 16:51:36 -0500
- Cc: xml-dev@lists.xml.org
- In-reply-to: <907083ff912ee41c723462180ef360ad@vitanuova.com>
- References: <907083ff912ee41c723462180ef360ad@vitanuova.com>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2) Gecko/20030210
rog@vitanuova.com wrote:
>>if you want minimal memory overhead (and not just create DOM and
>>navigate it) you can record XML context of one position in file (that
>>would include i-scope namespace declarations, stack of start tags,
>>attributes etc.) and use it to move back parser and then restart
>>parsing from this position though i have not seen a parser that can do
>>this ...
>>
>>
>
>My parser does that. For example, when I parse an ebook, I lay it out
>a page at a time, and mark the position in the XML of the content that
>starts each page. I then write an index file containing all the marks
>for each page.
>
hi,
that is what i was suspecting :-)
>Once a document has been indexed, it's very quick to, say, open the
>document and jump to the 200th page, or to jump back quickly page by
>page, without storing all the XML for each page.
>
>The drawbacks are that: a) if the document changes, you have to
>reindex everything and b) if any of the display attributes (e.g. text
>size, line spacing, etc) changes, you have to reindex everything.
>
you are getting close to creating an ultra lightweight XML database -
you index interesting information and need to retrieve it later, so you
could have some database abstraction layer to make this happen
automatically when you change any part of the document (this layer could
also transparently handle slicing XML input into fragments that are stored
and modified independently, to minimize the need for re-indexing).
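
A minimal sketch of the fragment idea, not taken from any existing library: FragmentIndex and reindexStale are hypothetical names, and it assumes each fragment can be laid out independently of the others (for example, every chapter starts on a new page). It remembers what each fragment and the display settings looked like when last indexed and reports only the fragments that need re-indexing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical layer that remembers what each fragment looked like when last indexed. */
final class FragmentIndex {
    private final Map<String, String> indexedAs = new HashMap<>();

    /** Fingerprint = layout settings plus fragment text; a real layer would likely store a hash. */
    private static String fingerprint(String layout, String xml) {
        return layout + "\n" + xml;
    }

    /** Returns ids of fragments whose text or layout changed, and records them as re-indexed. */
    List<String> reindexStale(Map<String, String> fragments, String layout) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> e : fragments.entrySet()) {
            String fp = fingerprint(layout, e.getValue());
            if (!fp.equals(indexedAs.get(e.getKey()))) {
                stale.add(e.getKey());
                indexedAs.put(e.getKey(), fp); // the caller would rebuild this fragment's marks here
            }
        }
        return stale;
    }
}
```

With this shape, editing one chapter only invalidates that chapter, while changing the text size still invalidates everything, which matches drawback (b) above.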
>All I record for a mark is the offset in the file, the read depth and
>the tags of each level of nesting. I don't know anything about
>i-scope namespace declarations (I said I was hopelessly naive!)
>
it was supposed to be "in-scope" namespace declarations - if you do not
use namespaces then you do not need to record this.
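
For illustration, a mark like the one described above (file offset, depth, and the tag at each nesting level) could be sketched as below. The class name Mark and the one-line index format are assumptions made for this sketch, not the format the quoted parser actually writes. In-scope namespace declarations are left out, so a namespace-aware version would also need to store the prefix bindings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical restart point: byte offset plus the open tags at that offset. */
final class Mark {
    final long offset;           // byte offset of the mark in the XML file
    final List<String> openTags; // start tags still open, outermost first

    Mark(long offset, List<String> openTags) {
        this.offset = offset;
        this.openTags = new ArrayList<>(openTags);
    }

    /** The read depth is implied by the number of open tags. */
    int depth() {
        return openTags.size();
    }

    /** One index-file line; '/' and ' ' cannot occur in XML names, so they are safe separators. */
    String toIndexLine() {
        return offset + " " + String.join("/", openTags);
    }

    /** Parses a line written by toIndexLine(). */
    static Mark fromIndexLine(String line) {
        String[] parts = line.trim().split(" ", 2);
        long offset = Long.parseLong(parts[0]);
        List<String> tags = (parts.length < 2 || parts[1].isEmpty())
                ? new ArrayList<>()
                : Arrays.asList(parts[1].split("/"));
        return new Mark(offset, tags);
    }
}
```

To restart parsing, a reader would seek to offset and preload its tag stack from openTags before reading the next token.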
>>and here is how it could be done in XmlPull (for details see:
>>http://www.extreme.indiana.edu/~aslom/xmlpull/patterns.html#ANY_ORDER)
>>
>>
>[...]
>
>
>> wrapper.skipSubTree();
>>
>>
>
>I think the advantage of having the nesting level explicit in the
>parsing is that the parser is in a position to deal reasonably
>robustly with malformed XML, without aborting.
>
>I started off aborting with an error on any mismatched tag, but I
>found that in practise, files I was finding on the net had a plethora
>of minor errors, and fixing them is much easier if the parser gives
>warnings for many errors in the same document (sometimes there are
>hundreds of errors) rather than aborting at the first one...
>
>Of course, skipSubTree could do something like that, but it has not
>got the option of ascending further up in the tree than the level at
>which it was called, which is sometimes the best thing to do
>(depending on your recovery heuristics, obviously).
>
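
One possible recovery heuristic of the kind described above, sketched under assumptions: LenientNesting is a hypothetical name and this is not the quoted parser's code. It keeps the nesting explicit as a stack, warns instead of aborting, and on a mismatched end tag ascends to the nearest open element with that name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lenient nesting tracker: collects warnings instead of aborting. */
final class LenientNesting {
    private final List<String> open = new ArrayList<>(); // open tags, outermost first
    final List<String> warnings = new ArrayList<>();

    void startTag(String name) {
        open.add(name);
    }

    void endTag(String name) {
        int match = open.lastIndexOf(name);
        if (match < 0) {
            // No open element has this name: drop the end tag and keep going.
            warnings.add("stray </" + name + "> ignored");
            return;
        }
        // Ascend past any inner elements that were never closed.
        for (int i = open.size() - 1; i > match; i--) {
            warnings.add("<" + open.remove(i) + "> closed implicitly by </" + name + ">");
        }
        open.remove(match);
    }

    int depth() {
        return open.size();
    }
}
```

Other heuristics are equally plausible, for instance limiting how many levels an end tag may close at once.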
in this particular example skipSubTree() will skip the current sub tree,
but you can easily move one level up (and that could be wrapped into a
function moveOneLevelUp()):

    while (parser.nextTag() == XmlPullParser.START_TAG) {
        wrapper.skipSubTree();
    }
    parser.require(XmlPullParser.END_TAG, ...);

and after that the parser is positioned one level up.
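
A self-contained version of the same loop, as a sketch only. XmlPull is not part of the JDK, so this uses StAX (javax.xml.stream) as a stand-in pull parser. skipSubTree is written by hand here, and LevelUp and parentOf are hypothetical names added for the demo. As with nextTag() in the snippet above, it assumes element-only content between siblings, because nextTag() fails on non-whitespace text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class LevelUp {
    /** Reader must be on a start tag; returns with it on the matching end tag. */
    static void skipSubTree(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, null);
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    /** Reader must be on a start tag; returns with it on the parent's end tag. */
    static void moveOneLevelUp(XMLStreamReader r) throws XMLStreamException {
        skipSubTree(r); // finish the current element
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            skipSubTree(r); // skip each remaining sibling
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, null);
    }

    /** Demo helper: advance to the first element named startAt, move up, report where we are. */
    static String parentOf(String xml, String startAt) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (!(r.isStartElement() && r.getLocalName().equals(startAt))) {
                r.next();
            }
            moveOneLevelUp(r);
            return r.getLocalName();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, LevelUp.parentOf("<root><a><b/><c><d/></c><e/></a><f/></root>", "c") starts at the start tag of c and ends on the end tag of a.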
alek
--
"Mr. Pauli, we in the audience are all agreed that your theory is crazy.
What divides us is whether it is crazy enough to be true." Niels H. D. Bohr