xml-dev - Re: [xml-dev] XML too hard for programmers?


To: rog@vitanuova.com
Subject: Re: [xml-dev] XML too hard for programmers?
From: Aleksander Slominski <aslom@cs.indiana.edu>
Date: Fri, 21 Mar 2003 21:50:34 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <e88ca411b427c230b1f3c3bd06452f38@vitanuova.com>
References: <e88ca411b427c230b1f3c3bd06452f38@vitanuova.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2; MultiZilla v1.3.1 (a)) Gecko/20030210

rog@vitanuova.com wrote:

>Hi,
>I'm afraid I'm new to this list, so am probably breaking 300 list
>taboos...
>
hi,

i am sure i am breaking some too so you are not alone :-)

>It's just a simple and small solution to an XML parsing problem I had,
>which was satisfactory at the time, seems to me to be generally
>applicable, and I haven't seen anything similar. Sorry in advance
>about the length.
>
i think so too, and it is one of the reasons i worked on the XmlPull API 
(www.xmlpull.org)

>First thing: this is no panacea, and to be quite honest, I'm actually
>(heretic!)  not at all keen on XML, but this API at least made things
>bearable; the structure of the code dealing with the XML was quite
>logical, and the space used in doing so was largely bounded.
>
that is true about pull parsing in general: the code that does the 
parsing tends to mirror the XML structure, and i even went so far as to 
call it a common pattern in xml pull parsing
(http://www.extreme.indiana.edu/~aslom/xmlpull/patterns.html#MIRROR)

>I developed it when I was writing a browser for the Open Ebook
>standard to run on limited memory platforms.  Obviously DOM was out of
>the question, and SAX became really awkward as it would have been
>necessary to traverse the whole XML tree from the start when moving
>back a page; moreover I found it difficult to write code that
>corresponded directly to the DTDs.
>
>The idea is very simple: treat the XML as a multi-level stream, and
>provide an interface that allows one to *mark* a place in the stream,
>and *go to* a previously marked place.
>
that goes one step above and beyond streaming pull parsing. however, i 
was already experimenting with something like that in XPP2: XmlPullNode 
allowed building the XML tree in memory on demand, and even accessing the 
XPP2 event stream directly for sub-trees, and i was very happy with the 
capabilities of such a "mixed" API.

>The basic API looked something like:
>
<snip/>

>Open()ing an XML file produces a parser p; then p.next() produces the
>next XML item in the file *at the same nesting level*.
>
that is the main difference when comparing with XmlPull: next() in 
XmlPull moves the stream to the next event, doing depth-first iteration 
(exactly like SAX).

>Therefore,
>
>	p := xml->open("foo.xml");
>	while ((i := p.next()) != nil)
>		process_item(i);
>
>
>will only process the top level elements.
>
this will process top level elements in XmlPull:

while( parser.nextTag() == XmlPullParser.START_TAG ) {
  processItem(parser);
}
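
a self-contained sketch of the same loop, for anyone who wants to run it. it uses the JDK's StAX XMLStreamReader as a stand-in for XmlPull (an assumption made only so the example needs no extra jars); the class TopLevelChildren and the helpers childNames() and skipSubTree() are hypothetical names. each child's sub-tree has to be skipped explicitly, because the event stream is depth-first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TopLevelChildren {
    // Reader must be on a START_ELEMENT; consumes events up to and
    // including the matching END_ELEMENT.
    static void skipSubTree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) depth++;
            else if (ev == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // Returns the names of the direct children of the root element.
    static List<String> childNames(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        r.nextTag(); // move onto the root start tag
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            names.add(r.getLocalName());
            skipSubTree(r); // stay at the same nesting level
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(childNames("<doc><a><x/></a><b>text</b><c/></doc>"));
    }
}
```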


>A crucial point is that when you get to the end of the current nesting
>level, next() returns nil; this allows one to easily write a
>recursive-descent-style parser, for instance (from the ebook reader)
>parses a <head> tag:
>
nextTag() in XmlPull allows you to do the same, as it returns only two 
values, START_TAG or END_TAG, and an exception is thrown if the input 
contains anything else

>	e_head(p: ref Parser, i: ref Item.Tag)
>	{
>		p.down();
>		while ((t0 := nexttag(p)) != nil) {
>			case t0.name {
>			"title" =>
>				e_title(p, t0);
>			"link" =>
>				e_link(p, t0);
>			"style" =>
>				e_style(p, t0);
>			}
>		}
>		p.up();
>	}
>
and here is how it could be done in XmlPull (for details see: 
http://www.extreme.indiana.edu/~aslom/xmlpull/patterns.html#ANY_ORDER)

	void e_head(XmlPullParser parser) throws XmlPullParserException, IOException
	{
		parser.require( XmlPullParser.START_TAG, null, "head");
		while (parser.nextTag() == XmlPullParser.START_TAG) {
			if( "title".equals(parser.getName()) ) {
				e_title(parser);
			} else if( "link".equals(parser.getName()) ) {
				e_link(parser);
			} else if( "style".equals(parser.getName()) ) {
				e_style(parser);
			} else { // ignore unknown elements
				// wrapper: assumed to wrap this parser and provide skipSubTree()
				wrapper.skipSubTree();
			}
		}
		parser.require( XmlPullParser.END_TAG, null, "head");
	}
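
a runnable variant of the any-order pattern, as a sketch only. it again uses the JDK's StAX XMLStreamReader in place of XmlPull (an assumption, to keep it self-contained), and HeadParser, parseHead() and parse() are hypothetical names. instead of calling e_title/e_link/e_style it just collects what it finds into a map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class HeadParser {
    // Consumes the current element and everything nested inside it.
    static void skipSubTree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) depth++;
            else if (ev == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // Reader must be on <head>; children may appear in any order.
    static Map<String, String> parseHead(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "head");
        Map<String, String> out = new LinkedHashMap<>();
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            String name = r.getLocalName();
            if ("title".equals(name) || "style".equals(name)) {
                // getElementText() leaves the reader on the end tag
                out.put(name, r.getElementText());
            } else if ("link".equals(name)) {
                out.put(name, r.getAttributeValue(null, "href"));
                skipSubTree(r);
            } else {
                skipSubTree(r); // ignore unknown elements
            }
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "head");
        return out;
    }

    static Map<String, String> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag(); // move onto <head>
        return parseHead(r);
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(parse(
            "<head><title>T</title><meta name='x'/><link href='a.css'/></head>"));
    }
}
```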


>Here, nexttag() is a locally defined function that returns the next
>Item that's a Tag, ignoring everything else.  The various e_*
>functions deal with the various kinds of tags that can be found within
>an XHTML <head> tag.
>
in XmlPull nextTag() is more restrictive and will skip only white space 
text content.
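
a small sketch of that strictness. it uses the JDK's StAX XMLStreamReader as a stand-in (an assumption; its nextTag() behaves similarly here, skipping white space but throwing on other text), and StrictNextTag, firstChild() and rejects() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StrictNextTag {
    // Returns the name of the first tag after the root start tag.
    static String firstChild(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag(); // root start tag
        r.nextTag(); // throws if non-whitespace text comes before the next tag
        return r.getLocalName();
    }

    // True when nextTag() refuses the input because of text content.
    static boolean rejects(String xml) {
        try {
            firstChild(xml);
            return false;
        } catch (XMLStreamException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("<a>  <b/></a>"));
        System.out.println(rejects("<a>text<b/></a>"));
    }
}
```

so a nexttag() that ignores everything that is not a tag would need its own loop over next() rather than a call to nextTag().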

>This style of interface means that it's possible to write code that
>matches fairly closely the DTD, does not parse the whole document into
>one in-core data structure, and avoids having to write abstruse state
>machines!
>
i agree completely :-)

>If the XML is in a seekable file, you can mark a place in the file
>(which records all the state of the XML parser at that point in the
>file, and a place to seek to), and then return to it later, or even
>store the mark externally and use it as an index for rapid
>retrieval at a later date.
>
>This means that for files containing a large dataset (e.g.  Ebooks!)
>you don't necessarily have to store all the dataset, even in a derived
>data structure.
>  
>
if you want minimal memory overhead (and not just create a DOM and 
navigate it) you can record the XML context of one position in the file 
(that would include in-scope namespace declarations, the stack of start 
tags, attributes etc.) and use it to move the parser back and then 
restart parsing from this position, though i have not seen a parser 
that can do this ...
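
a rough sketch of just one part of such a recorded context, the stack of open start tags. it is not a restartable parser: a real mark would also need the file offset and the in-scope namespace declarations. it uses the JDK's StAX XMLStreamReader (an assumption, to keep it self-contained), and ContextSnapshot and openTagsAt() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ContextSnapshot {
    // Returns the open start tags (outermost first) at the moment the first
    // element named `target` is reached, or an empty list if none is found.
    static List<String> openTagsAt(String xml, String target) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String> stack = new ArrayDeque<>();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                stack.addLast(r.getLocalName());
                if (target.equals(r.getLocalName())) {
                    return new ArrayList<>(stack); // snapshot, not a live view
                }
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                stack.removeLast();
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(openTagsAt(
            "<book><chapter><para/><para><mark/></para></chapter></book>", "mark"));
    }
}
```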

>I'm aware that my parsing of XML is probably hopelessly naive, and
>perhaps there is some facet of XML that makes this approach impossible
>for XML in general (I came up with this for a specific problem, after
>all).  If so, I'd love to know why.
>
this approach means that parsing is done again and again each time you 
move back in the stream (and it can only work with a stream that supports 
efficient marking and going back - this works fine for files but is not 
the case for network sockets ...)
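
to illustrate the stream requirement, here is a minimal sketch with java.io mark()/reset(). MarkReset, readN() and rereadFromMark() are hypothetical names, and the in-memory byte array stands in for a file. note that the mark is only valid for a bounded number of bytes (the readlimit), which is why a buffered non-seekable stream cannot go back arbitrarily far.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkReset {
    // Reads exactly n bytes (or fewer at end of stream) as ASCII text.
    static String readN(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = in.read();
            if (b < 0) break;
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Skips `skip` bytes, marks, reads `len` bytes, resets, reads them again.
    static String rereadFromMark(byte[] data, int skip, int len) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        if (!in.markSupported()) {
            throw new IOException("stream cannot be marked");
        }
        readN(in, skip);
        in.mark(len);                 // mark stays valid for `len` bytes
        String first = readN(in, len);
        in.reset();                   // go back to the marked position
        String second = readN(in, len);
        if (!first.equals(second)) {
            throw new IOException("reset did not return to the mark");
        }
        return second;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<a><b/></a>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(rereadFromMark(xml, 3, 4));
    }
}
```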

>If not, I hope I've managed to contribute a thought or two to the
>debate...
>
it was a very interesting post, and it shows that we have similar 
problems and come up with similar approaches to solve them.

thanks,

alek

References:
- RE: [xml-dev] XML too hard for programmers?
  - From: rog@vitanuova.com
