OASIS Mailing List Archives





simple answer Re: [xml-dev] Handling/Parsing/Validating multiple XML Stat


Hello Dan,

Let me add something to the responses your question has generated :)
Dan White wrote:

><foo> <bar> woof </bar> </foo> <foo> <bar> woof </bar> </foo>
>with "foo" being the root tag

>>Yes, preprocessing is a possibility, but how would one do it ?
>>How do you locate where one statement ends and the next begins ?

Underneath the craziness, XML is simple to parse. Some theoreticians
have called it a parenthesis language: start and end tags nest like
opening and closing brackets. This insight may actually be helpful for
your problem.

Given a way to identify start tags, end tags and CDATA sections (check
the grammar), the following algorithm will do:

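1 initialize a counter c to 0
2 traverse the input until the first start tag
3 c++ (if it is an empty-element tag like <foo/>, cut right after it)
4 while c > 0
4.1 traverse the input until the next '<' (this always starts markup: a
literal '<' is not allowed in attribute values or character data)
4.2  if start tag, then c++
  else if end tag, then c--
  else if empty-element tag (<bar/>), leave c unchanged
  else if cdata, skip to the next "]]>" // cdata section
  else if comment, skip to the next "-->"
  else skip to the next "?>" // processing instruction
5  cut here, right after the '>' of the end tag that brought c back to 0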
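
For illustration, here is a minimal Python sketch of the steps above. It is not a tested production splitter, and the function names (first_element_end, split_fragments, and the two helpers) are made up for this example. It spells out a few details: empty-element tags such as <bar/> leave the counter unchanged, comments are skipped to "-->" and processing instructions to "?>", a '>' inside a quoted attribute value does not end a tag, and any DOCTYPE is assumed to have no internal subset.

```python
def _skip_past(text, marker, i):
    """Return the index just past the next occurrence of marker at or after i."""
    j = text.find(marker, i)
    if j == -1:
        raise ValueError(f"missing {marker!r}")
    return j + len(marker)


def _tag_end(text, i):
    """Return the index of the '>' that closes the tag starting at i.

    A '>' inside a quoted attribute value is ignored.
    """
    quote = None
    for j in range(i + 1, len(text)):
        ch = text[j]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return j
    raise ValueError("unterminated tag")


def first_element_end(text, start=0):
    """Return the index just past the first complete element at or after start.

    Assumes well-formed input; raises ValueError if the input ends early.
    """
    depth = 0  # the counter c from the steps above
    i = start
    while True:
        i = text.find("<", i)
        if i == -1:
            raise ValueError("no complete element found")
        if text.startswith("<!--", i):            # comment
            i = _skip_past(text, "-->", i + 4)
        elif text.startswith("<![CDATA[", i):     # CDATA section
            i = _skip_past(text, "]]>", i + 9)
        elif text.startswith("<?", i):            # processing instruction / XML declaration
            i = _skip_past(text, "?>", i + 2)
        elif text.startswith("<!", i):            # assumption: DOCTYPE without internal subset
            i = _skip_past(text, ">", i + 2)
        else:
            end = _tag_end(text, i)
            if text[i + 1] == "/":                # end tag
                depth -= 1
            elif text[end - 1] != "/":            # start tag; <bar/> leaves depth alone
                depth += 1
            i = end + 1
            if depth == 0:                        # "cut here"
                return i


def split_fragments(text):
    """Split a string of concatenated elements into one string per element."""
    pieces = []
    pos = 0
    while text[pos:].strip():
        end = first_element_end(text, pos)
        pieces.append(text[pos:end].strip())
        pos = end
    return pieces
```

Calling split_fragments on Dan's example string should give two strings, one per <foo> element. Anything other than whitespace after the last element (a trailing comment, say) makes this sketch raise ValueError, which may or may not be what you want.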

It gets simpler and faster if you can assume that there are no 
processing instructions, comments, cdata etc.
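
Under that assumption, and assuming further that no attribute value contains a '>', one regular expression for tags is enough. A rough Python sketch (split_simple is a made-up name, and the regular expression stands in for the character-by-character scan):

```python
import re

# Matches any tag; group 1 is "/" for an end tag,
# group 2 is "/" for an empty-element tag such as <bar/>.
TAG = re.compile(r"<(/?)[^<>]*?(/?)>")


def split_simple(text):
    """Split concatenated elements, assuming the input contains only
    plain tags and text (no comments, PIs, CDATA or DOCTYPE)."""
    pieces = []
    depth = 0      # the counter c
    start = None   # where the current top-level element began
    for m in TAG.finditer(text):
        if start is None:
            start = m.start()
        if m.group(1):            # end tag
            depth -= 1
        elif not m.group(2):      # start tag
            depth += 1
        if depth == 0:            # top-level element is complete
            pieces.append(text[start:m.end()])
            start = None
    return pieces
```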

This only works for well-formed XML fragments. No doubt, if you had your
own parser, you could tell it to read just the first element. There are
also some libraries that support this directly, but I don't know of any
in C or C++.

Hope this helps.




Copyright 2001 XML.org. This site is hosted by OASIS