Hello Dan,
let me add something to the responses your question has received :)
Dan White wrote:
><foo> <bar> woof </bar> </foo> <foo> <bar> woof </bar> </foo>
>
>with "foo" being the root tag
>
>
...
>>Yes, preprocessing is a possibility, but how would one do it ?
>>
>>How do you locate where one statement ends and the next begins ?
Underneath the craziness, XML is simple to parse. Some theoreticians
have called it a parentheses language, and this insight may actually be
helpful for your problem: given a way to identify start tags, end tags and
CDATA sections (check the grammar), the following algorithm will do:
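1 initialize a counter c to 0
2 traverse the input until the first start tag (skipping any XML
  declaration, comments or processing instructions before it)
3 c++ (if that first tag is an empty-element tag such as <foo/>, the
  element is already complete: go to 5)
4 while c > 0
4.1 traverse the input until the next '<' (outside comments, processing
    instructions and CDATA sections this always starts markup, because a
    literal '<' is not allowed in attribute values or character data)
4.2 if start tag, then c++ (but not for an empty-element tag such as <bar/>)
    else if end tag, then c--
    else if "<![CDATA[", skip to the next "]]>" // CDATA section
    else if "<!--", skip to the next "-->" // comment
    else skip to the next "?>" // processing instruction
5 cut here: the element ends at the '>' of the end tag that brought c
  back to 0.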
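For illustration, here is a minimal C++ sketch of the counting algorithm, assuming the whole input is already in a std::string and is well-formed. The names find_tag_close, end_of_first_element and split_documents are hypothetical, not from any library. Beyond the listed steps it skips quoted attribute values while looking for the '>' of a start tag (a literal '>' may appear inside them), so that empty-element tags ending in "/>" are recognized reliably. A DOCTYPE with an internal subset is not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Index of the '>' closing the tag whose name starts at `i`, skipping over
// quoted attribute values. Returns std::string::npos if the tag is not closed.
std::size_t find_tag_close(const std::string& s, std::size_t i) {
    char quote = 0;  // '"' or '\'' while inside an attribute value, else 0
    for (; i < s.size(); ++i) {
        const char ch = s[i];
        if (quote != 0) {
            if (ch == quote) quote = 0;
        } else if (ch == '"' || ch == '\'') {
            quote = ch;
        } else if (ch == '>') {
            return i;
        }
    }
    return std::string::npos;
}

// Index just past the first complete top-level element at or after `pos`,
// or std::string::npos if the input ends (or is unbalanced) before that.
std::size_t end_of_first_element(const std::string& s, std::size_t pos) {
    const std::size_t npos = std::string::npos;
    int c = 0;  // open-element counter from the algorithm
    while (true) {
        const std::size_t lt = s.find('<', pos);           // step 4.1
        if (lt == npos) return npos;
        std::size_t stop;  // index of the last character of this markup
        if (s.compare(lt, 9, "<![CDATA[") == 0) {          // CDATA section
            stop = s.find("]]>", lt + 9);
            if (stop != npos) stop += 2;
        } else if (s.compare(lt, 4, "<!--") == 0) {        // comment
            stop = s.find("-->", lt + 4);
            if (stop != npos) stop += 2;
        } else if (s.compare(lt, 2, "<?") == 0) {          // processing instruction
            stop = s.find("?>", lt + 2);
            if (stop != npos) stop += 1;
        } else if (s.compare(lt, 2, "<!") == 0) {          // e.g. a simple <!DOCTYPE ...>
            stop = s.find('>', lt + 2);
        } else if (s.compare(lt, 2, "</") == 0) {          // end tag: c--
            stop = s.find('>', lt + 2);
            if (stop == npos || c == 0) return npos;       // truncated or unbalanced
            if (--c == 0) return stop + 1;                 // step 5: cut here
        } else {                                           // start or empty-element tag
            stop = find_tag_close(s, lt + 1);
            if (stop == npos) return npos;
            if (s[stop - 1] != '/') ++c;                   // <foo> opens an element
            else if (c == 0) return stop + 1;              // the root itself was <foo/>
        }
        if (stop == npos) return npos;
        pos = stop + 1;
    }
}

// Splits a stream of concatenated documents into one string per root element.
std::vector<std::string> split_documents(const std::string& s) {
    std::vector<std::string> docs;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = s.find('<', pos);
        if (start == std::string::npos) break;             // only whitespace left
        const std::size_t end = end_of_first_element(s, start);
        if (end == std::string::npos) break;               // incomplete trailing input
        docs.push_back(s.substr(start, end - start));
        pos = end;
    }
    return docs;
}
```

Applied to the stream from your question, split_documents should return two strings, each "<foo> <bar> woof </bar> </foo>", which can then be handed one at a time to an ordinary XML parser. Any comment or processing instruction that precedes a root element ends up at the front of that element's string.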
It gets simpler and faster if you can assume that there are no
processing instructions, comments, CDATA sections, etc.
This only works for well-formed XML fragments. No doubt, if you had your
own parser, you could tell it to read just the first element; there are
also some libraries that support this directly, but I don't know of any
in C or C++.
Hope this helps.
cheers,
Burak