5/17/02 3:28:33 AM, Sean McGrath <sean.mcgrath@propylon.com> wrote:
>There is more to it than a buffer. Parsers can and do emit chunks of content
>at boundaries that suit themselves. So
>
><foo>
>Hello world
></foo>
>
>is not guaranteed to produce 1 data event that can be slurped into a buffer in
>one go. More generally, in the presence of mixed content there will definitely
>be multiple chunks. So you end up with this pattern:
>
>start_foo:
>    buffer = ""
>    inFoo = 1
>
>end_foo:
>    print buffer
>    inFoo = 0
>
>characters (chunk):
>    if inFoo:
>        buffer = buffer + chunk
>
>This rapidly gets out of hand.
>
>Rightly, the need for this pattern drives the data-heads nuts. It would be
>so nice to know that in the presence of data-oriented XML, the fundamental
>parser layer would emit complete PCDATA chunks.
>
>Trouble is, there is no consensus on what data-oriented XML is and how
>it could be flagged to a processor. Consequently, data-oriented APIs that
>avoid the above upside-down and state-space-laden constructs
>such as RAX (http://www.xml.com/pub/a/2000/04/26/rax) cannot go
>anywhere.
>
>An XML Features Manifest would be one way to flag it
>(http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Dec-1999/0002.html)
>but that never went anywhere either:-)
An even simpler alternative is a SAX filter that does nothing but condense consecutive PCDATA
events. Such a thing exists, for example, in the Perl world as XML::Filter::BufferText. That way,
you don't have to flag anything to the processor, you just read its output through a pair of
appropriately tinted glasses.
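For the Python side of the fence, here is a rough sketch of such a filter built on the standard library's xml.sax.saxutils.XMLFilterBase. The class name BufferTextFilter and the helpers EventRecorder and parse_buffered are hypothetical, chosen only to echo the Perl module; this is not a port of XML::Filter::BufferText. It holds back characters() events and releases them as a single event whenever any other structural event (element start/end, processing instruction, end of document) comes along, so the downstream handler never needs the inFoo/buffer bookkeeping quoted above.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class BufferTextFilter(XMLFilterBase):
    """Hypothetical filter: merge consecutive characters() events into one."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # text chunks seen since the last non-text event

    def _flush(self):
        # Emit everything buffered so far as one characters() call downstream.
        if self._pending:
            text = "".join(self._pending)
            self._pending = []
            super().characters(text)

    def characters(self, content):
        # Hold the chunk back instead of forwarding it immediately.
        self._pending.append(content)

    # Any structural event ends the current run of text, so flush first.
    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def startElementNS(self, name, qname, attrs):
        self._flush()
        super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        super().endElementNS(name, qname)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class EventRecorder(ContentHandler):
    """Hypothetical downstream handler that just records what it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def parse_buffered(xml_bytes):
    """Parse xml_bytes through the filter and return the recorded events."""
    recorder = EventRecorder()
    filt = BufferTextFilter(xml.sax.make_parser())
    filt.setContentHandler(recorder)
    filt.parse(io.BytesIO(xml_bytes))
    return recorder.events
```

With the filter in place, the example from the quoted message arrives as one text event however the underlying parser chose to chop it up, and in mixed content each run of text between tags is likewise delivered whole:

```python
parse_buffered(b"<foo>\nHello world\n</foo>")
# [('start', 'foo'), ('text', '\nHello world\n'), ('end', 'foo')]
```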