   Re: RE: [xml-dev] Penance for misspent attributes


5/17/02 3:28:33 AM, Sean McGrath <sean.mcgrath@propylon.com> wrote:

>There is more to it than a buffer. Parsers can and do emit chunks of content
>at boundaries that suit themselves. So
>
><foo>
>Hello world
></foo>
>
>is not guaranteed to produce 1 data event that can be slurped into a buffer in
>one go. More generally, in the presence of mixed content there will definitely
>be multiple chunks. So you end up with this pattern:
>
>start_foo:
>	buffer = ""
>	inFoo = 1
>
>end_foo:
>	print buffer
>	inFoo = 0
>
>characters (chunk):
>	if inFoo:
>		buffer.append (chunk)
>
>This rapidly gets out of hand.
>
>Rightly, the need for this pattern drives the data-heads nuts. It would be
>so nice to know that, in the presence of data-oriented XML, the fundamental
>parser layer would emit complete PCDATA chunks.
>
>Trouble is, there is no consensus on what data-oriented XML is and how
>it could be flagged to a processor. Consequently, data-oriented APIs that
>avoid the above upside-down and state-space-laden constructs,
>such as RAX (http://www.xml.com/pub/a/2000/04/26/rax), cannot go
>anywhere.
>
>An XML Features Manifest would be one way to flag it
>(http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Dec-1999/0002.html)
>but that never went anywhere either:-)

An even simpler alternative is a SAX filter that does nothing but condense consecutive PCDATA 
events.  Such a thing exists, for example, in the Perl world as XML::Filter::BufferText.  That way, 
you don't have to flag anything to the processor, you just read its output through a pair of 
appropriately-tinted glasses.
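
For anyone who wants to see the shape of such a filter, here is a rough sketch in Python's xml.sax rather than Perl. It is not the XML::Filter::BufferText code, only an illustration of the same idea: hold on to character chunks and pass them downstream as one event when any other event arrives. The names TextCoalescingFilter, Recorder and text_chunks are made up for this example, and the sketch assumes namespace processing is off (the xml.sax default), so only startElement/endElement need to flush.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class TextCoalescingFilter(XMLFilterBase):
    """Hypothetical SAX filter: merges consecutive characters() events."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # character chunks not yet passed downstream

    def _flush(self):
        # Emit everything buffered so far as a single characters() event.
        if self._pending:
            super().characters("".join(self._pending))
            self._pending = []

    def characters(self, content):
        # Hold the chunk back instead of forwarding it immediately.
        self._pending.append(content)

    # Any non-text event ends the current run of text, so flush first.
    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class Recorder(ContentHandler):
    """Hypothetical downstream handler: records each text event it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def text_chunks(xml_text):
    """Parse xml_text through the filter; return the text events received."""
    recorder = Recorder()
    reader = TextCoalescingFilter(xml.sax.make_parser())
    reader.setContentHandler(recorder)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return recorder.chunks
```

With the filter in place, the downstream handler needs no inFoo flag and no buffer of its own: each run of text between two markup events arrives as exactly one characters() call, however the underlying parser chose to split it.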







 
