RE: [xml-dev] SAX - not well formed data

Incidentally, you could also achieve the same effect with a one-line query
using the Saxon-SA streaming capabilities.

java com.saxonica.Query -qs:"saxon:stream(doc('in.xml')/xml/page)[1]"

should do the job. It will automatically stop reading the input when it has
found the data it needs. 

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Johannes Lichtenberger 
> [mailto:Johannes.Lichtenberger@uni-konstanz.de] 
> Sent: 03 February 2009 15:49
> To: Michael Kay
> Cc: 'xml-dev'
> Subject: RE: [xml-dev] SAX - not well formed data
> 
> On Tuesday, 03.02.2009, at 14:39 +0000, Michael Kay wrote:
> > > I have a document like this:
> > > 
> > > <xml>
> > >   <page>
> > >     <rev>...</rev>
> > >     <rev>...</rev>
> > >   </page>
> > >   ... (some hundreds of pages)
> > >   <page>
> > >     <rev>...
> > > 
> > > so it's not well formed. 
> > 
> > It's not clear from that description why it isn't well-formed.
> 
> Well, I'm downloading and extracting a file with `curl 
> http://... | bzcat > test.xml`, but because it's very big 
> and I may not have the time to analyse the whole data set, 
> I'm only extracting pages from the beginning, so I press CTRL+C 
> some time afterwards. That leaves the file cut off in the middle 
> of an element. Maybe I could extract pages on the fly with 
> something like `curl http://... | bzcat | java -jar 
> ExtractArticles`, but I'm not really familiar with pipes and so 
> on :( I would probably need an XMLStreamReader instead of the 
> reader and buffer the input or something like that, but I tried 
> it and failed...
> 
> > > I only want to be able to write out the first pages, but the SAX 
> > > Parser throws errors:
> > 
> > You should be able to abort the parse when you have read what you 
> > want, by throwing an exception from any of the callback methods 
> > (e.g. endElement()). The parser will then exit back to your 
> > application with an exception, which you can catch. You should 
> > check that this exception is the one you were expecting, not some 
> > other unrelated error in your input.
> 
> Ok, that's possibly the best thing.
> 
> Thank you!
> 
> 
> 
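
The advice quoted above (abort the parse by throwing an exception from a callback, then check that the exception you catch is your own) might look roughly like the sketch below. It is only an illustration under some assumptions: the class names FirstPages, StopParsing and PageCounter are made up for this example, the handler merely counts <page> elements instead of writing them out, and the default (non-namespace-aware) JAXP parser is assumed so that qName is simply "page".

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FirstPages {

    /** Hypothetical marker exception: means "we have read enough", not a real error. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("page limit reached");
        }
    }

    /** Counts closed <page> elements and aborts the parse once the limit is reached. */
    static final class PageCounter extends DefaultHandler {
        private final int limit;
        int pages = 0;

        PageCounter(int limit) {
            this.limit = limit;
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("page".equals(qName)) {
                pages++;
                if (pages >= limit) {
                    throw new StopParsing(); // leave the parser early
                }
            }
        }
    }

    /** Returns the number of complete pages seen, reading at most {@code limit} of them. */
    static int countPages(InputSource in, int limit) throws Exception {
        PageCounter handler = new PageCounter(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing expected) {
            // Our own signal. Any other SAXException (a genuine error in the
            // input) is not caught here and propagates to the caller.
        }
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        // Truncated input, as produced by interrupting the download.
        String truncated =
            "<xml><page><rev>a</rev></page><page><rev>b</rev></page><page><rev>c";
        System.out.println(countPages(new InputSource(new StringReader(truncated)), 2));
    }
}
```

Because the handler stops after the second </page>, the parser never reaches the point where the input is cut off, so the truncation is not reported. If the limit were larger than the number of complete pages, the parser's own well-formedness error would propagate instead, which is the "unrelated error" case the advice warns about.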

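For the on-the-fly variant mentioned in the quoted message (piping the decompressed download straight into a Java program), a pull parser reading standard input is one possible shape. The sketch below is an assumption-laden illustration, not the poster's program: it borrows the name ExtractArticles from the message, the readPages method and the optional page-limit argument are invented here, and it only counts pages rather than writing them out.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class ExtractArticles {

    /** Pulls events until {@code limit} <page> elements have been closed. */
    static int readPages(InputStream in, int limit) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(in);
        int pages = 0;
        try {
            // With a pull parser there is nothing to abort: we simply stop
            // asking for events once we have seen enough pages.
            while (pages < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    pages++;
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical command-line argument: how many pages to read.
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println(readPages(System.in, limit) + " pages read");
    }
}
```

Under these assumptions it could be run as `curl http://... | bzcat | java ExtractArticles 100`. Once the program exits, the upstream commands normally terminate when they next try to write to the closed pipe, so there is no need to press CTRL+C and no truncated file is left on disk.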



Copyright 1993-2007 XML.org. This site is hosted by OASIS