RE: [xml-dev] SAX - not well formed data
- From: "Michael Kay" <mike@saxonica.com>
- To: "'Johannes Lichtenberger'" <Johannes.Lichtenberger@uni-konstanz.de>
- Date: Tue, 3 Feb 2009 15:57:11 -0000
Incidentally, you could also achieve the same effect with a one-line query
using the Saxon-SA streaming capabilities.
java com.saxonica.Query -qs:"saxon:stream(doc('in.xml')/xml/page)[1]"
should do the job. It will automatically stop reading the input when it has
found the data it needs.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Johannes Lichtenberger
> [mailto:Johannes.Lichtenberger@uni-konstanz.de]
> Sent: 03 February 2009 15:49
> To: Michael Kay
> Cc: 'xml-dev'
> Subject: RE: [xml-dev] SAX - not well formed data
>
> On Tuesday, 03.02.2009, at 14:39 +0000, Michael Kay wrote:
> > > I have a document like this:
> > >
> > > <xml>
> > > <page>
> > > <rev>...</rev>
> > > <rev>...</rev>
> > > </page>
> > > ... (some hundreds of pages)
> > > <page>
> > > <rev>...
> > >
> > > so it's not well formed.
> >
> > It's not clear from that description why it isn't well-formed.
>
> Well, I'm downloading and extracting a file with
> `curl http://... | bzcat > test.xml`, but because it's very big,
> and I may not have the time to analyse all of the data,
> I only extract pages from the beginning, so I press CTRL+C
> after a while. Maybe I could extract pages on the fly
> with something like `curl http://... | bzcat | java -jar
> ExtractArticles`, but I'm not really familiar with pipes and so
> on :( I would probably need an XMLStreamReader instead of the
> reader, and buffer the input or something like that, but I tried
> it and failed...
>
> > > I only want to be able to write out the first pages, but the SAX
> > > Parser throws errors:
> >
> > You should be able to abort the parse when you have read what you
> > want, by throwing an exception from any of the callback methods
> > (e.g. endElement()). The parser will then exit back to your
> > application with an exception, which you can catch. You should check
> > that this exception is the one you were expecting, not some other
> > unrelated error in your input.
>
> Ok, that's possibly the best thing.
>
> Thank you!
>
>
>
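
For reference, the abort-by-exception approach described in the quoted advice could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name FirstPages, the marker exception StopParsing, the handler PageCounter, and the page limit are all hypothetical names introduced here, and the handler merely counts <page> elements where a real program would write them out. It uses the standard JAXP SAX parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FirstPages {

    /** Hypothetical marker exception: signals a deliberate stop, not a parse error. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("page limit reached");
        }
    }

    /** Counts completed <page> elements and aborts the parse at the limit. */
    static final class PageCounter extends DefaultHandler {
        private final int limit;
        int pages = 0;

        PageCounter(int limit) {
            this.limit = limit;
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            // The default factory is not namespace-aware, so qName holds the tag name.
            if ("page".equals(qName) && ++pages >= limit) {
                throw new StopParsing();
            }
        }
    }

    /** Parses the stream until `limit` pages are complete; returns the page count. */
    static int readPages(InputStream in, int limit) throws Exception {
        PageCounter handler = new PageCounter(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(in), handler);
        } catch (StopParsing expected) {
            // Only the marker exception is swallowed. Any other SAXException
            // (for example, truncated input before the limit) still propagates.
        }
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        // Truncated input, as after pressing CTRL+C during the download.
        String truncated =
                "<xml><page><rev>a</rev></page><page><rev>b</rev></page><page><rev>c";
        InputStream in =
                new ByteArrayInputStream(truncated.getBytes(StandardCharsets.UTF_8));
        System.out.println(readPages(in, 2));
    }
}
```

Catching only the marker type is what implements the "check that this exception is the one you were expecting" part of the advice. Passing System.in to readPages instead of the in-memory stream should let the same code sit at the end of a shell pipeline, although that variant is untested here.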