RE: [xml-dev] SAX - not well formed data

On Tuesday, 3 February 2009, at 14:39 +0000, Michael Kay wrote:
> > I have a document like this:
> > 
> > <xml>
> >   <page>
> >     <rev>...</rev>
> >     <rev>...</rev>
> >   </page>
> >   ... (some hundreds of pages)
> >   <page>
> >     <rev>...
> > 
> > so it's not well formed. 
> 
> It's not clear from that description why it isn't well-formed.

Well, I'm downloading and extracting a file with `curl http://... |
bzcat > test.xml`, but because it's very big and I may not have time
to analyse all of the data, I only extract pages from the beginning,
so I press Ctrl+C after a while. Maybe I could extract pages on the
fly, with something like `curl http://... | bzcat | java -jar
ExtractArticles`, but I'm not really familiar with pipes and so on :(
Probably I would need an XMLStreamReader instead of the reader, and
buffered input or something like that, but I tried it and failed...
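A minimal sketch of the on-the-fly StAX approach mentioned above, assuming the XML arrives on standard input at the end of the `curl http://... | bzcat` pipeline, that the only elements of interest are `<page>` and `<rev>`, and that the number of pages is passed as a hypothetical first command-line argument. The method name `firstPages` is illustrative, not from the original post. Because a pull parser only reads as far as the caller asks, the loop stops after the requested number of pages and never parses the truncated tail.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ExtractArticles {

    // Pulls events from the stream and returns the <rev> texts of the first
    // maxPages <page> elements. Reading stops as soon as enough pages have
    // been collected, so anything after that point is never parsed.
    static List<List<String>> firstPages(InputStream in, int maxPages) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<List<String>> pages = new ArrayList<>();
        List<String> currentPage = null;   // non-null while inside a <page>
        StringBuilder currentRev = null;   // non-null while inside a <rev>
        try {
            while (pages.size() < maxPages && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("page")) {
                        currentPage = new ArrayList<>();
                    } else if (name.equals("rev") && currentPage != null) {
                        currentRev = new StringBuilder();
                    }
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && currentRev != null) {
                    // Text may arrive in several chunks, so accumulate it.
                    currentRev.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("rev") && currentRev != null) {
                        currentPage.add(currentRev.toString());
                        currentRev = null;
                    } else if (name.equals("page") && currentPage != null) {
                        pages.add(currentPage);
                        currentPage = null;
                    }
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close 'in'
        }
        return pages;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical argument: how many pages to extract (default 10).
        int maxPages = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        for (List<String> page : firstPages(System.in, maxPages)) {
            System.out.println(page);
        }
    }
}
```

If the input ends before the requested number of complete pages, `next()` throws an `XMLStreamException` and this sketch discards what it had collected; a caller that wants to keep the partial result could catch the exception inside the method instead. Once the program exits, the upstream `bzcat` and `curl` processes typically stop on the broken pipe.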

> > I only want to be able to write out 
> > the first pages, but the SAX Parser throws errors:
> 
> You should be able to abort the parse when you have read what you want, by
> throwing an exception from any of the callback methods (e.g. endElement()).
> The parser will then exit back to your application with an exception, which
> you can catch. You should check that this exception is the one you were
> expecting, not some other unrelated error in your input.
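A minimal SAX sketch of the abort-by-exception technique described above, assuming a non-namespace-aware parser (so element names are matched on `qName`) and a hypothetical marker exception, `EnoughPagesException`, thrown from `endElement()` once enough pages have been seen. The class and method names are illustrative. Only the marker exception is caught; anything else, such as the `SAXParseException` raised when the input ends before enough complete pages were read, still reaches the caller, which is the check recommended in the reply.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FirstPagesSax {

    // Hypothetical marker exception: thrown on purpose to stop the parse.
    static final class EnoughPagesException extends SAXException {
        EnoughPagesException() { super("requested number of pages reached"); }
    }

    static final class PageHandler extends DefaultHandler {
        private final int maxPages;
        final List<String> pages = new ArrayList<>();
        private StringBuilder text; // non-null while inside a <page>

        PageHandler(int maxPages) { this.maxPages = maxPages; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("page")) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("page")) {
                pages.add(text.toString().trim());
                text = null;
                // Abort the parse once enough pages have been collected.
                if (pages.size() >= maxPages) throw new EnoughPagesException();
            }
        }
    }

    // Returns the text content of up to maxPages pages. Only the marker
    // exception counts as success; any other SAXException propagates.
    static List<String> firstPages(InputStream in, int maxPages)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PageHandler handler = new PageHandler(maxPages);
        try {
            parser.parse(in, handler);
        } catch (EnoughPagesException expected) {
            // The parse was stopped deliberately after maxPages pages.
        }
        return handler.pages;
    }
}
```

Catching the specific subclass, rather than `SAXException` in general, keeps a genuinely malformed or truncated input from being mistaken for a deliberate stop.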

OK, that's possibly the best approach.

Thank you!

Copyright 1993-2007 XML.org. This site is hosted by OASIS