Re: [xml-dev] To continue parsing after a fatal error.

Hi Dave,
    I guess the diagnostic is correct. And I think I should either
preprocess the XML as you have said, or split the file into smaller files as
Tim Bray has suggested. I hope the 'Out of memory' situation won't arise. It
was precisely for this reason that I decided to use SAX rather than DOM. In
fact, using DOM for an 800 MB file is unthinkable.
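For what it's worth, here is a minimal sketch of the preprocessing pass Dave describes, in Python rather than anything MSXML-specific. It streams the file in chunks so memory use stays flat, and drops every character outside the XML 1.0 Char production. The function names, the chunk size, the UTF-8 default and the choice to silently drop undecodable bytes (errors="ignore") are all assumptions of mine, not anything from the thread.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything else.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(src, dst, chunk_size=1 << 20):
    """Copy text from src to dst, dropping characters illegal in XML 1.0.

    src and dst are text-mode file objects (hypothetical helper, not a
    library API). Returns the number of characters removed.
    """
    removed = 0
    while True:
        chunk = src.read(chunk_size)  # reads characters, not bytes
        if not chunk:
            break
        cleaned, count = _ILLEGAL_XML_CHARS.subn("", chunk)
        dst.write(cleaned)
        removed += count
    return removed


def clean_file(in_path, out_path, encoding="utf-8"):
    """Write a cleaned copy of in_path to out_path.

    Assumption: the encoding is known up front, and bytes that cannot be
    decoded are simply dropped (errors="ignore").
    """
    # newline="" keeps line endings exactly as they are in the input.
    with open(in_path, "r", encoding=encoding, errors="ignore",
              newline="") as src, \
         open(out_path, "w", encoding=encoding, newline="") as dst:
        return strip_illegal_xml_chars(src, dst)
```

Something like clean_file("big.xml", "big-clean.xml") (both file names made up) would be the first scan, and the SAX parse would then run over the cleaned copy. Because the filter works one character at a time, chunk boundaries cannot split a match.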

Thanks for your suggestion.

----- Original Message -----
From: David Brownell <david-b@pacbell.net>
To: Anoop A V <anoop_scorpio@hotmail.com>
Sent: Wednesday, October 24, 2001 12:26 AM
Subject: Re: [xml-dev] To continue parsing after a fatal error.

> 800 MB ... are you sure the diagnostic is correct?  That's a
> pretty big file, and C/COM/... level code could easily give
> a bogus diagnostic.  I'd expect "ran out of memory".
> If the file has a correct XML declaration, with the right text
> encoding, then you certainly need to tell whoever produces
> the file that they've got bugs ... as a rule, MSXML has bugs
> that others don't, so if even MSXML rejects that file, then
> whoever is making that file probably has some big problems.
> Assuming they're not doing their job, however, you should
> still be able to solve the problem by preprocessing the XML
> to strip out illegal characters.  Think of it as another processing
> pass: the first of N scans over the data just finds and removes those
> characters.
> It's actually a requirement of the XML spec that once a
> fatal error is found, no more data will ever be reported
> (only additional errors, and that's not required).
> - Dave
> ----- Original Message -----
> From: "Anoop A V" <anoop_scorpio@hotmail.com>
> To: <xml-dev@lists.xml.org>
> Sent: Tuesday, October 23, 2001 10:51 AM
> Subject: [xml-dev] To continue parsing after a fatal error.
> > Hi,
> > I have an 800 MB file which I need to parse. When I do this using the MSXML
> > parser, I get a fatal error with the message "Invalid character found in
> > text content", and parsing stops. But I need to continue
> > parsing the file even if an invalid character is met. I don't mind if
> > particular node(s) are skipped. But I need to parse the whole file. The file
> > is not under my control, so there is no question of my being able to edit
> > this file and remove the invalid characters. Can anybody help?
> >
> > Thanks.
> > Anoop.
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>