
RE: [xml-dev] To continue parsing after a fatal error.



Actually, the XML 1.0 spec makes the invalid character error fatal: a
character outside the spec's Char production means the document is not
well-formed, and well-formedness violations are fatal errors.  If
Microsoft were to deviate from the W3C spec and make this ignorable, we
would deserve the roasting we would get at the merciless hands of
xml-dev list members.  The reasoning behind the rule is that it prevents
the spread of malformed XML and thereby preserves interoperability.
This is a good and necessary reason, although it is not much consolation
when you are stuck trying to clean up bad XML.  We should have some
tools to make this easier.
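
One shape such a tool could take, as a rough sketch and not anything
MSXML provides: filter the input before the parser sees it, dropping
every character outside the XML 1.0 Char production, and feed the
cleaned text to a SAX parser in chunks so a very large file never has
to fit in memory.  The sketch uses Python's standard xml.sax module
instead of MSXML, and the names strip_invalid_xml_chars, parse_lenient
and TextCollector are made up for illustration.  It assumes the input
has already been decoded to text.

```python
import io
import re
import xml.sax

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Return text with characters that XML 1.0 forbids replaced."""
    return _INVALID_XML_CHARS.sub(replacement, text)


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


def parse_lenient(text_stream, handler, chunk_size=64 * 1024):
    """Feed a text stream to a SAX parser chunk by chunk, cleaning
    each chunk first.  The filter works one character at a time, so
    chunk boundaries do not affect the result."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = text_stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(strip_invalid_xml_chars(chunk))
    parser.close()


if __name__ == "__main__":
    collector = TextCollector()
    source = io.StringIO("<r><a>ok</a><b>bad\x01data</b></r>")
    parse_lenient(source, collector)
    print("".join(collector.text))
```

Note that this changes the data before parsing, so the parser itself
stays conformant; the filter is where the rule gets bent.  It also does
nothing about forbidden characters written as numeric references such
as &#1;, which a conformant parser will still reject.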


> -----Original Message-----
> From: Jeff Greif [mailto:jgreif@alumni.princeton.edu]
> Sent: Tuesday, October 23, 2001 12:35 PM
> To: Anoop A V; xml-dev@lists.xml.org
> Subject: Re: [xml-dev] To continue parsing after a fatal error.
> 
> Normally you might attempt to deal with this kind of problem using a
> custom SAX error handler.  In MSXML3, however, you may not be able to
> do this, because the underlying parsing code makes all errors fatal (it
> always calls the error handler's fatalError() method, never its error()
> or ignorableWarning() methods).  Treating all errors as fatal appears
> to limit recovery options.
> Details (not very many) are here:
> http://msdn.microsoft.com/library/en-us/xmlsdk30/htm/isaxerrorhandler_interface.asp?frame=true
> 
> I only looked this up out of curiosity.  I have not tried it myself and
> am not pretending to be authoritative.
> 
> Jeff
> ----- Original Message -----
> From: "Anoop A V" <anoop_scorpio@hotmail.com>
> To: <xml-dev@lists.xml.org>
> Sent: Tuesday, October 23, 2001 10:51 AM
> Subject: [xml-dev] To continue parsing after a fatal error.
> 
> 
> > Hi,
> > I have an 800 MB file which I need to parse. When I do this using the
> > MSXML SAX parser, I get a fatal error with the message "Invalid
> > character found in text content", and parsing stops. But I need to
> > continue parsing the file even if an invalid character is met. I don't
> > mind if that particular node(s) is skipped, but I need to parse the
> > whole file. This file is not under my control, so there is no question
> > of my being able to edit this file and remove the invalid characters.
> > Can anybody help?
> >
> > Thanks.
> > Anoop.
> >
> >
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>
> >