
RE: [xml-dev] To continue parsing after a fatal error.



Actually, I'd say it is more likely that the XML is incorrectly identified
as UTF-8 (or lacks an encoding declaration), and is not truly UTF-8. This is
an extremely common error. Character encoding issues are poorly understood
by most developers.

Try a simple experiment: make sure the document has the following
declaration at the top:
<?xml version="1.0" encoding="ISO-8859-1"?>

See if that fixes the problem. It probably will (but if it doesn't, I'm
probably wrong and Joshua probably right regarding the problem).
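Before editing the declaration, it may help to check whether the bytes are valid UTF-8 at all. The following is a minimal sketch in Python (not MSXML); the function name first_invalid_utf8_offset and the convention of passing the file as an iterable of byte chunks are assumptions made for this illustration. It decodes incrementally, so a very large file never has to fit in memory.

```python
import codecs
import itertools


def first_invalid_utf8_offset(chunks):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    `chunks` is any iterable of bytes objects (hypothetical convention for
    this sketch), for example successive reads from a file opened in "rb" mode.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    fed = 0  # total bytes handed to the decoder before the current chunk
    # Append an empty final chunk so a truncated trailing sequence is reported.
    steps = itertools.chain(((c, False) for c in chunks), [(b"", True)])
    for chunk, final in steps:
        # Bytes of an incomplete sequence carried over from earlier chunks.
        pending = len(decoder.getstate()[0])
        try:
            decoder.decode(chunk, final)
        except UnicodeDecodeError as err:
            # err.start is relative to (carried-over bytes + current chunk).
            return fed - pending + err.start
        fed += len(chunk)
    return None


# The ISO-8859-1 byte 0xE9 ("é") is not valid UTF-8 when followed by a space.
sample = [b"caf\xe9 au lait"]
bad_offset = first_invalid_utf8_offset(sample)

# The same bytes decode cleanly once they are treated as ISO-8859-1.
as_latin1 = b"".join(sample).decode("iso-8859-1")
```

For a file on disk, the chunks could come from something like iter(lambda: f.read(1 << 20), b"") on a file object f opened in binary mode. If an offset comes back and the byte there is in the range 0x80-0xFF, a mislabeled single-byte encoding is a plausible cause; if the result is None, the control-character explanation becomes more likely.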

Then tell the person who sent you the XML to read the following:
http://msdn.microsoft.com/library/default.asp?URL=/library/en-us/dnxml/html/xmlencodings.asp

One caveat: the article contains typos that show incorrect syntax for HTTP
headers. It gives this as an example:
Content-Type: text/html; charset:ISO-8859-1;

The correct syntax is:
Content-Type: text/html; charset=ISO-8859-1
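To see why the difference matters, here is a small sketch using Python's standard email.message.Message class as a stand-in for a header parser. Other HTTP stacks may behave differently; the helper name charset_of is an assumption made for this example. With the colon form, the charset parameter is simply not recognized, so a consumer would fall back to its default encoding.

```python
from email.message import Message


def charset_of(content_type_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lowercases the value and returns None if the
    # header carries no parsable charset parameter.
    return msg.get_content_charset()


correct = charset_of("text/html; charset=ISO-8859-1")
typo = charset_of("text/html; charset:ISO-8859-1;")
```

Here correct holds the lowercased charset name, while typo is None because "charset:ISO-8859-1" is not a name=value parameter.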

(Maybe Joshua or Julia can use their influence at Microsoft to get these
typos in an otherwise very useful article corrected?)

-----Original Message-----
From: Joshua Allen [mailto:joshuaa@microsoft.com]
Sent: Tuesday, October 23, 2001 12:40 PM
To: Anoop A V; xml-dev@lists.xml.org
Cc: Julia Jia
Subject: RE: [xml-dev] To continue parsing after a fatal error.


This error should occur with any conforming XML processor.  It is quite
likely that the error is caused by a control character in the low ASCII
range.  The only way to avoid the problem is to clean up the XML on the
way in, before it is processed by MSXML.  And unfortunately I am not
aware of a way to do this without writing code to pipe the input stream
through a scrubber before passing it to MSXML.  Julia will know if there
are any code samples existing today (I doubt it).

Thanks,
Joshua
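A scrubber of the kind described above could look roughly like the following. This is a sketch in Python rather than an MSXML sample, and the names scrub_xml_text and _INVALID_XML_CHARS are invented for the illustration. It works on already decoded text and drops every character outside the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF), so it assumes the encoding of the input is already known.

```python
import re
import xml.etree.ElementTree as ET

# Anything NOT matched by the XML 1.0 "Char" production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub_xml_text(text_chunks, replacement=""):
    """Yield each text chunk with characters illegal in XML 1.0 replaced.

    `text_chunks` is an iterable of str (hypothetical convention for this
    sketch), for example successive reads from a file opened in text mode.
    """
    for chunk in text_chunks:
        yield _INVALID_XML_CHARS.sub(replacement, chunk)


# A document containing the control character U+0001 is not well-formed.
dirty = ["<r>a\x01", "b</r>"]
try:
    ET.fromstring("".join(dirty))
    dirty_parses = True
except ET.ParseError:
    dirty_parses = False

# After scrubbing, the same document parses.
clean = "".join(scrub_xml_text(dirty))
root = ET.fromstring(clean)
```

Because the filter looks at one character at a time, it can be applied chunk by chunk without buffering the whole file. Note that it silently discards data, so logging the positions of removed characters may be worth adding in practice.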




> -----Original Message-----
> From: Anoop A V [mailto:anoop_scorpio@hotmail.com]
> Sent: Tuesday, October 23, 2001 10:51 AM
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] To continue parsing after a fatal error.
> 
> Hi,
> I have an 800 MB file which I need to parse. When I do this using MSXML SAX
> parser, I get a fatal error with the message "Invalid character found in
> text content". And the parsing will be stopped. But I need to continue
> parsing the file even if an invalid character is met. I don't mind if that
> particular node(s) is skipped. But I need to parse the whole file. This file
> is not under my control, so there is no question of my being able to edit
> this file and remove the invalid characters. Can anybody help?
> 
> Thanks.
> Anoop.