XML.org: OASIS Mailing List Archives

Re: [xml-dev] Error and Fatal Error

Maybe I have got some attention but will it solve anything I wonder.
 
I did post to my blog about this a few months back along with code
http://stephengreenxml.blogspot.com/2011/03/xml-special-character-gotchas.html 
 
The way the parser works (and its ancestors) is to throw a special
exception (XmlException, I think) when it encounters illegal
characters. This is, I assume, as per the spec telling it to stop
processing and report a fatal error. The developer catches this with
a try/catch and passes control to a bit of code which attempts to
fix the illegal character (as per the spec). This involves going through
each error in turn (since, as per the spec, several errors may have
occurred, only one of which was fatal) and determining whether it is an
element name error (or attribute, etc.) or whether the error is in the
content of an element (or attribute). In my scenario the code in the
app itself created the markup, so there is not much chance of there
being an error in an element name, but the content came from a
webpage form (via AJAX in this case), so any error is likely to be
an illegal character in the user input; therefore the code
concentrates on finding and replacing such characters.

Call this pre-parsing if you like: using the same parser as used for
normal parsing, but using the features it implements according
to the spec to allow for error correction. The exception has enough
information to allow the code, together with the XML as a string,
to find the erroneous illegal character.

But my bugbear is that in implementing the spec the
parser producers have thrown a fatal exception, meaning the state
of the parser is not conducive to such error correction. I maintain that
the spec is misleading in requiring a fatal error and at the same time
suggesting allowing error correction, since the two are mutually exclusive
to some extent, as you'll see from how it works in practice and from
parser instructional material which reports that after an XmlException
the state of the parser is unpredictable. To my mind this is a result
of it having to stop processing (according to the spec). Here's the
code used to implement all this (a little simplistically because it is a
RAD application implemented with fast turnaround requirements).
 
// Requires: using System; using System.IO; using System.Xml;
public static string EscapeXmlSpecialCharacters(string XmlString)
{
    // Try to load the XML; if it is well-formed there is nothing to fix.
    XmlDocument doc = new XmlDocument();
    try
    {
        doc.LoadXml(XmlString);
        return XmlString;
    }
    catch (XmlException ex)
    {
        string output;
        using (StringReader str = new StringReader(XmlString))
        using (StringWriter stw = new StringWriter())
        {
            // Copy the lines before the one the parser complained about.
            for (int i = 0; i < ex.LineNumber - 1; i++)
            {
                stw.WriteLine(str.ReadLine());
            }

            string strline = str.ReadLine();
            // The offending character is taken to be at LinePosition - 2
            // (zero-based) on the reported line.
            int offset = ex.LinePosition - 2;
            if (strline == null || offset < 0 || offset >= strline.Length)
            {
                throw; // cannot locate the character, so do not guess
            }

            string before = strline.Substring(0, offset);
            string after = strline.Substring(offset + 1);
            string fixedLine = strline;
            switch (strline[offset])
            {
                case '<':
                    fixedLine = before + "&lt;" + after;
                    break;
                case '&':
                    // Do not replace the ampersand of an already escaped
                    // special character (&lt;, &gt;, &apos;, &quot; or &amp;).
                    bool escaped = false;
                    foreach (string name in new[] { "lt;", "gt;", "amp;", "apos;", "quot;" })
                    {
                        escaped |= after.StartsWith(name, StringComparison.Ordinal);
                    }
                    if (!escaped)
                    {
                        fixedLine = before + "&amp;" + after;
                    }
                    break;
            }
            if (fixedLine == strline)
            {
                throw; // nothing was changed; recursing again would never terminate
            }

            stw.WriteLine(fixedLine);
            stw.Write(str.ReadToEnd());
            output = stw.ToString();
        }
        // Re-check the result: there may be further illegal characters.
        return EscapeXmlSpecialCharacters(output);
    }
}
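
To check what the parser actually reports for a given input (and so whether the "- 2" offset used above holds for your parser version), a small probe along these lines may help. This is only a sketch: the class and method names are hypothetical, and the exact LinePosition reported for a given error is something to observe rather than assume.

```csharp
using System.Xml;

public static class ParserErrorProbe
{
    // Hypothetical helper: returns the exception raised for malformed
    // input, or null if the input is well-formed.
    public static XmlException TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            // ex.LineNumber and ex.LinePosition locate the error.
            return ex;
        }
    }
}
```

Calling ParserErrorProbe.TryLoad("<a>Fish & Chips</a>") and printing LineNumber and LinePosition of the result shows where the parser places the error relative to the unescaped ampersand.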


 
----
Stephen D Green



On 18 July 2011 01:34, David Lee <dlee@calldei.com> wrote:

This is starting to sound like a toolkit bug.  And as such probably on the wrong list.

But obviously you have a lot of people's active attention !

 

If you could post a code snippet it may answer a thousand questions.

The one I have is

 

"Does the toolkit generate the bad XML or does the custom code?"

 

If the toolkit/framework is generating the XML and it passes through unescaped invalid XML markup then the toolkit has a bug and should be reported *to the toolkit developers*.

If custom code is generating bad XML then it needs to be fixed by the custom code developers.

 

In neither case is the "XML Spec" at fault here. 

Any more than passing an extra "," to CSV, or a UTF-8 sequence to an ASCII parser, or any of a thousand million zillion examples I could come up with offhand of invalid data passed to languages/parsers/tools which expect valid data.
It's pretty clear in the XML specs what's valid and what is not.   GIGO and all that ...

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org

 

From: Stephen D Green [mailto:stephengreenubl@gmail.com]
Sent: Sunday, July 17, 2011 6:12 PM
To: Andrew Welch
Cc: David Carlisle; xml-dev@lists.xml.org
Subject: Re: [xml-dev] Error and Fatal Error

 

I'll try and have a look at the code again tomorrow at work.

 

As far as I remember we do not have access to the strings.
These are *very* commonly used Ajax controls and they
probably bind to a dataset from ASP.NET 'markup'. Not
everything in controls like this is available to the developer.
If the controls do bind to a dataset (XML a la .NET) then
the data is possibly pre-packaged as XML even if it has '&'
in the element content. Besides, this is a framework we are
talking about so I do not have much say in what other
developers working now and in the future on the apps do.
One tends to stick with the framework (go with the flow) and
understand that others will try and do the same.

----

Stephen D Green



On 17 July 2011 22:26, David Carlisle <davidc@nag.co.uk> wrote:

On 17/07/2011 22:13, Stephen D Green wrote:

> I don't buy that. And not so easy to replace '<' with
> '&lt;' in just the element content and not the tags.

 

I thought you indicated that you were taking strings from user-supplied form data and adding them to XML, in which case you need to escape the XML syntax characters before you add them, so there is no element content and no tags to worry about. You don't want to add the data and then parse to find element content afterwards: if adding the content has made the XML non-well-formed, you've already lost.

David
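
A minimal sketch of the escape-first approach described in the message above, assuming the XML is being assembled by string concatenation. The class name, method name and the <comment> element are hypothetical; SecurityElement.Escape is one framework call that replaces the XML syntax characters with entity references.

```csharp
using System.Security;

public static class EscapeBeforeInsert
{
    // Hypothetical helper: wraps user text in a <comment> element,
    // escaping the XML syntax characters before concatenation.
    public static string BuildCommentXml(string userInput)
    {
        // Replaces <, >, &, " and ' with &lt;, &gt;, &amp;, &quot; and &apos;.
        string escaped = SecurityElement.Escape(userInput);
        return "<comment>" + escaped + "</comment>";
    }
}
```

Because the input is escaped before it ever touches markup, there is no need to tell element content from tags afterwards, and the result loads with XmlDocument.LoadXml without raising an exception.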

On 17 July 2011 22:26, Andrew Welch <andrew.j.welch@gmail.com> wrote:

On 17 July 2011 22:13, Stephen D Green <stephengreenubl@gmail.com> wrote:
> I don't buy that. And not so easy to replace '<' with
> '&lt;' in just the element content and not the tags.

It really is straightforward... if you are adding to an in memory tree
just add the text as-is, if you are adding to serialised xml then just
put text through a serialiser first (by wrapping it in a root node,
serialising it, then substringing it out).

If that's not what you mean then can you do an example of the problem?

If the user is writing markup then it's back to helping them get it
right first time by parsing in the background and highlighting errors.
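
The two options in the message above (add text to an in-memory tree as-is, or push text through a serialiser and take the result) might look roughly like this with XmlDocument. This is a sketch under assumptions: the class, method and element names are hypothetical, and a real application would add the text to its existing document rather than a fresh one.

```csharp
using System.Xml;

public static class AddTextToTree
{
    // Option 1: add the raw text to an in-memory tree; the serialiser
    // escapes it when the tree is written out.
    public static string BuildCommentXml(string userInput)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("comment");
        doc.AppendChild(root);
        root.AppendChild(doc.CreateTextNode(userInput));
        return doc.OuterXml;
    }

    // Option 2: wrap the text in a throwaway root element and take the
    // serialised content back out, ready to paste into serialised XML.
    public static string EscapeViaSerialiser(string text)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("root");
        root.InnerText = text;
        return root.InnerXml;
    }
}
```

In both cases the parser never sees unescaped user input, so no error recovery is needed.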

--

 



