In my experience this is a novel way of attempting to use XML tools, and even if you could get it to 'work' it doesn't actually solve the worst of the problems. I would not attempt, nor expect to work, using a parser in this way: that is, throwing anything I can at it and hoping that it would 'catch errors' and let me fix them. Even if it *could* catch errors and let me fix them, what about totally valid markup which is not supposed to be markup? Those are the worst kind of errors.

This is a classic case of 'a fence on the hill or an ambulance down in the valley'. It leaves you wide open to XML injection attacks. That's extremely scary and should be addressed with the utmost attention. Invalid markup is a minor annoyance: a degenerate case of the primary problem of allowing *any markup* to be entered by the user. To use a banking analogy, it's like having an ATM use field validation that only checks the textual format of the number, not whether you have that much money in your account.

The code snippet you (Steve) provided is not the one I'm interested in. That's the ambulance in the valley. What I'm interested in is how the XML text is *initially* created. That is where the bug is. I think you will find little disagreement that it's extremely difficult to try to fix up errors after they are injected. In fact I assert it is not reasonably possible to do so at all, with any parser or tool or specification, in this scenario, because the context in which the errors are detected is too far down the processing chain: valid or invalid content becomes an application-level semantic problem, not a syntactic issue. Even schema validation may not catch XML injection, depending on the schema. And even if it could, injection should be prevented by properly escaping the XML prior to insertion, not by expecting the parser to shoulder the entire responsibility of detecting or fixing invalid content. At least that is my opinion.

----------------------------------------
David A. Lee

From: Stephen D Green [mailto:stephengreenubl@gmail.com]

David

There are always bugs. I don't see that as the issue; rather that we always have to write even simple apps such that the bugs do not cause problems for the end user. In this case it means we have to anticipate errors and handle them gracefully. That requirement seems, as one would expect, to be properly emphasised in the XML spec's description of the behaviour of a conforming XML parser. At least that would be the intention of the spec. The actual outcome in conforming software would depend on how well the spec does its job (and on how well the architects of XML and the XML spec design understand the effect of the spec on implementers, which isn't easy to do and requires feedback at all stages, and possibly redesign as part of the spec's maintenance).

So here the spec wants the parser to be useful for what some are calling preparsing: a step where errors are found and the application using the parser gets an opportunity to correct them. This aspect of the parser/spec is what I want to bring to people's attention so the spec can be improved, rather than trying to work out why errors happen in the first place. If the spec attempts to allow for the correction of errors, it is doing better than just saying 'errors should never happen'.

----
Stephen D Green

On 18 July 2011 01:34, David Lee <dlee@calldei.com> wrote:

This is starting to sound like a toolkit bug, and as such probably on the wrong list. But obviously you have a lot of people's active attention! If you could post a code snippet, it might answer a thousand questions. The one I have is: does the toolkit generate the bad XML, or does the custom code? If the toolkit/framework is generating the XML and it passes through unescaped, invalid XML markup, then the toolkit has a bug which should be reported *to the toolkit developers*. If custom code is generating bad XML then it needs to be fixed by the custom code developers. In neither case is the "XML Spec" at fault here.
Any more than passing an extra "," to CSV, or a UTF-8 sequence to an ASCII parser, or any of a thousand million zillion examples I could come up with offhand of invalid data fed to languages/parsers/tools which expect valid data.

----------------------------------------
David A. Lee

From: Stephen D Green [mailto:stephengreenubl@gmail.com]

I'll try and have a look at the code again tomorrow at work. As far as I remember we do not have access to the strings. These are *very* commonly used Ajax controls and they probably bind to a dataset from ASP.NET 'markup'. Not everything in controls like this is available to the developer. If the controls do bind to a dataset (XML a la .NET) then the data is possibly pre-packaged as XML even if it has '&' in the element content. Besides, this is a framework we are talking about, so I do not have much say in what other developers working on the apps, now and in the future, do. One tends to stick with the framework (go with the flow) and understand that others will try to do the same.

----
Stephen D Green
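A minimal sketch of the escaping-before-insertion approach David advocates, in Python for brevity (the thread's actual stack is ASP.NET, so treat this as illustrative only, and the `wrap` helper and `<comment>` element are hypothetical): escape the user-supplied string before it is packaged as XML, so that even well-formed-looking markup becomes inert character data.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def wrap(user_text: str) -> str:
    # Escape &, < and > BEFORE embedding the text in markup,
    # so user input can never introduce elements or entities.
    return "<comment>%s</comment>" % escape(user_text)

# Well-formed markup that is not supposed to be markup: the case
# a downstream parser alone cannot tell apart from real content.
attack = "</comment><admin>true</admin><comment>"
doc = wrap(attack)
print(doc)
# -> <comment>&lt;/comment&gt;&lt;admin&gt;true&lt;/admin&gt;&lt;comment&gt;</comment>

# The result parses, and the payload survives only as plain text:
assert ET.fromstring(doc).text == attack
```

The same call handles the raw '&' in element content mentioned above, since `escape` turns it into `&amp;` before it ever reaches a parser.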
On 17/07/2011 22:13, Stephen D Green wrote:
> I don't buy that. And not so easy to replace '<' with

I thought you indicated that you were taking strings from user-supplied form data and adding them to XML, in which case you need to escape the XML syntax characters before you add them to the XML, so there is no element content and no tags to worry about. You don't want to add the content and then try to parse to find element content afterwards: if adding the content has made the XML non-well-formed, you've already lost.

On 17 July 2011 22:26, Andrew Welch <andrew.j.welch@gmail.com> wrote:
On 17 July 2011 22:13, Stephen D Green <stephengreenubl@gmail.com> wrote:

It really is straightforward... if you are adding to an in-memory tree

Andrew Welch |
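Andrew's point can be sketched as follows (in Python rather than whatever stack he had in mind, so purely illustrative): splicing raw text into an XML string and parsing afterwards fails, while adding the text to an in-memory tree lets the library escape the markup characters at serialisation time, so there is nothing to fix afterwards.

```python
import xml.etree.ElementTree as ET

user_input = "5 < 6 & 7 > 2"   # raw form data containing markup characters

# The losing approach: splice the raw string in, then try to parse.
naive = "<comment>%s</comment>" % user_input
try:
    ET.fromstring(naive)
except ET.ParseError:
    # Once the raw text is spliced in, the document is no longer
    # well formed -- "you've already lost".
    pass

# The winning approach: assign the text to a node of an in-memory
# tree; the library escapes it when the tree is serialised.
root = ET.Element("comment")
root.text = user_input
print(ET.tostring(root, encoding="unicode"))
# -> <comment>5 &lt; 6 &amp; 7 &gt; 2</comment>
```

Note that a conforming parser treats the well-formedness error in the naive version as fatal, which is why the correction has to happen before parsing, not after.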