Re: [xml-dev] Error and Fatal Error
- From: Stephen D Green <stephengreenubl@gmail.com>
- To: Andrew Welch <andrew.j.welch@gmail.com>
- Date: Mon, 18 Jul 2011 13:45:20 +0100
I shouldn't really rise to this one, but here goes anyway:
1) I wasn't thinking anything except XML would be parsed with an XML parser
2) some say if XML has illegal characters it is not XML but I say - then why does
the spec talk about errors in the XML (if the XML had errors, then by that reasoning
it wouldn't be XML, ...)
3) rational people have obviously had it in mind in writing the spec that XML can
have errors which the parser would pick up and help them to correct - and that
process has requirements for which I think the spec doesn't completely cater
4) I don't at all buy the argument that a few characters needing escaping mean
some XML isn't actually XML and therefore isn't covered by the spec or the
requirements for the XML parser
5) some text is best parsed by an XML parser because it is within the scope of
the spec if it is XML (even if it has errors) - because some text is
to all intents and purposes XML (even with the errors)
6) I've had enough of trying to prove what to so many is blindingly obvious
----
Stephen D Green
On 18 July 2011 11:09, Andrew Welch <andrew.j.welch@gmail.com> wrote:
> gracefully. That seems to be, as would be expected, only properly,
> emphasised in the XML spec for behaviour of the conforming XML parser.
> At least that would be the intention of the spec. The actual outcome in
> conforming software would depend on how well the spec does its job (and
> how well the architects of XML and the XML spec design understand the
> effect of the spec on implementers, which isn't easy to do and requires
> feedback at all stages and possibly redesign as part of the spec's
> maintenance). So here the spec wants the parser to be useful for what
> some are calling preparsing - a step where errors are found and the
> application using the parser gets an opportunity to correct them. This
> aspect of the parser/spec is what I want to bring to people's attention
> so the spec can be improved rather than try to work out why errors happen
> in the first place. If the spec attempts to allow for the correction of
> errors it is doing better than just saying 'errors should never happen'.
The pre-processing step shouldn't involve an xml parser - only ever
parse xml with an xml parser. The pre-processing step can be
something like a tidy, or correcting mismatches between the encoding
and prolog, or handling html entities where there's no doctype... etc
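
As a rough illustration of one such step, here is a minimal Python sketch of
handling html entities where there's no doctype. It is only an assumption
about how this could be done, not the code from any real project: the
function name replace_html_entities is made up for the example, and it works
on plain text before any xml parser sees it.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# The five entities XML itself predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &nbsp; or &eacute;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Rewrite HTML named entities as numeric character references.

    Hypothetical helper: numeric references need no doctype, so the
    result can be handed to an ordinary XML parser afterwards.
    """
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave it alone for a later step to deal with.
            return match.group(0)
        return "&#%d;" % codepoint

    return ENTITY_REF.sub(substitute, text)


if __name__ == "__main__":
    raw = "<p>caf&eacute; &amp; bar&nbsp;</p>"
    cleaned = replace_html_entities(raw)
    # Only now does the XML parser get involved.
    root = ET.fromstring(cleaned)
    print(cleaned)
    print(repr(root.text))
```

The point of the design is the ordering: the text is fixed up as text, and
the xml parser only ever sees the result.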
From what you've been saying this sounds like a pretty standard task
that you've come at from the wrong angle, and now see a problem
with the spec rather than a problem with your process.
A while back I had to write the RSS widget for Vodafone that could
transform with xslt around 300 or so feeds from around the world (and
any user supplied feed), and it really did seem only about 10% were
well formed xml. Several were complete rubbish, but given enough
layers of pre-processing (before getting to the xml parser) they could
be turned into something that could be processed as xml.
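
The layering idea can be sketched roughly as below. This is not the widget's
actual code, just a minimal Python illustration under stated assumptions: the
two layers (strip_leading_junk and escape_bare_ampersands) and the driver
parse_with_layers are hypothetical names invented for the example, and real
feeds would need many more layers than these.

```python
import re
import xml.etree.ElementTree as ET


def strip_leading_junk(text):
    """Drop a BOM, whitespace or stray text before the first '<'."""
    start = text.find("<")
    return text[start:] if start > 0 else text


# An ampersand that does not start an entity or character reference.
BARE_AMPERSAND = re.compile(
    r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(text):
    """Escape ampersands that were left unescaped in the feed."""
    return BARE_AMPERSAND.sub("&amp;", text)


# Hypothetical layer list, tried in order from least to most invasive.
LAYERS = [strip_leading_junk, escape_bare_ampersands]


def parse_with_layers(text, layers=LAYERS):
    """Try the XML parser, applying one more clean-up layer after each failure.

    Returns (root, names_of_layers_applied). Re-raises the last ParseError
    if the text still is not well formed after every layer has run.
    """
    applied = []
    try:
        return ET.fromstring(text), applied
    except ET.ParseError as error:
        last_error = error
    for layer in layers:
        text = layer(text)
        applied.append(layer.__name__)
        try:
            return ET.fromstring(text), applied
        except ET.ParseError as error:
            last_error = error
    raise last_error


if __name__ == "__main__":
    feed = "\ufeff junk<rss><title>Fish & Chips</title></rss>"
    root, applied = parse_with_layers(feed)
    print(applied)
    print(root.find("title").text)
```

Each layer only touches text, and the xml parser is the final judge of
whether the result is well formed. Recording which layers were needed also
gives a rough picture of how bad each feed was.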