Re: [xml-dev] Error and Fatal Error
- From: Peter Flynn <peter@silmaril.ie>
- To: xml-dev@lists.xml.org
- Date: Sun, 17 Jul 2011 23:50:04 +0100
On 16/07/11 21:21, Stephen D Green wrote:
> > or it could allow "&" to represent itself if not followed by a name
> character
>
> Maybe that alone would be enough to fix my bugbear.
>
> But what did the spec writers expect to happen about the '<' character
> appearing in content?
I can only speak personally, and not as an author of the spec, but I
expected it to be a fatal error. It means the document has been created
by someone who does not know what they are doing, or by a process that
has insufficient data-checking.
It reminds me of the instructions to novices in a book on yachting,
about being caught on a lee shore under a following wind:
"1. Never allow yourself to be found in such circumstances." :-)
> More to the point: What did they expect an XML
> parser to do about such a character?
Notify the application that further parsing is pointless.
> Did they really expect preparsing
> to be necessary as an overhead merely for the purpose of replacing
> this character and the ampersand '&' in content?
Yes, absolutely. An editing interface which purports to deliver
XML-conformant documents or fragments to an XML process must not deliver
ill-formed data. Period. If it does, replace it with one that does not.
This is not rocket science (or if it is, I now know a number of rocket
scientists who will shortly become available). It is not beyond the wit
of editor-makers to trap < and & and make them insert &lt; and &amp;
into the data.
> Even aside from what the XML spec writers expected; what did XML
> parser writers expect developers to do about these characters in
> content being passed to the parser? Did they too expect pre-parsing
> just for the purpose of removing/replacing/escaping such characters?
I haven't asked them, but I would expect yes. The purpose of a parser is
to report errors, not to correct them.
> Either way seems slightly irrational to me. Which developers in their
> right mind would expect to be doing preparsing before sending XML
> to a parser?
Almost all of them, I would guess.
> They would surely just expect the parser to be able to
> handle these characters.
I doubt it very much indeed.
> They would surely expect any standards
> compliance of that parser (for conforming to the XML specs) to include
> being able to gracefully handle these characters.
That's a different matter. Graceful handling may be recommended but is
optional :-)
> If not, they would
> want to see this fix before they have to insist on fellow developers
> knowing what to do about it. They wouldn't expect to have to write a
> parser just to be able to send some XML to a 'standard' parser.
It depends on what size and scale of fragments you deal with.
In a simple HTML input element, I would expect the receiving function at
minimum to change markup flag characters to their escapes, and check
that the other characters are in range for the target application. This
applies regardless of target: I do it for XML and SQL and TeX daily.
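As a rough illustration of that receiving function for the XML case, here
is a minimal Python sketch. The names is_xml_char and to_xml_text are
hypothetical, and the range check assumes the target is XML 1.0 (its Char
production); a different target would need a different check.

```python
from xml.sax.saxutils import escape


def is_xml_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def to_xml_text(value: str) -> str:
    """Reject out-of-range characters, then escape the markup flag characters.

    Hypothetical helper: suitable for element content only, not for
    attribute values (quotes are left alone).
    """
    bad = [ch for ch in value if not is_xml_char(ch)]
    if bad:
        raise ValueError(f"characters not allowed in XML 1.0: {bad!r}")
    # escape() replaces &, < and > with &amp;, &lt; and &gt;
    return escape(value)
```

Rejecting out-of-range characters rather than silently dropping them is a
design choice in this sketch; stripping or replacing them may suit some
applications better.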
In a multi-line text box, where the scope for error is much wider,
either use an embedded XML editor like Xopus, or perform some kind of
analytic pass to see if the text contains some unidentified stream of
characters masquerading as markup, and act appropriately.
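One possible shape for such an analytic pass, sketched in Python. This is
only a heuristic under my own assumptions: the pattern and the name
find_markup_like are hypothetical, it flags things that merely look like
tags or references, and what "act appropriately" means (escape, reject, or
ask the user) is left to the caller.

```python
import re

# Heuristic only: sequences that are shaped like markup.
MARKUP_LIKE = re.compile(
    r"</?[A-Za-z_][\w.\-]*"                            # start or end of a tag
    r"|&(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z_][\w.\-]*);"    # entity or char reference
    r"|<!--|<\?|<!\[CDATA\["                           # comment, PI, CDATA opener
)


def find_markup_like(text: str) -> list[str]:
    """Return every substring of text that looks like the start of markup."""
    return [m.group(0) for m in MARKUP_LIKE.finditer(text)]
```

A bare "<" or "&" surrounded by spaces (as in "a < b" or "R & D") is not
flagged by this pattern, so it can simply be escaped; anything the pattern
does flag probably deserves a closer look before it is escaped or parsed.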
If you allow users to upload entire documents, then they get the full
parse-and-validate.
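For the parse half of that, a minimal sketch using Python's standard
library, which checks well-formedness only; validating against a DTD or
schema needs a validating parser, which the standard library does not
provide. The function name check_well_formed is hypothetical. Note that the
parser reports the error and stops; it makes no attempt at repair.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(document: str) -> Optional[str]:
    """Return None if document parses, else a message locating the error."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        line, column = err.position
        return f"not well-formed at line {line}, column {column}: {err}"
    return None
```

An unescaped & or < in content, as discussed above, comes back as an error
message from this function rather than as a parsed document.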
///Peter