Re: [xml-dev] Error and Fatal Error

On 16/07/11 21:21, Stephen D Green wrote:
>  > or it could allow "&" to represent itself if not followed by a name
>  > character
>
> Maybe that alone would be enough to fix my bugbear.
>
> But what did the spec writers expect to happen about the '<' character
> appearing in content?

I can only speak personally, and not as an author of the spec, but I 
expected it to be a fatal error. It means the document has been created 
by someone who does not know what they are doing, or by a process that 
has insufficient data-checking.

It reminds me of the instructions to novices in a book on yachting, 
about being caught on a lee shore under a following wind:
"1. Never allow yourself to be found in such circumstances." :-)

> More to the point: What did they expect an XML
> parser to do about such a character?

Notify the application that further parsing is pointless.
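A minimal illustration of that behaviour, assuming Python's standard `xml.etree.ElementTree` module (the helper name `check_well_formed` is hypothetical): the parser raises `ParseError` at the first well-formedness error and reports where it stopped, rather than guessing what the author meant.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Hypothetical helper: return None if text is well-formed XML,
    otherwise a description of the fatal error the parser reported."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"fatal error at line {line}, column {column}: {err}"
    return None

print(check_well_formed("<p>fish &amp; chips</p>"))  # None: well-formed
print(check_well_formed("<p>a < b</p>"))             # reports the stray '<'
print(check_well_formed("<p>fish & chips</p>"))      # reports the bare '&'
```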

> Did they really expect preparsing
> to be necessary as an overhead merely for the purpose of replacing
> this character and the ampersand '&' in content?

Yes, absolutely. An editing interface which purports to deliver 
XML-conformant documents or fragments to an XML process must not deliver 
ill-formed data. Period. If it does, replace it with one that does not.
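Where the input mixes intended markup with stray characters, a preparse pass of the kind Stephen describes might look like the sketch below. It is a heuristic, not anything the spec defines: the names `STRAY_AMP`, `STRAY_LT` and `preparse` are hypothetical, it assumes the fragment contains no CDATA sections or comments, and it cannot tell text such as "a <b" from a real start tag.

```python
import re

# A '&' that does not begin a reference such as &amp;, &#60; or &#x3C;.
STRAY_AMP = re.compile(r"&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# A '<' not followed by something that could start markup:
# a name start character, '/', '!' or '?'.
STRAY_LT = re.compile(r"<(?![A-Za-z_:/!?])")

def preparse(fragment):
    """Hypothetical repair pass: escape stray '&' and '<' in a fragment
    that otherwise contains markup the author meant."""
    # Escape '&' first so the '&' in each new '&lt;' is not touched again.
    fragment = STRAY_AMP.sub("&amp;", fragment)
    return STRAY_LT.sub("&lt;", fragment)

print(preparse("<p>a < b & c</p>"))  # <p>a &lt; b &amp; c</p>
```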

This is not rocket science (or if it is, I now know a number of rocket 
scientists who will shortly become available). It is not beyond the wit 
of editor-makers to trap < and & and make them insert &lt; and &amp; 
into the data.
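Trapping the two characters at the point of entry amounts to a replacement like the one below, a minimal sketch with a hypothetical function name. Python's standard library also offers `xml.sax.saxutils.escape`, which does the same job and additionally escapes `>`.

```python
import xml.etree.ElementTree as ET

def escape_text(raw):
    """Hypothetical helper: make typed text safe to insert as XML character data."""
    # Replace '&' first; otherwise the '&' in a freshly inserted '&lt;' would be escaped again.
    # ('>' only needs escaping in the sequence ']]>', which this sketch does not handle.)
    return raw.replace("&", "&amp;").replace("<", "&lt;")

typed = "if a < b & b < c then a < c"
fragment = "<p>" + escape_text(typed) + "</p>"
# The escaped fragment parses, and the parser hands back exactly what was typed.
assert ET.fromstring(fragment).text == typed
```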

> Even aside from what the XML spec writers expected, what did XML
> parser writers expect developers to do about these characters in
> content being passed to the parser? Did they too expect pre-parsing
> just for the purpose of removing/replacing/escaping such characters?

I haven't asked them, but I would expect yes. The purpose of a parser is 
to report errors, not to correct them.

> Either way seems slightly irrational to me. Which developers in their
> right mind would expect to be doing preparsing before sending XML
> to a parser?

Almost all of them, I would guess.

> They would surely just expect the parser to be able to
> handle these characters.

I doubt it very much indeed.

> They would surely expect any standards
> compliance of that parser (for conforming to the XML specs) to include
> being able to gracefully handle these characters.

That's a different matter. Graceful handling may be recommended but is 
optional :-)

> If not, they would
> want to see this fix before they have to insist on fellow developers
> knowing what to do about it. They wouldn't expect to have to write a
> parser just to be able to send some XML to a 'standard' parser.

It depends on what size and scale of fragments you deal with.

In a simple HTML input element, I would expect the receiving function at 
a minimum to change markup flag characters to their escapes, and check 
that the other characters are in range for the target application. This 
applies regardless of target: I do it for XML and SQL and TeX daily.
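For a single-line field, those two checks might be combined as follows. This is a sketch assuming the target is XML 1.0, whose `Char` production allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF; the function names are hypothetical, and rejecting the input rather than silently dropping characters is a design choice.

```python
def is_xml_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def clean_input_field(raw):
    """Hypothetical receiving function for a one-line form field."""
    # Reject anything XML 1.0 cannot carry at all (e.g. NUL, form feed, lone surrogates).
    bad = sorted({f"U+{ord(c):04X}" for c in raw if not is_xml_char(c)})
    if bad:
        raise ValueError("not allowed in XML 1.0: " + ", ".join(bad))
    # Escape the markup flag characters; '&' goes first so new escapes are not re-escaped.
    return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

print(clean_input_field("x < y & z"))  # x &lt; y &amp; z
```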

In a multi-line text box, where the scope for error is much wider, 
either use an embedded XML editor like Xopus, or perform some kind of 
analytic pass to see if the text contains some unidentified stream of 
characters masquerading as markup, and act appropriately.
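One possible shape for that analytic pass, assuming the application only wants to flag text that looks like markup and then decide what to do (escape it, reject it, or route the user to a structured editor). The pattern and the name `find_pseudo_markup` are hypothetical, and a regular expression like this catches common cases rather than everything.

```python
import re

# Sequences a user probably did not mean as literal text: things shaped like
# tags, comment/CDATA/processing-instruction openers, or references.
SUSPECT = re.compile(
    r"</?[A-Za-z_:][\w.:-]*[^<>]*>"                      # start or end tag
    r"|<!--|<!\[CDATA\[|<\?"                             # comment, CDATA, PI openers
    r"|&(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);"  # entity or character reference
)

def find_pseudo_markup(text):
    """Return (line, column, snippet) for each markup-like sequence, 1-based."""
    hits = []
    for m in SUSPECT.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        line_start = text.rfind("\n", 0, m.start()) + 1
        hits.append((line, m.start() - line_start + 1, m.group()))
    return hits

print(find_pseudo_markup("line one\nuse <b>bold</b> here"))
```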

If you allow users to upload entire documents, then they get the full 
parse-and-validate.

///Peter


Copyright 1993-2007 XML.org. This site is hosted by OASIS