Re: [xml-dev] Trust and control (as Re: [xml-dev] Here's how to processXML documents written in German)

On 1/31/13 12:47 PM, Liam R E Quin wrote:
> On Thu, 2013-01-31 at 07:16 -0500, Simon St.Laurent wrote:
>> Somewhere along the line programmers learned that only completely
>> perfect messages should be accepted.
> The difficulty has always been two-fold.
> First, that you have to allow for every variation in the software, as
> you don't want software to crash or allow execution of arbitrary code
> accidentally (vulnerabilities).

There are a lot of ways to handle this beyond "it must validate or I 
reject it".  A few of the easier ones include:

* Multiple processing pathways - many XSLT stylesheets can run, for 
example, on parts of a document and don't require a complete one.  (Our 
tools are actually already more flexible than we often give them credit 
for.)

* Making crashes acceptable - Erlang's 'let it crash' philosophy says 
"fine, whatever" to crashes of sub-processes and allows process 
supervisors to deal or not deal with them as appropriate.  That could 
(if appropriate) still let part of a document get processed while other 
parts fail.  (This is funny to me in a language with extensive built-in 
ASN.1 support, but real.)

* Customizing schemas to fit particular processes.  I've had too many 
weird conference bar conversations about people who couldn't get the 
data they needed because documents failed validation for reasons that 
had nothing to do with the needs of a particular data-hungry process.
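The first two options above boil down to the same move: process each 
piece independently and set the broken pieces aside instead of rejecting 
the whole document.  A rough sketch of that in Python (the fragments and 
the "set aside" list are mine, purely for illustration - stdlib only):

```python
# Process document fragments independently: one malformed piece
# doesn't sink the batch, and the failures are kept for a
# supervisor process or a human to look at later.
import xml.etree.ElementTree as ET

fragments = [
    "<item><name>ok</name></item>",
    "<item><name>broken</item>",         # malformed: mismatched tags
    "<item><name>also ok</name></item>",
]

processed, failed = [], []
for frag in fragments:
    try:
        root = ET.fromstring(frag)
        processed.append(root.findtext("name"))
    except ET.ParseError as err:
        # "not quite right" becomes a message, not a dead end
        failed.append((frag, str(err)))

print(processed)                # the good parts still got through
print(len(failed), "fragment(s) set aside for inspection")
```

Nothing Erlang-specific there, of course - the point is just that 
"reject everything" and "accept everything" aren't the only choices.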

More exciting options - let's call them "the Walter Perry level of 
document inspection" - might include notifying other processes and even 
humans to take a look when something comes in not quite right.  Why? 
Because "not quite right" can be an important message on its own, the 
beginning of a conversation rather than the end.

> It's interesting to note that the
> widespread adoption of Intel's 808x little-endian architecture greatly
> increased vulnerability to stack attacks.

That would fit with the general theory of historical brittleness I've 
suggested.  The mistakes of the past seem to linger permanently.

> Second, that error correction is difficult.
> Error correction that varies from program to program means
> interoperability is limited to the subset of data that gets treated the
> same way everywhere.

This just spreads the brittleness around, but okay...

> This is what, for example, HTML 5 is about (partly)
> - documenting that subset for Web browsers, and trying to broaden it by
> having the browsers all use the same parsing and error correction
> techniques for new content.

Really?  I thought it was about browser vendors coming up with a wide 
variety of random ideas and dropping them in the world to figure out 
what works and what doesn't.  The standards process there seems more an 
afterthought than the driving force.

If you just mean the HTML5 syntax weirdness, there is a slight bit of 
broadening there by making the same oddities work or not work across 
browsers, but I'm always reminded of the "please provide syntax 
interop" requests from browser vendors that led to XML's well-formedness 
in the first place.

Even at its best, however, HTML5 is not a process I'd use as an example 
of the value of schemas and strong syntax checking.

>> I could see the value of well-formedness, though I question even that
>> lately.  I don't understand, though, why we regularly insist that the
>> only information worth processing is that which arrived in pristine
>> condition.
> That's a stronger statement than I'd make.
> If I make a mistake in a program, the chances are that
> (1) the compiler or interpreter catches it
> (2) I catch it in my unit tests
> (3) It's caught in application tests and Q/A
> (4) A customer complains
> (5) Everything seems fine until the 'plane tries to land when
>      the wind-speed is less than the ambient air temperature and
>      the 'plane is full of fuel.
> The cost of fixing problems increases at each stage.
> I've used a C compiler that could correct a large class of input errors;
> it detected when it had done so and did not generate code, but gave more
> helpful error messages.

And hence the dream that schemas would step in and provide the same 
level of error-checking that C offered in 1972, with a few more bells 
and whistles?

The costs of that straitjacket never seem to get reasonably compared to 
the costs of letting occasional errors flow to level 3 in your list above.

And yes, I get that those building airplanes and medical devices might 
want to inspect their data more closely than others - but those fields 
already have strong opinions about how to handle anomalies that are not 
necessarily the same as "apply a schema".

>> Programmers of the world, throw away your schemas!  You have nothing to
>> lose but your existing toolset! (aka your chains...)
> But I _like_ chains.

Not when they're heavy and you're trying to swim in deep water.

Simon St.Laurent



Copyright 1993-2007 XML.org. This site is hosted by OASIS