That's good. Except the bit about costs going up. Why
would they?
A schema can be a guardian or a classification verifier.
One might assume, rightly or wrongly, that the MIME type, the
extension, a magic number, or a DOCTYPE tells one the class.
One might have to verify that.
Also, this application of a schema is only one of several possible.
I agree that forcing widespread use of a schema is a tough political
problem, but it is a trivial technical issue. One says, "this is
the Internet, after all" but one means "these are humans,
after all". Humans often fail Turing tests.
len
The trick to passing a Turing test is selecting the topic
of conversation wisely. Eagerness is everything.
From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
>On Jun 8, 2004, at 12:31 AM, Rick Marshall wrote:
>
>> and if the schema changes, but not the xslt, and someone suffers
>>financial loss - tax returns fail, orders lost, etc - who pays?
Perhaps there's a technical step in the proposed system you're
missing here. When receiving a document you first have to classify
it. That is, you must figure out if this is a kind of document you've
seen before, and if you have tools in place to process it
automatically. If you do, then dispatch it to one of those tools. If
not, dispatch it to a human for further analysis.
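The classify-then-dispatch step described above might be sketched
roughly as follows. This is only an illustration, not anyone's actual
system: the handler functions, the HANDLERS registry, and the choice
to classify by root element name are all assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical processing tools; real ones would do actual work.
def handle_order(root):
    return "order tool"

def handle_invoice(root):
    return "invoice tool"

# Hypothetical registry mapping a document class to its tool.
# Classifying by root element name is an assumption for this sketch.
HANDLERS = {"order": handle_order, "invoice": handle_invoice}

def dispatch(xml_text):
    """Send a recognized document to its tool, anything else to a human."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not even well-formed: a person has to look at it.
        return "human review"
    handler = HANDLERS.get(root.tag)
    if handler is None:
        # A kind of document we have not seen before.
        return "human review"
    return handler(root)
```

Nothing is rejected outright here; an unrecognized document simply
ends up in the queue for human analysis.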
We can adjust how tight we make the recognition software. Personally,
I like loose, XPath-based solutions like Schematron that ask whether
the document contains the information I want rather than asking
whether it tightly fits some W3C XML Schema Language schema. However,
if you want to use a conservative schema (everything not permitted is
forbidden) as your diagnosis, go ahead. You won't be able to process
quite as much automatically, and costs will go up; but maybe in your
environment and for your processes safety concerns do mandate that.
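To make the loose-versus-conservative contrast concrete, here is a
minimal sketch. It uses neither Schematron nor W3C XML Schema; it
stands in for them with the limited XPath subset in Python's standard
library. The paths in REQUIRED_PATHS and the vocabulary in
ALLOWED_TAGS are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the information this particular process needs.
REQUIRED_PATHS = [".//customer/id", ".//item/sku"]

# Hypothetical: the complete vocabulary a conservative check permits.
ALLOWED_TAGS = {"order", "customer", "id", "item", "sku"}

def loose_match(root):
    """Loose check: is everything I want present somewhere?"""
    return all(root.find(path) is not None for path in REQUIRED_PATHS)

def strict_match(root):
    """Conservative check: everything not permitted is forbidden."""
    return loose_match(root) and all(
        el.tag in ALLOWED_TAGS for el in root.iter()
    )
```

A document carrying one unexpected extra element still passes
loose_match but fails strict_match, so under the conservative rule it
would go to a human instead of being processed automatically.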
We can also have a middle ground, where XPath extracts the relevant
fragments of a document, and then each fragment we use is validated
closely without worrying about the outer envelope. And there are
lots of other points along the continuum as well.
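One way the middle ground could look, again only as a sketch under
assumptions: the item/sku/qty structure and the SKU format below are
invented for illustration, and the strict fragment check is
hand-written rather than driven by a real schema language.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical formats for the values inside a fragment.
SKU_PATTERN = re.compile(r"[A-Z][0-9]+")
QTY_PATTERN = re.compile(r"[0-9]+")

def valid_item(item):
    """Strict check on one fragment: exactly <sku> then <qty>, well-formed values."""
    children = list(item)
    if [child.tag for child in children] != ["sku", "qty"]:
        return False
    sku, qty = children
    return bool(
        SKU_PATTERN.fullmatch(sku.text or "")
        and QTY_PATTERN.fullmatch(qty.text or "")
    )

def extract_valid_items(xml_text):
    """Find <item> fragments anywhere in the document; keep those that validate."""
    root = ET.fromstring(xml_text)
    return [
        (item.find("sku").text, int(item.find("qty").text))
        for item in root.findall(".//item")  # the envelope is ignored
        if valid_item(item)
    ]
```

The envelope can contain anything at all; only the fragments actually
used are held to the tight rule.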
However, the really key idea is to use the schema, in whatever
language, as a classification tool, not a guardian. The schema's job
is to sort documents into the right queue, not to accept some
documents unconditionally and reject all others.