- To: email@example.com
- Subject: RE: [xml-dev] Web Design Principles (was Re: [xml-dev] Generality of HTTP)
- From: "Bullard, Claude L (Len)" <firstname.lastname@example.org>
- Date: Mon, 28 Jan 2002 16:32:49 -0600
Document Imaging System sites now make reference to fuzzy
logic technologies that can clean up bad OCR output to some
acceptable rate, so affordable technology could help out
here. One remaining problem is trusting the fuzzy
logic to have not altered the original. This is similar
to the issue of legal document image fidelity: many
systems don't accept a document as legal if it could have
been altered by any means, a.k.a. the identity problem.
The application of a technology can't usually be divorced
from the content it operates over. As long as the
identity requirement doesn't enter in, the process
can be lightly defined and the fast-fingered typist
is as good as the OCR, all other things being equal
(which they aren't but that's a longer story).
So we come back around to simple is ok until you have
strict requirements and money on the table. On
the other hand, any project I've ever worked on that
relied on volunteer effort had to be simple or the
chances of success were very low.
I wonder if the Web Design Principles are different
with large well-funded organizations in the loop.
Consider the NASA effect: when an engineering
organization transforms into an engineering
project management organization, does the quality
of the product change or only the rate of the
From: Jeff Greif [mailto:email@example.com]
It's also a question of volume. A 1% error rate that needs human cleanup is
not a big deal when you only see 100 docs per day, but it mounts up when
there are a million.
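
To put rough numbers on the volume point, here is a minimal sketch. It assumes the 1% means one document in a hundred needs a human pass (the message does not say whether the rate is per document or per character), and the function name is only illustrative.

```python
def docs_needing_cleanup(docs_per_day: int, error_rate: float) -> int:
    """Expected documents per day that need human cleanup.

    Assumption: error_rate is the fraction of documents (not characters)
    that need a human pass.
    """
    return round(docs_per_day * error_rate)


# The two daily volumes mentioned in the message, both at a 1% rate.
for volume in (100, 1_000_000):
    print(f"{volume} docs/day: {docs_needing_cleanup(volume, 0.01)} need cleanup")
```

At 100 documents a day that is a single document for someone to fix; at a million a day it is ten thousand, which is a staffing problem rather than a chore.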
Analogy: A friend is slowly scanning and turning into PDF files all the
reprints and preprints (in planetary science) that he's collected since the
late 1960's. He runs the scanner more or less continuously while at home,
and takes the files produced on his laptop when he travels, and does a sort
of desultory fixup of the OCR (since he has the page images as well) as
a lulling airplane activity. Serious fixup occurs when he actually has to
consult the paper for details.
From: "Paul Prescod" <firstname.lastname@example.org>
> Having computers and humans working together is great. But you seem to
> propose that users should be required to handle the exceptional cases
> that computers handle poorly. I'd suggest instead that the users would
> rather work with programmers (or visual mapping tools) to automate away
> those exceptional cases so that they can be freed up to do creative