xml-dev - Re: [xml-dev] Web Design Principles (was Re: [xml-dev] Generality of HTTP)



To: Jeff Greif <jgreif@alumni.princeton.edu>
Subject: Re: [xml-dev] Web Design Principles (was Re: [xml-dev] Generality of HTTP)
From: "Simon St.Laurent" <simonstl@simonstl.com>
Date: 28 Jan 2002 20:23:00 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <00f101c1a847$454579d0$0402a8c0@DBDELL130G>
References: <ID9683SD923Z3Z3XNHPL4YEB5YURD9.3c547e49@MChamp><3C548D9D.D036830C@prescod.net><1012179014.1809.929.camel@localhost.localdomain><3C559A0B.240B32FC@prescod.net> <00f101c1a847$454579d0$0402a8c0@DBDELL130G>

On Mon, 2002-01-28 at 17:00, Jeff Greif wrote:
> It's also a question of volume.  A 1% error rate that needs human cleanup is
> not a big deal when you only see 100 docs per day, but it mounts up when
> there are a million.

Sure, if you start with a million docs and need to train the system on
all those errors the first day.  If you start with a hundred and add new
variations - remember, "errors" is not the right turn of phrase - over a
period of time, one manual mapping a day can lead to a lot of automated
processing after a few weeks.
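
To make the "one manual mapping a day" point concrete, here is a rough
Python sketch.  Everything in it is an assumption for illustration: the
(variation, payload) document shape, the function names, and the
str.upper stand-in for a hand-written mapping are all hypothetical, not
a description of any real system.

```python
from collections import Counter


def process(docs, mappings):
    """Split docs into automated results and a tally of unmapped variations.

    docs: iterable of (variation, payload) pairs (hypothetical shape).
    mappings: dict from variation name to a handler function.
    """
    automated = []
    needs_review = Counter()
    for variation, payload in docs:
        handler = mappings.get(variation)
        if handler is None:
            # No mapping yet: a human has to look at this one.
            needs_review[variation] += 1
        else:
            automated.append(handler(payload))
    return automated, needs_review


def simulate(daily_docs, days):
    """Each day, add one manual mapping for the most common unmapped variation.

    Returns a list of (day, automated_count, manual_count) tuples.
    """
    mappings = {}
    history = []
    for day in range(1, days + 1):
        automated, needs_review = process(daily_docs, mappings)
        history.append((day, len(automated), sum(needs_review.values())))
        if needs_review:
            variation, _ = needs_review.most_common(1)[0]
            # Placeholder for the mapping a person would write by hand.
            mappings[variation] = str.upper
    return history
```

With the same mix of variations arriving every day, the manual pile
shrinks by one variation per day while the automated share grows, which
is the gradual effect described above.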

> Analogy: A friend is slowly scanning and turning into PDF files all the
> reprints and preprints (in planetary science) that he's collected since the
> late 1960s.  He runs the scanner more or less continuously while at home,
> and takes the files produced on his laptop when he travels, and does a sort
> of desultory fixup of the OCR (since he has the page images as well) as a
> lulling airplane activity.  Serious fixup occurs when he actually has to
> consult the paper for details.

OCR is a trickier case, since it pretty much requires a human to go over
everything to validate its content.  I remember early systems (some on
the Mac, if I remember the early '90s well enough) which did ask people
for help with difficult characters along the way, but I think that got
automated out in favor of batch processing in bulk.

-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com
