Re: [xml-dev] Structured from/within unstructured documents
- From: "Stephen Green" <stephengreenubl@gmail.com>
- To: "Dimitre Novatchev" <dnovatchev@yahoo.com>, "Jonathan Robie" <jonathan.robie@redhat.com>
- Date: Sat, 15 Dec 2007 23:04:54 +0000
Many thanks for these very prompt and useful pointers.
It looks like tools such as ABBYY, and LR(1) parsing as a technology,
are potential ways to go. I hope there are others too, or that
others are developed soon to fill an obvious gap.
So parsing or OCR'ing the essentially visual representation
of unstructured data and documents makes sense as the
first step toward structured documents, lossy though these
methods seem likely to be.
I guess I was hoping we were further ahead. So it seems that,
so far, output to PDF etc. is a one-way, dead-end street.
Pity.
I note you can highlight text in a PDF in readers and copy
it to clipboard. Maybe tools based on such methods exist
for creating XML. Maybe one pass would create a template,
say, and then further documents of the same format (such
as in a form) could be handled automatically based on the
template - like OCR but adapted to natively handle electronic
paper. Are there any such tools already that can output XML?
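To make the template idea concrete, here is a rough sketch of what I have in mind, not a description of any existing tool. It assumes some earlier step has already pulled positioned text out of the page as (x, y, text) tuples; the field names, the bounding boxes and the apply_template function are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical template, created once for a given form layout:
# field name -> bounding box (x0, y0, x1, y1) in page units.
TEMPLATE = {
    "invoice_number": (400, 40, 560, 60),
    "customer": (40, 100, 300, 120),
}

def apply_template(words, template, root_tag="document"):
    """Build XML from positioned text using a template of named regions.

    words: iterable of (x, y, text) tuples, assumed to have been
    extracted from the page by some other tool.
    """
    root = ET.Element(root_tag)
    for field, (x0, y0, x1, y1) in template.items():
        # Keep only the words whose position falls inside this field's box.
        inside = [(y, x, t) for x, y, t in words
                  if x0 <= x <= x1 and y0 <= y <= y1]
        inside.sort()  # reading order: top to bottom, then left to right
        ET.SubElement(root, field).text = " ".join(t for _, _, t in inside)
    return ET.tostring(root, encoding="unicode")
```

Any further document with the same layout could then be passed through the same template. Text falling outside every box is simply dropped, which is one place where the lossiness mentioned above would show up.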
Best regards and thanks for these and any further pointers.
--
Stephen Green
Partner
SystML, http://www.systml.co.uk
Tel: +44 (0) 117 9541606
http://www.biblegateway.com/passage/?search=matthew+22:37 .. and voice