OASIS Mailing List Archives


Re: [xml-dev] Structured from/within unstructured documents

Many thanks for these very prompt and useful pointers.

It looks like tools such as ABBYY, and LR(1) parsing as a
technology, are potential ways to go. I hope there are others
too, or that others are developed soon to fill an obvious gap.

So parsing or OCR'ing the essentially visual representation
of unstructured data and documents makes sense as the
first step toward structured documents, though these
methods seem likely to be lossy.

I guess I was hoping we were further ahead. So it seems that,
so far, output to PDF etc. is a one-way, dead-end street.

I note you can highlight text in a PDF reader and copy it
to the clipboard. Maybe tools based on such methods exist
for creating XML. Maybe one pass would create a template,
say, and then further documents of the same format (such
as a form) could be handled automatically based on the
template, like OCR but adapted to natively handle electronic
paper. Are there any tools like that already which can output XML?
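
To make the template idea concrete, here is a rough sketch in Python
(standard library only). It assumes the text has already been copied
or extracted from the PDF as plain text, and that a "template" is
simply a mapping from field names to regular expressions. The names
INVOICE_TEMPLATE and text_to_xml, and the invoice layout itself, are
made up for illustration and do not refer to any existing tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template: field name -> regex with one capture group.
# In practice this would be built once from a sample document.
INVOICE_TEMPLATE = {
    "number": r"Invoice No:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total:\s*([\d.]+)",
}


def text_to_xml(text, template, root_name="invoice"):
    """Apply a field template to extracted text and return an XML string."""
    root = ET.Element(root_name)
    for field, pattern in template.items():
        match = re.search(pattern, text)
        if match:
            # Fields that are not found are simply left out (lossy by design).
            ET.SubElement(root, field).text = match.group(1)
    return ET.tostring(root, encoding="unicode")


# Assumed sample of text as it might come off the clipboard.
sample = "Invoice No: A-1001\nDate: 2007-03-14\nTotal: 249.50\n"
xml_out = text_to_xml(sample, INVOICE_TEMPLATE)
print(xml_out)
```

Any further document with the same layout could be passed through
the same template unchanged. Fields the template fails to match are
silently dropped, which is where the lossiness would show up.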

Best regards and thanks for these and any further pointers.

Stephen Green

SystML, http://www.systml.co.uk
Tel: +44 (0) 117 9541606

http://www.biblegateway.com/passage/?search=matthew+22:37 .. and voice


Copyright 1993-2007 XML.org. This site is hosted by OASIS