RE: PDF to XML


From: William Velasquez <wvelasquez@visiontecnologica.com>
To: "ihe.onwuka@gmail.com" <ihe.onwuka@gmail.com>, "xml-dev@lists.xml.org"<xml-dev@lists.xml.org>
Date: Thu, 1 May 2014 15:00:19 +0000

We are currently working on a project that requires that conversion.

After a lot of research, we found that the available open source alternatives can only extract plain text, and each one produces different output from the same document. The best results we got were from PDFMiner (http://www.unixuser.org/~euske/python/pdfminer/index.html), which runs on Python.

We are currently writing a processor to automate the conversion from the extracted plain text to XML, and we plan to release it as open source as soon as it becomes usable.
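
To give a rough idea of what such a text-to-XML step can look like, here is a minimal sketch in Python. It is not the processor described above: the function name text_to_xml, the element names "document" and "para", and the rule that a blank line ends a paragraph are all assumptions made for illustration, and it uses only the standard library. The input is assumed to be plain text already extracted by a tool such as PDFMiner.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text, root_tag="document", para_tag="para"):
    """Wrap blank-line-separated paragraphs of plain text in XML elements.

    Hypothetical helper: the tag names and the paragraph rule are
    illustrative assumptions, not part of any existing tool.
    """
    root = ET.Element(root_tag)
    paragraph = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the current paragraph.
            paragraph.append(line.strip())
        elif paragraph:
            # Blank line: close the paragraph collected so far.
            ET.SubElement(root, para_tag).text = " ".join(paragraph)
            paragraph = []
    if paragraph:
        # Flush the last paragraph if the text did not end with a blank line.
        ET.SubElement(root, para_tag).text = " ".join(paragraph)
    # ElementTree escapes characters such as "<" and "&" on serialization.
    return ET.tostring(root, encoding="unicode")


sample = "First line\nstill first paragraph\n\nSecond paragraph with a < sign\n"
print(text_to_xml(sample))
```

A real document would need more than this (headings, tables, columns, page breaks), which is where most of the effort in such a processor is likely to go.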

Commercial alternatives can generate XML output, even from scanned documents via OCR, but they cost several thousand dollars. We tried KOFAX and it did the job, but the price was beyond our budget.

Hope this helps,

- Bill 



-----Original Message-----
From: Ihe Onwuka [mailto:ihe.onwuka@gmail.com]
Sent: Thursday, May 1, 2014 8:37 a.m.
To: xml-dev@lists.xml.org
Subject: PDF to XML

Is there a tool/process for such conversions? I suppose I could always do it with Adobe Acrobat Professional (can I?).

I'd be grateful for commentary from someone who has been there before.

