XML.org
OASIS Mailing List Archives

 


RE: PDF to XML

We are currently working on a project that requires that conversion.

After a lot of research we found that the open source alternatives available can only extract plain text, and each one produces different output from the same document. The best results we got were from PDFMiner: http://www.unixuser.org/~euske/python/pdfminer/index.html which runs on Python.

Currently we are writing a processor to automate the conversion from the extracted plain text to XML, and we plan to release it as open source as soon as it becomes usable.
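
To give a rough idea of what the text-to-XML step can look like, here is a minimal sketch using only the Python standard library. It is not the processor mentioned above. It assumes the extracted text separates paragraphs with blank lines, and the `text_to_xml` function and the `document` and `para` element names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def text_to_xml(text: str) -> ET.Element:
    """Wrap blank-line-separated paragraphs of extracted text in XML elements."""
    # "document" and "para" are hypothetical element names, not a real schema.
    root = ET.Element("document")
    # Treat one or more blank lines as a paragraph boundary (an assumption
    # about the extractor's output; real PDFs often need smarter heuristics).
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse hard line breaks and runs of whitespace inside a paragraph.
        content = " ".join(block.split())
        if not content:
            continue
        para = ET.SubElement(root, "para", {"n": str(len(root) + 1)})
        para.text = content
    return root


if __name__ == "__main__":
    sample = (
        "Title of the report\n\n"
        "First paragraph, broken\nacross two lines.\n\n"
        "Second paragraph with A < B."
    )
    # ElementTree escapes characters such as "<" when serializing.
    print(ET.tostring(text_to_xml(sample), encoding="unicode"))
```

Building the tree with ElementTree rather than concatenating strings keeps the output well-formed even when the extracted text contains characters like `<` or `&`.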

Commercial alternatives can generate XML output, even from scanned documents via OCR, but they cost several thousand dollars. We tried KOFAX and it did the job, but the price was beyond our budget.

Hope this helps,

- Bill 



-----Original Message-----
From: Ihe Onwuka [mailto:ihe.onwuka@gmail.com]
Sent: Thursday, May 1, 2014 8:37 AM
To: xml-dev@lists.xml.org
Subject: PDF to XML

Is there a tool/process for such conversions? I suppose I could always do it with Adobe Acrobat Professional (can I?).

I'd be grateful for commentary from someone who has been there before.



Copyright 1993-2007 XML.org. This site is hosted by OASIS