   Re: [xml-dev] PDF file into XML


SMir@Wockhardtin.com wrote:
 > How can I convert a PDF file into XML?
 > Regards
 > Shahroz

I'm assuming you want an XML representation in which your PDF 
document's semantic content is marked up with XML, not an XML 
representation of its visual appearance (an SVG version of the PDF, 
for example).

If you Google for "pdf to xml" you'll see some commercial tools for 
doing this, some of which I've seen produce reasonable results. These 
analyze the PDF and try to infer basic generic markup for the content 
(paragraphs, lists, etc.) from the typography. Having said that, it's 
not really possible to 'turn the sausage back into the pig': your best 
bet is to convert from whatever was used to create the PDF. And if 
this isn't possible, unless you have a large volume of content, 
re-keying will probably prove to be the most accurate and efficient 
option.
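
To give a flavour of the kind of typographic guessing such tools do, 
here is a minimal sketch in Python. It does not read a PDF at all: it 
assumes some other tool has already extracted each line of text along 
with its font size, and the input structure, the element names 
(document, title, list, item, para) and the 10pt body size are all 
invented for illustration. Real tools use far more elaborate 
heuristics.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (text, font size in points) for each line, as if
# already extracted from a PDF by some other tool. This structure is an
# assumption made for the example, not the output of any real library.
LINES = [
    ("Quarterly Report", 18.0),
    ("Sales rose in every region.", 10.0),
    ("- North", 10.0),
    ("- South", 10.0),
    ("Outlook", 14.0),
    ("We expect steady growth.", 10.0),
]


def infer_markup(lines, body_size=10.0):
    """Guess generic markup from typography alone."""
    root = ET.Element("document")
    current_list = None  # open <list> element, if the previous line was an item
    for text, size in lines:
        if size > body_size:
            # Anything set larger than body text is treated as a title.
            current_list = None
            ET.SubElement(root, "title").text = text
        elif text.startswith(("- ", "* ")):
            # A leading bullet marker suggests a list item; consecutive
            # items are grouped under one <list>.
            if current_list is None:
                current_list = ET.SubElement(root, "list")
            ET.SubElement(current_list, "item").text = text[2:]
        else:
            # Everything else is assumed to be an ordinary paragraph.
            current_list = None
            ET.SubElement(root, "para").text = text
    return root


xml_text = ET.tostring(infer_markup(LINES), encoding="unicode")
print(xml_text)
```

Even on this toy input the guesses are fragile: a pull quote set in a 
larger font would be tagged as a title, and a sentence that happens to 
start with a dash would become a list item. That is the sausage 
problem in miniature.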

- Alex.



Copyright 2001 XML.org. This site is hosted by OASIS