> How can i convert the PDF file into the XML.
I'm assuming you want an XML representation in which your PDF
document's semantic content is marked up with XML, rather than an XML
representation of its visual appearance (an SVG version of the PDF,
for example).
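
To make the distinction concrete, here is a minimal Python sketch of what "semantic" markup means here. It assumes the text has already been pulled out of the PDF as runs with a font size attached; the RUNS data, the 1.3 size threshold, and the element names (document, title, para, list, item) are all made up for illustration and are not the output or schema of any real tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: text runs as a PDF extractor might report them,
# each with the font size it was set in. Not the output of any real tool.
RUNS = [
    {"text": "Quarterly Report", "size": 18.0},
    {"text": "Sales rose in every region.", "size": 10.0},
    {"text": "\u2022 North: up 4%", "size": 10.0},
    {"text": "\u2022 South: up 2%", "size": 10.0},
]


def to_semantic_xml(runs, body_size=10.0):
    """Guess generic markup (title, para, list/item) from typography alone."""
    root = ET.Element("document")
    current_list = None
    for run in runs:
        text = run["text"].strip()
        if text.startswith("\u2022"):
            # Consecutive bulleted runs are grouped into a single <list>.
            if current_list is None:
                current_list = ET.SubElement(root, "list")
            item = ET.SubElement(current_list, "item")
            item.text = text.lstrip("\u2022 ").strip()
            continue
        current_list = None
        # Assumed rule: text noticeably larger than body text is a title.
        tag = "title" if run["size"] > body_size * 1.3 else "para"
        ET.SubElement(root, tag).text = text
    return root


xml_text = ET.tostring(to_semantic_xml(RUNS), encoding="unicode")
print(xml_text)
```

The result says what each piece of text is (a title, a paragraph, a list item), whereas an SVG version would only say where each glyph sits on the page. Real documents break simple rules like these quickly, which is why the results are only ever a guess.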
If you Google for "pdf to xml" you'll see some commercial tools for
doing this, some of which I've seen produce reasonable results. These
analyze the PDF and try to infer basic generic markup for the content
(paragraphs, lists, etc.) from its typography. Having said that, it's
not really possible to 'turn the sausage back into the pig': your best
bet is to convert from whatever source was used to create the PDF. And
if this isn't possible, unless you have a large volume of content,
re-keying will probably prove to be the most accurate and efficient