OASIS Mailing List Archives





   Re: [xml-dev] PDF 2 XML


From: <Kevin.Gutch@mapinfo.com>
> Are there any PDF to XML utilities in existence? I assume it is difficult
> to re-purpose PDF to XML.

Acrobat 5 has a plug-in called XMLExtract or something similar (free from Adobe, I think)
that attempts to extract XML. If the document was created by a
structured editor, such as FrameMaker+SGML, then there is a fighting chance
that the XML won't be complete crap. Does anyone have experience with it?

But people who want to use PDF or XML as archiving formats should beware.
Old PDF locks up the data, and XML can have URL dependencies. So for
archiving, treat the XML as sub-SGML, not as super-HTML: archive all the
resources the document depends on, as snapshots, and change the system IDs
and links to refer to those local copies. (I guess you could also use a
catalog to override the system IDs, but you don't know whether future
software will cope readily with catalogs, so I'm not sure the extra level
of indirection is worthwhile.)
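A minimal Python sketch of the snapshot step described above, under some loudly stated assumptions: the names `localize`, `local_name` and `LINK_ATTRS`, the set of link attributes, and the flat `host_path` naming scheme are all hypothetical choices made for illustration. It rewrites an http(s) system identifier in the DOCTYPE and any http(s) values of the chosen link attributes to local file names, and returns a map of what must be fetched and archived. It uses regular expressions rather than a real parser because the standard-library ElementTree does not preserve the DOCTYPE declaration, so treat it as a starting point, not a robust tool. Namespace declarations (`xmlns:...`) are deliberately left alone: a namespace URI is a name, not a resource to fetch.

```python
import re
from urllib.parse import urlparse

# Attributes whose values point at resources to archive.
# This set is an assumption; extend it for your own vocabulary.
LINK_ATTRS = {"href", "src", "xlink:href"}

# System literal of a DOCTYPE (SYSTEM "uri" or PUBLIC "pubid" "uri"),
# matched only when it is an absolute http(s) URL.
DOCTYPE_RE = re.compile(r'(<!DOCTYPE[^>\[]*?)(["\'])(https?://[^"\']*)\2')

# Any attribute whose quoted value is an absolute http(s) URL.
ATTR_RE = re.compile(r'([\w:.-]+)(\s*=\s*)(["\'])(https?://[^"\'<>\s]*)\3')


def local_name(url: str) -> str:
    """Hypothetical flat naming scheme: http://host/a/b.dtd -> host_a_b.dtd."""
    parts = urlparse(url)
    path = parts.path.lstrip("/") or "index"
    return (parts.netloc + "/" + path).replace("/", "_")


def localize(xml_text: str) -> tuple[str, dict[str, str]]:
    """Point remote system IDs and links at local snapshot copies.

    Returns the rewritten text and a {remote_url: local_name} map of every
    resource that must be fetched and archived alongside the document.
    """
    mapping: dict[str, str] = {}

    def remember(url: str) -> str:
        # Reuse the same local name if a URL occurs more than once.
        return mapping.setdefault(url, local_name(url))

    def fix_doctype(m: re.Match) -> str:
        prefix, quote, url = m.groups()
        return f"{prefix}{quote}{remember(url)}{quote}"

    def fix_attr(m: re.Match) -> str:
        name, eq, quote, url = m.groups()
        if name not in LINK_ATTRS:
            return m.group(0)  # leave xmlns and other attributes untouched
        return f"{name}{eq}{quote}{remember(url)}{quote}"

    xml_text = DOCTYPE_RE.sub(fix_doctype, xml_text, count=1)
    xml_text = ATTR_RE.sub(fix_attr, xml_text)
    return xml_text, mapping
```

The returned map tells you what to download into the same directory as the rewritten file, for example with `urllib.request.urlretrieve(url, filename)`. The flat naming can collide (query strings are dropped), and URLs inside an internal subset, in text content, or in `xsi:schemaLocation` pairs are not handled by this sketch.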




Copyright 2001 XML.org. This site is hosted by OASIS