
   Re: &anytype; to RTF converter. Need help!

  • From: "Rick Jelliffe" <ricko@allette.com.au>
  • To: <xml-dev@ic.ac.uk>
  • Date: Thu, 20 May 1999 23:37:34 +1000


From: Ing. Cesar Bonavides M. <cbona18@campus.mor.itesm.mx>
>I'd like to know if there is a converter from:
>
>TeX, or LaTeX, or PostScript or PDF     to   RTF.
>
>I know there's something out there, but I don't have time to find it
>out.

To go from PDF to RTF, you can use

* Magellan to go from PDF to HTML
* Dave Raggett's Tidy to go from HTML to XHTML
* then write a little OmniMark or Perl or Python script to go from XHTML
to RTF

All these tools are free. Do not ask me for references.

Unfortunately, you may have to locate and learn 3 tools and 4 languages
along the way.
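
The last step in the list above (XHTML to RTF) might be sketched in
Python roughly as follows. This is only a minimal illustration, not a
finished converter: it assumes the XHTML is well-formed and free of named
entities such as &nbsp;, it handles only paragraphs, headings, list items,
bold and italic, and the function names (rtf_escape, xhtml_to_rtf) are
invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping tables: only a handful of XHTML elements are handled.
INLINE = {"b": "\\b", "strong": "\\b", "i": "\\i", "em": "\\i"}
BLOCK = {"p", "h1", "h2", "h3", "li", "div"}


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.split("}", 1)[-1]


def collapse(text):
    """Collapse runs of whitespace, since RTF readers ignore raw newlines."""
    return re.sub(r"\s+", " ", text)


def rtf_escape(text):
    """Escape RTF special characters and encode non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)
        elif code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            # RTF \u takes a signed 16-bit value followed by a fallback char.
            if code > 32767:
                code -= 65536
            out.append("\\u%d?" % code)
        else:
            out.append("?")  # simplification: characters beyond the BMP are dropped
    return "".join(out)


def convert(elem):
    """Recursively turn one element and its children into RTF text."""
    name = local(elem.tag)
    body = rtf_escape(collapse(elem.text or ""))
    for child in elem:
        body += convert(child)
        body += rtf_escape(collapse(child.tail or ""))
    if name in INLINE:
        return "{" + INLINE[name] + " " + body + "}"
    if name in BLOCK:
        return body.strip() + "\\par\n"
    return body


def xhtml_to_rtf(xhtml):
    """Convert the <body> of a well-formed XHTML string to a minimal RTF document."""
    root = ET.fromstring(xhtml)
    body = next((e for e in root.iter() if local(e.tag) == "body"), root)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + convert(body) + "}"
```

Anything outside the small INLINE and BLOCK tables falls through as plain
text, so tables, images and absolute positioning would all need extra work.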

Magellan does not give you text in lines: just words with absolute XY
coordinates. So you can index the words, but you cannot really edit them.
If the pages are really simple, then you can try to reconstruct the lines
in some text-processing language. You may need to try different revisions
of the application that generated the code in order to find the one that
puts out the best PDF.
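
As a rough sketch of what "figuring out lines" could look like for simple
pages, the Python below groups words whose Y coordinates are nearly equal
and then orders each group by X. The input format (a list of (x, y, text)
tuples), the function name words_to_lines and the tolerance value are all
assumptions made for the example; they are not the converter's actual
output format. It also assumes Y grows down the page; if the coordinates
have their origin at the bottom, reverse the resulting list.

```python
def words_to_lines(words, y_tolerance=2.0):
    """Group (x, y, text) words into lines of text.

    Words whose y value is within y_tolerance of the first word of the
    current line are treated as being on the same line.
    """
    lines = []  # each entry: [y of first word on the line, [(x, text), ...]]
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if lines and abs(y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order words left to right and join with spaces.
    return [
        " ".join(text for _, text in sorted(ws, key=lambda p: p[0]))
        for _, ws in lines
    ]


if __name__ == "__main__":
    sample = [
        (50, 100, "world"),
        (10, 100.5, "Hello"),
        (10, 120, "second"),
        (60, 119, "line"),
    ]
    for line in words_to_lines(sample):
        print(line)
```

Multi-column layouts, rotated text and superscripts would defeat this
simple approach, which is why it is only worth trying on very plain pages.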

More realistically, you could try to find a word-processing package that
accepts HTML and understands absolute positioning attributes: I doubt
that Word does, but perhaps FrameMaker might.

Another possibility is to divide your PostScript into single pages, and
then use Adobe Illustrator (or is it FreeHand that can read in HTML?): it
is remotely possible that Illustrator (or is it FreeHand?) can save as
RTF.

Yet another possibility that should not be ignored is to keep the
PostScript files, but use Magellan to extract a good word index for each
page (Xerox InXight have a product for this too), and perhaps use a
scanner to get the text into lines if people need text. Then you make
some metadata for each document (Dublin Core). This gives you:
* formatted pages
* indexed data
* unformatted text for people who want to extract parts of the data into
other documents
* metadata for finding aids.
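
For the metadata item, a per-document Dublin Core record could be
generated with a few lines of Python. This is only a sketch: the namespace
URI is the standard Dublin Core element-set namespace, but the wrapper
element name "metadata", the function name dublin_core_record and the
sample values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a small XML record from (element_name, value) pairs.

    element_name should be a Dublin Core element such as 'title',
    'creator', 'date' or 'format'. The <metadata> wrapper is hypothetical.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = dublin_core_record([
        ("title", "Annual report"),          # sample values only
        ("format", "application/postscript"),
    ])
    print(record)
```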

That is not nearly as good as an XML document, but it might give people a
lot of what they need. If you need to put it into RTF, import it as text,
get a human to mark it up, and export as RTF.


Rick Jelliffe


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 


Copyright 2001 XML.org. This site is hosted by OASIS