
   Re: [xml-dev] WORD TO XML/SGML



There are several ways you can approach this project, but none of them
is going to be painless.  The problem with Word is that every time a
user clicks in a Word file, something is put into the file.  Content
tagging a Word file is tricky.  Fortunately, you are probably using
one of the CALS DTDs, with which you can get by without doing a lot
of content tagging.

You will want to convert the Word file to an XML file then develop
an XSLT to do the conversion to the final DTD.  You could also
use Omnimark if you have access to it, but XSLT developers are easier to

In order to convert the Word file to XML you have several options:

1.  Use Word 2003 Professional and save the file as WordML.  WordML
    has a bit of a learning curve and is not user friendly.
2.  Use OpenOffice to convert the file to WordML or DocBook.  The
    version of OpenOffice that I used had some problems with tables
    but otherwise did a reasonable job.
3.  Buy InfinityLoop (I think it is under $100 but don't quote me
    on that).  InfinityLoop does a really nice job of converting
    Word files.  It also has a batch option.
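
Whichever option you pick, it helps to look at what styles actually
ended up in the XML before writing any conversion script.  Here is a
minimal sketch in Python, assuming the file was saved as Word 2003
WordML.  The sample document and the function name list_paragraphs are
made up for illustration, and "unstyled" is just a label chosen here
for paragraphs that carry no explicit style.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordML documents.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Tiny made-up WordML fragment standing in for a real saved file.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Introduction</w:t></w:r></w:p>
    <w:p><w:r><w:t>This manual </w:t></w:r><w:r><w:t>describes the pump.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def list_paragraphs(wordml_text):
    """Return (style name, text) for every paragraph in the document."""
    root = ET.fromstring(wordml_text)
    result = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        # Paragraphs without an explicit style get a placeholder label.
        name = style.get(f"{{{W}}}val") if style is not None else "unstyled"
        # A paragraph's text is spread over one or more runs (w:r/w:t).
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        result.append((name, text))
    return result


for name, text in list_paragraphs(SAMPLE):
    print(f"{name:10} {text}")
```

A listing like this shows quickly how consistently the authors used
styles, which is what any later XSLT or Omnimark script has to key on.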

Once you have the file in an XML format, you can expect 75-85% of the
conversion to be handled automatically by either XSLT or Omnimark
(depending on the quality of the script, of course).  Make sure to
budget time for cleanup and QA.
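
To make the split between scripted conversion and manual cleanup
concrete, here is a rough sketch in Python rather than XSLT or
Omnimark.  It assumes Word 2003 WordML input.  The style names, the
target element names (doc, title, para), and the function name convert
are hypothetical; a real script would use the element names from the
DTD you are given.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordML documents.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical mapping from Word style names to target element names.
STYLE_MAP = {"Heading1": "title", "BodyText": "para"}

# Made-up input: three styled paragraphs and one with no style at all.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Pump Manual</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="BodyText"/></w:pPr><w:r><w:t>Check the seal.</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="BodyText"/></w:pPr><w:r><w:t>Replace the filter.</w:t></w:r></w:p>
  <w:p><w:r><w:t>Table 1 (hand formatted)</w:t></w:r></w:p>
</w:body></w:wordDocument>"""


def convert(wordml_text, style_map=STYLE_MAP):
    """Map styled paragraphs to target elements; collect the rest for QA."""
    root = ET.fromstring(wordml_text)
    out = ET.Element("doc")  # hypothetical root element
    leftovers = []
    total = 0
    for p in root.iter(f"{{{W}}}p"):
        total += 1
        style = p.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else None
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        if name in style_map:
            ET.SubElement(out, style_map[name]).text = text
        else:
            # Anything the script cannot map is left for manual cleanup.
            leftovers.append((name, text))
    coverage = (total - len(leftovers)) / total if total else 0.0
    return out, leftovers, coverage


out, leftovers, coverage = convert(SAMPLE)
print(ET.tostring(out, encoding="unicode"))
print(f"converted automatically: {coverage:.0%}; for cleanup: {leftovers}")
```

The leftovers list is the part that needs cleanup and QA time: the
more hand formatting in the Word files, the longer it gets.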

Hope this helps.


On Thu, 18 Aug 2005, Davis, Joe wrote:

> Good morning.
> I've been nominated to take some legacy technical manuals written in
> Word and Word Perfect and convert them into an SGML/XML format.  The
> manuals are a combination of text, table, and graphics.   The required
> DTDs, etc. should be supplied to us.  We've played with Word which will
> convert to HTML, such as it is.  The minor research that I've done does
> not explain how to convert a manual over.  
> Without knowing what I'm doing, it appears as if each heading,
> paragraph, table/cell/row, graphic, and foldouts will need to be given
> individual tags.
> Where is a good source for information on conversion?
> Is there a program that will make my life easier (is there really an
> easy button?)
> Any help will be gratefully received.
> Thanks,
> Joe

Betty Harvey                         | Phone: 410-787-9200 FAX: 9830
Electronic Commerce Connection, Inc. |
harvey@eccnet.com                    | Washington,DC XML Users Grp
URL:  http://www.eccnet.com          | http://www.eccnet.com/xmlug

