OASIS Mailing List Archives





   Re: [xml-dev] WORD TO XML/SGML



I've been developing tools and techniques to do this type of conversion for the last couple of years [*]. The latest manifestation of this work is a system for "roundtripping" DocBook via Word, in the DocBook XSL stylesheet repository (http://docbook.sf.net/). That is, there are XSL stylesheets for converting WordML (Office 2003's XML format) into DocBook, and stylesheets for the reverse. One half of that is what you're looking for. See also http://www.ausweb.scu.edu.au/aw04/papers/refereed/ball and http://ausweb.scu.edu.au/aw05/papers/edited/ball/poster.html

The techniques used are not specific to WordML, nor are they specific to DocBook.† I have developed stylesheets for my clients that target XML schemas/DTDs other than DocBook.
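To give a rough idea of the style-driven mapping, here is a Python sketch. It is not the XSLT that the DocBook stylesheets actually use. It reads Word 2003 XML (WordML), looks up each paragraph's `w:pStyle` value in a table, and emits one DocBook element per paragraph. The style names in `STYLE_MAP` and the function names are hypothetical. Pointing the table at a different DTD's element names is the kind of retargeting mentioned above. Tables, graphics and inline formatting are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Word 2003 XML (WordML) namespace.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical style table: Word style IDs (w:pStyle/@w:val) -> target element
# names. Unstyled or unlisted paragraphs fall back to "para".
STYLE_MAP = {"Title": "title", "Code": "programlisting"}

def style_of(p):
    """Return the style ID of a w:p element, or None if it has no w:pStyle."""
    ps = p.find("w:pPr/w:pStyle", NS)
    return None if ps is None else ps.get(f"{{{W}}}val")

def text_of(p):
    """Concatenate the text of every w:t run inside the paragraph."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

def wordml_to_docbook(wordml):
    """Map each WordML paragraph to one element under a DocBook <article>.

    Note: iter() also finds paragraphs inside tables, which are flattened
    here; a real converter would treat w:tbl separately.
    """
    root = ET.fromstring(wordml)
    article = ET.Element("article")
    for p in root.iter(f"{{{W}}}p"):
        name = STYLE_MAP.get(style_of(p), "para")
        ET.SubElement(article, name).text = text_of(p)
    return article

SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>Pump Manual</w:t></w:r></w:p>
    <w:p><w:r><w:t>Check the </w:t></w:r><w:r><w:t>seals.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(ET.tostring(wordml_to_docbook(SAMPLE), encoding="unicode"))
```

This prints `<article><title>Pump Manual</title><para>Check the seals.</para></article>`. Text split across several runs is joined back into one paragraph. This is also why consistent styles matter more than consistent visual formatting.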

There's no single button, but it is achievable. A big constraint is that your Word documents must be marked up using styles, or at least be "regular" or consistent in some sense.
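Consistent heading styles matter because Word's own paragraph model is essentially flat, while DocBook and most SGML/XML DTDs expect nested sections. The following minimal Python sketch makes several assumptions. The paragraphs are taken to be already reduced to hypothetical `(style, text)` pairs. The headings are assumed to use the `Heading1` to `Heading3` style IDs. Given those, the sketch shows how the heading level alone can drive the nesting.

```python
import xml.etree.ElementTree as ET

# Assumed heading style IDs and the section depth each one opens.
HEADING_LEVELS = {"Heading1": 1, "Heading2": 2, "Heading3": 3}

def nest_sections(paragraphs):
    """Turn a flat list of (style, text) pairs into nested DocBook sections."""
    root = ET.Element("article")
    stack = [(0, root)]  # (level, element); level 0 is the article itself
    for style, text in paragraphs:
        level = HEADING_LEVELS.get(style)
        if level is None:
            # Body text goes into the innermost open section.
            ET.SubElement(stack[-1][1], "para").text = text
            continue
        # A heading closes every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "title").text = text
        stack.append((level, section))
    return root

flat = [
    ("Heading1", "Installation"),
    (None, "Unpack the unit."),
    ("Heading2", "Wiring"),
    (None, "Connect the supply."),
    ("Heading1", "Maintenance"),
    (None, "Inspect monthly."),
]
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

Text before the first heading lands directly in `article`. A skipped level, such as `Heading3` straight after `Heading1`, simply nests one level down. A document whose headings are only bold Normal text gives this approach nothing to work with.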

Contact me if you'd like further information.

[*] Word->XML converters have been around for even longer, but the introduction of Word 2003 has made the process much more robust: not actually much easier, but more reliable. There are some commercial products around that may help; DocSoft is one example, and there are others.

Steve Ball


Steve Ball | XSLT Standard Library | Training & Seminars
Explain | Web Tcl Complete | XML XSL Schemas
http://www.explain.com.au/ | TclXML TclDOM | Tcl, Web Development

Ph. +61 2 6242 4099 | Mobile (0413) 594 462 | Fax +61 2 6242 4099

On 19/08/2005, at 2:54 AM, Davis, Joe wrote:

Good morning.

I've been nominated to take some legacy technical manuals written in Word and WordPerfect and convert them into an SGML/XML format. The manuals are a combination of text, tables, and graphics. The required DTDs, etc. should be supplied to us. We've played with Word, which will convert to HTML, such as it is. The little research I've done does not explain how to convert a manual over.

Without knowing what I'm doing, it appears as if each heading, paragraph, table/cell/row, graphic, and foldout will need to be given individual tags.

Where is a good source for information on conversion?

Is there a program that will make my life easier (is there really an easy button?)

Any help will be gratefully received.



