   Re: Word and XML (was: XML standards coherency and so forth)

  • From: <david@megginson.com>
  • To: xml-dev@ic.ac.uk
  • Date: Thu, 21 Jan 1999 20:01:22 -0500 (EST)

Sean Mc Grath writes:

 > RTF doesn't map well to XML -- even very low level -- formatting
 > oriented XML -- because of the way RTF is structured.
 > It is stack based and allows structures to overlap:-
 > 	\b1 bold \i1 bold italic \b0 italic \i0 plain
 > Matching up the on/offs:-
 > 	<b> bold <i> bold italic </b> italic </i> plain
 > invalid XML (or indeed SGML) because of the overlaps.

This is actually quite simple to handle algorithmically by maintaining
a stack and doing a pushback when tags aren't nested:

RTF   Tags        Stack
\b1   <b>         (b)
\i1   <i>         (b i)
\b0   </i></b><i> (i)
\i0   </i>        ()

You'd need only four or five lines of code to handle it -- just walk
back down the stack to the entry for the state being turned off
(closing every open tag along the way), then reopen everything except
the one you just turned off.  I'm not saying that you'll always get
valid HTML, but at least the tags will be properly nested.
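
Here is a minimal sketch of that stack walk in Python, just to make the
table above concrete.  It is an illustration under assumptions, not a
real RTF parser: the function name nest_toggles and the event format
(plain strings for text, (name, on) tuples for toggles such as \b1 or
\b0) are made up for this example, and closing any tags still open at
the end of input is my own addition.

```python
def nest_toggles(events):
    """Turn overlapping on/off toggles into properly nested tags.

    `events` is a hypothetical input format: a list whose items are
    either text strings or (name, on) tuples, e.g. ("b", True) for \\b1.
    """
    stack = []  # names of currently open tags, outermost first
    out = []
    for event in events:
        if isinstance(event, str):
            out.append(event)  # plain text passes through unchanged
            continue
        name, on = event
        if on:
            if name not in stack:  # ignore a repeated "on"
                stack.append(name)
                out.append(f"<{name}>")
        elif name in stack:
            idx = stack.index(name)
            to_reopen = stack[idx + 1:]  # tags opened after `name`
            # Walk back: close everything down to and including `name`.
            for open_name in reversed(stack[idx:]):
                out.append(f"</{open_name}>")
            del stack[idx:]
            # Reopen everything except the tag that was just turned off.
            for reopened in to_reopen:
                stack.append(reopened)
                out.append(f"<{reopened}>")
    # Assumption: close whatever is still open when the input ends.
    for open_name in reversed(stack):
        out.append(f"</{open_name}>")
    return "".join(out)


# Sean's example: \b1 bold \i1 bold italic \b0 italic \i0 plain
example = [("b", True), " bold ", ("i", True), " bold italic ",
           ("b", False), " italic ", ("i", False), " plain"]
print(nest_toggles(example))
# prints: <b> bold <i> bold italic </i></b><i> italic </i> plain
```

The core of it (find the entry, close down to it, reopen the rest) is
the handful of lines in the elif branch; the rest is bookkeeping.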

All the best,


David Megginson                 david@megginson.com


