OASIS Mailing List Archives


Re: [xml-dev] NY Times reference to 'secret coding'

On Tue, 4 Sep 2007 19:18:29 -0400, noah_mendelsohn@us.ibm.com wrote:

>There are often at least two levels of concern when considering 
>compatibility of an office-style file format:  1) given the published 
>specifications and an arbitrary document instance, can you extract the 
>general semantics -- for example, can you tell that a given page is in two 
>column format with a footnote at the bottom vs. 2) can you tell to the 
>exact pixel how that document is to be printed on some particular printer, 
>and can you predict for that printer exactly how long the footnote can be 
>before it wraps to a second page?

Well put.  IMO, the second point is more a consideration for printer
drivers than for interchange, but there are indeed multiple levels.
For example, take the RTF spec, with which I am intimately familiar.
At first reading, it seems complete, and pretty simple to write to.
However, there are at least three kinds of problems: missing pieces, 
unexpected interactions between codes, and instability.

Missing:  The spec describes syntax for "fields", which is accurate
as far as it goes.  But since the main practical use of RTF is for
interchange with Word, you also need to know what fields are used
for what purposes, and what switches apply to them.  It's not there.  
You need to use Word to make instances and save as RTF, and then 
reverse engineer.  So you have a published spec that is ineffective
without also knowing the proprietary extensions.  That's not obvious
to someone reading the spec, but an implementor finds out fast.
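
To make the gap concrete, here is a minimal Python sketch of the
documented part: the generic field wrapper (\field, \*\fldinst,
\fldrslt).  The helper names are hypothetical, and HYPERLINK is used
only as a familiar example of a field instruction.  Which instructions
and switches Word actually honors is exactly the part that still has
to be found by experiment, so treat this as an illustration of the
syntax, not a recipe.

```python
def rtf_escape(text):
    # Backslash and braces are syntactically significant in RTF,
    # so they must be escaped inside literal text.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def hyperlink_field(url, label):
    # Hypothetical helper: wraps a field instruction and its visible
    # result in the generic field group syntax.
    # Assumes the URL contains no double quotes.
    instruction = 'HYPERLINK "%s"' % rtf_escape(url)
    return ("{\\field{\\*\\fldinst " + instruction + "}"
            "{\\fldrslt " + rtf_escape(label) + "}}")


print(hyperlink_field("http://www.omsys.com/", "Omni Systems"))
```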

Interactions:  The spec says that a control word, like \par, is 
terminated by the first whitespace or punctuation after it.  But that
is not how *some* of them work.  Which ones?  Reverse engineer some more.
Hint: you won't get footnotes working right unless you do that.
When you get to issues like how to wrap an xref in a hyperlink,
the code needed gets Byzantine, and the spec offers no guidance.
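
For reference, a minimal Python sketch of the rule as published,
assuming a control word is a backslash, a run of ASCII letters, an
optional signed integer parameter, and one optional space that is
consumed as the delimiter.  The function name and token shapes are
made up for illustration, and braces and control symbols are simply
passed through as text.  It makes no attempt to model the control
words that deviate from the rule, since those have to be found by
experiment.

```python
import re

# Assumption: letters, optional signed integer parameter, then at most
# one space that belongs to the control word and is swallowed.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")


def tokenize(rtf):
    """Split RTF source into ('control', name, param) and ('text', chunk)."""
    tokens = []
    pos = 0
    while pos < len(rtf):
        match = CONTROL_WORD.match(rtf, pos)
        if match:
            name, param = match.group(1), match.group(2)
            tokens.append(("control", name, int(param) if param else None))
            pos = match.end()
        else:
            # Anything else is collected as text up to the next backslash.
            end = rtf.find("\\", pos + 1)
            if end == -1:
                end = len(rtf)
            tokens.append(("text", rtf[pos:end]))
            pos = end
    return tokens


print(tokenize(r"\par Hello\b1 bold\b0."))
```

Note that the space after \par is swallowed, while the period after
\b0 ends the control word and survives as text.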

Instability:  In Word 95, a WMF graphic used units of twips (1440
per inch) for many purposes; this was built in, not settable.  Then 
in Word 97, the unit changed to himetric (2540 per inch), still
with no indication in the file of which unit applied.  So RTF made for Word 95 produced
shrunken graphics in Word 97.  In Word XP (2002), the unit changed
back, still with no indication in the RTF.  Sure, the spec described
it, but compatibility was broken both ways.  Similar tweaks happen
with other settings.
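
The arithmetic behind the shrinkage, as a small Python sketch.  The
sample value of 2880 is an arbitrary assumption; only the two
units-per-inch figures come from the description above.

```python
TWIPS_PER_INCH = 1440
HIMETRIC_PER_INCH = 2540


def to_inches(value, units_per_inch):
    return value / units_per_inch


# Hypothetical measurement written by a producer that assumes twips.
width = 2880

intended = to_inches(width, TWIPS_PER_INCH)        # what the writer meant
misread = to_inches(width, HIMETRIC_PER_INCH)      # same number read as himetric
scale = misread / intended                         # 1440 / 2540

print(intended, round(misread, 3), round(scale, 3))
```

The same number read in the other unit comes out at roughly 57% of
the intended size, and going the other way inflates it by about 76%.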

I could go on about the "PDF killer", XPS, but this is long enough
already...   ;-)  I'll just mention that the spec is *in* XPS, so
you need the viewer to read it, and when you go to download the free
viewer, you are asked to accept a fascinating agreement first.  I
read enough to know I would never sign it.  On a hunch, I changed
the extension to .zip, unzipped, and looked at the results, also
interesting.  I sure wouldn't vote for it as a standard...
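
The same peek can be done without renaming anything.  A minimal
Python sketch using the standard zipfile module; the function name
is hypothetical, and all it assumes is what the experiment above
showed, namely that the file is an ordinary ZIP container.

```python
import zipfile


def list_package_parts(source):
    """Return the member names of a ZIP-based package.

    `source` may be a file path or a binary file-like object.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(source) as package:
        return package.namelist()
```

Calling list_package_parts("spec.xps") (file name assumed) would list
the parts inside the package, whatever the extension says.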

-- Jeremy H. Griffith, at Omni Systems Inc.
  <jeremy@omsys.com>  http://www.omsys.com/

Copyright 1993-2007 XML.org. This site is hosted by OASIS