Murali Mani wrote:
> can we be sure Microsoft will not define a custom binary
> format for storing XML documents from Office?
The easiest way to encourage them not to define a *custom*
binary format is to make it easy for them to use a *standard* binary
format that interchanges well with code designed to process XML. Of
course, what I'm suggesting is that by supporting the use of ASN.1
defined encodings we can get binary/XML interchange and preserve our
investment in tools like SAX, DOM, etc. This is because the ASN.1
defined encodings (BER, PER, etc.) provide *lossless* encoding of XML
data in highly compact binary forms. (I wouldn't be surprised, for
instance, if a typical Word document ended up being less than 10% of
the original size if encoded in PER...)
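To make the size argument concrete, here is a minimal sketch in Python. It is not the X.694 mapping and it is not PER: it hand-encodes one small record with BER-style tag-length-value octets and compares that with the same record written as XML text. The record layout (a "run" with size, bold, and text fields) and the function names are assumptions invented for the illustration; they do not come from Microsoft's schema or from any ASN.1 toolkit.

```python
import xml.etree.ElementTree as ET

def ber_length(n):
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag, content):
    """One tag-length-value triple with a single-octet tag."""
    return bytes([tag]) + ber_length(len(content)) + content

def encode_run(size, bold, text):
    """Hypothetical record: SEQUENCE { INTEGER, BOOLEAN, UTF8String }."""
    if size < 0:
        raise ValueError("this sketch only handles non-negative sizes")
    # The extra leading bit keeps the two's-complement sign bit clear.
    int_octets = size.to_bytes(size.bit_length() // 8 + 1, "big")
    fields = (
        tlv(0x02, int_octets)                      # INTEGER
        + tlv(0x01, b"\xff" if bold else b"\x00")  # BOOLEAN
        + tlv(0x0C, text.encode("utf-8"))          # UTF8String
    )
    return tlv(0x30, fields)                       # SEQUENCE

def decode_tlvs(data):
    """Yield (tag, content) pairs from back-to-back TLVs."""
    i = 0
    while i < len(data):
        tag, first = data[i], data[i + 1]
        i += 2
        if first < 0x80:
            length = first
        else:
            n = first & 0x7F
            length = int.from_bytes(data[i:i + n], "big")
            i += n
        yield tag, data[i:i + length]
        i += length

def decode_run(data):
    """Inverse of encode_run; returns (size, bold, text)."""
    ((tag, body),) = decode_tlvs(data)
    if tag != 0x30:
        raise ValueError("expected a SEQUENCE")
    fields = dict(decode_tlvs(body))  # tags are unique in this toy record
    return (
        int.from_bytes(fields[0x02], "big"),
        fields[0x01] != b"\x00",
        fields[0x0C].decode("utf-8"),
    )

def run_to_xml(size, bold, text):
    """The same record as XML text, with element names made up here."""
    run = ET.Element("run")
    ET.SubElement(run, "size").text = str(size)
    ET.SubElement(run, "bold").text = "true" if bold else "false"
    ET.SubElement(run, "text").text = text
    return ET.tostring(run)

if __name__ == "__main__":
    record = (12, True, "Hello")
    binary = encode_run(*record)
    print(len(binary), "bytes binary;", len(run_to_xml(*record)), "bytes XML")
    print("round trip ok:", decode_run(binary) == record)
```

For this toy record the binary form is 15 bytes against 61 bytes of XML text, and decoding returns the original values. That shows where the savings come from (element names are replaced by one-octet tags), but it says nothing reliable about the ratio for a real Word document; a schema-driven PER encoder would be needed to measure that.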
I must say that I'm very concerned about the impact that Word
documents in XML are going to have on the whole "XML movement." While
people have grumbled about XML's size for a long time, most people
haven't been exposed to the elephantine monsters that result when you
convert Word to XML. ("The Word document that ate my disk...") This is
going to start a whole series of companies and projects that will
"address the problem of XML storage compression."... Unfortunately,
most of those projects will be describing XML as a bug that needs to
be fixed. The "bad press" for XML will not be pleasant and may give
folk like Microsoft the cover they need to define "more efficient"
*custom* formats in the future. I personally feel that it is vital to
the "XML movement" that we be aggressive and accept existing,
standard, ASN.1 defined encodings as binary peers of XML in order to
address the storage space issue immediately.
Microsoft has defined their schema using WXS. However, since
the mapping from WXS to ASN.1 is defined (X.694), that means that by
releasing their WXS, they have defined more than the XML schema for
Word -- they have also defined the ASN.1 schema for Word. At this
point, providing compressed binary support for Word documents is a
trivial matter.
There is actually an interesting opportunity here... If one or
more of the recent "open office" products were to provide
XML-compatible ASN.1 encoding support, then they would be able to
argue that they have all of the benefits of the XML encodings that
Microsoft supports while also addressing the needs of customers by
providing highly compressed binary encodings. So far, all the
Office alternatives have been limited to arguing that they are "just
as good" as Microsoft's Office. If they were to embrace compact binary
encodings that are XML-compatible, then they would be able to argue
that they are "better."
In summary: If you want to avoid *custom* binary formats,
ensure that *standard* binary formats are available and supported.
bob wyman