   Re: Its the semantics dummy :-)

  • From: tpassin@home.com
  • To: xml-dev@lists.xml.org
  • Date: Mon, 31 Jul 2000 21:38:41 -0400

Jonathan Borden wrote, in the continuing saga -

> Simon St. Laurent wrote:
> [Jonathan Borden]:
> > >It is always
> > >possible to send a perfectly well formed XML document that is totally
> > >useless e.g.
> > ><doc> <byte>67</byte> <byte>121</byte>... </doc>
> >
> > Uselessness is in the eye of the beholder, and MIME only takes you to
> > document container, not to its contents.  67 121 might be a very
> > code to me.
> The point being that this document is no less difficult to understand than
> a pure binary document consisting of byte after byte. The fact that one
> document is XML or the other document can be sent using base64 encoding
> over text based SMTP makes no substantial difference (and one can quite
> easily convert between the two formats).
> >
Actually, there is quite a difference between this XML document and a pure
binary one.  In the binary one you have no idea about the structure - you
don't even know if it's really supposed to be a series of bytes or a series
of Unicode code points or what.  In the XML document, you know that the data
items are 67, 121, ....  True, you don't know if they are strings or
integers, but that might not matter anyway.
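
A minimal Python sketch of that difference, assuming a two-element version of
the example document (the variable names are made up for illustration).  The
markup yields exactly two items, while the same two values stored as raw
bytes can be read just as plausibly as two 8-bit numbers, one 16-bit number,
or two ASCII characters:

```python
import struct
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the example document.
xml_text = "<doc><byte>67</byte><byte>121</byte></doc>"

# The markup delineates the items, so exactly two values come back.
# They are still just text: string vs. integer is left to the reader.
items = [el.text for el in ET.fromstring(xml_text).findall("byte")]
print(items)

# The same values as raw bytes carry no boundaries at all.
raw = bytes([67, 121])
as_bytes = list(raw)                     # two 8-bit values
as_uint16 = struct.unpack("<H", raw)[0]  # one little-endian 16-bit value
as_text = raw.decode("ascii")            # two ASCII characters
print(as_bytes, as_uint16, as_text)
```

Nothing in the two raw bytes says which of the three readings was intended;
the XML version at least settles where one item ends and the next begins.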

And this isn't necessarily trivial.  I once wrote a program that ingested
time series data, did all kinds of processing on it, and plotted the
results.  I wrote it to ingest one- or two-column ASCII text, to discover the
number of columns, to discover which lines were comment lines (if a line
didn't start with a valid number, it was a comment), and even to learn
whether the input was really a stream of binary bytes.  Worked great.  But I
couldn't discover whether I was trying to read a file of binary 16-bit
numbers.  I could imagine how to do it, but it was too hard to justify the
effort and wouldn't have been foolproof anyway.
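
The text-sniffing part of that can be sketched in a few lines of Python.
This is only an illustration of the heuristic described above, not the
original program; the function names and the sample input are assumptions.

```python
from collections import Counter


def is_number(token):
    """Return True if the token parses as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def sniff_columns(lines):
    """Separate comment lines from data rows and guess the column count.

    A line counts as a comment if it is empty or its first
    whitespace-separated token is not a valid number.
    """
    comments, rows = [], []
    for line in lines:
        tokens = line.split()
        if not tokens or not is_number(tokens[0]):
            comments.append(line)
            continue
        # Keep the leading run of numeric tokens as the data values.
        values = []
        for token in tokens:
            if not is_number(token):
                break
            values.append(float(token))
        rows.append(values)
    # The most common row length is taken as the number of columns.
    ncols = Counter(len(r) for r in rows).most_common(1)[0][0] if rows else 0
    return ncols, rows, comments


sample = ["# time  value", "0.0 1.5", "0.1 1.7", "end of run"]
ncols, rows, comments = sniff_columns(sample)
print(ncols, rows, comments)
```

A heuristic like this only works because text lines and whitespace already
delineate the values.  A file of packed 16-bit numbers offers no such
boundaries to test against.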

If I could have inspected the files to find out the storage units, it would
have been great.

These marked-up documents are called "self-describing" not really because of
semantics, but because the structural units are self-delineated.


Tom Passin

