> Not all of the benefits of XML derive from its basis in text. Some of
> the benefits derive from its paranoia. Everything is checked every time.
> If a process is generating bad data whether through malice,
> incompetence, bugs, line noise, spec misinterpretation, disk corruption,
> cosmic rays, or a dozen other reasons, we find out very quickly.
On the other hand, to some extent the redundancy in XML is required
precisely because it is a text format: programmatically generated
data can have unit tests to detect ill-formedness, but as soon as
you can type and cut-and-paste your documents it becomes more likely
than not that there will be a WF error. Furthermore, some binary formats
that use indexes or trees are not susceptible to the class of errors
caused by a missing end-tag, for example.
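A minimal sketch of the "everything is checked every time" point, using the Python standard-library parser on a hypothetical document: a single missing end-tag is caught immediately as a well-formedness error, rather than silently producing a mangled tree.

```python
# Well-formedness checking in action: a missing end-tag is always detected.
import xml.etree.ElementTree as ET

good = "<order><item>widget</item></order>"
bad = "<order><item>widget</order>"  # end-tag for <item> lost in editing

print(ET.fromstring(good).tag)  # parses cleanly
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print("well-formedness error:", err)  # e.g. a "mismatched tag" report
```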
> Binary formats are no more fundamentally resistant to corruption than
> text based formats are. Indeed the ones being proposed are less
> resistant because they are compressed and therefore less redundant.
> While error correction can certainly be added to binary formats (CDs do
> this, for example) I've yet to notice anyone proposing this for NOT XML.
But the redundancy in XML qua text, to the extent that it exists, is not
enough to allow error correction either.
Indeed, in XML 1.0 the unavailable code points are only enough to detect
some WF problems: XML 1.1 is more systematic and preferable there, at the
minor cost that people who have done stupid and dangerous things like
using the non-whitespace C0 or C1 code points in XML 1.0 have their
follies exposed by their only indirect availability in XML 1.1. Actually,
I am not sure it is logically consistent to praise XML for its use of
redundancy without being impelled by the same argument to favour XML 1.1,
which is objectively or systematically better in this regard.
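The difference shows up directly in the two Char productions. A sketch (Python, with hypothetical helper names) of the character ranges given in the XML 1.0 and XML 1.1 recommendations:

```python
# Char production from XML 1.0: C0 controls other than tab/LF/CR excluded.
def xml10_char(cp):
    return cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF \
        or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF

# Char production from XML 1.1: everything from U+0001 up is a character,
# though the restricted controls may appear only as character references.
def xml11_char(cp):
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD \
        or 0x10000 <= cp <= 0x10FFFF

print(xml10_char(0x01))  # False: a stray SOH byte is a WF error in 1.0
print(xml11_char(0x01))  # True: in 1.1 it is writable, but only as &#x1;
```

So a corrupted byte falling in the C0 range is detectable in XML 1.0, while XML 1.1 trades a little of that detection for systematic coverage of the code space.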
Also, I don't understand the point about text-based formats being more
resistant to corruption because of redundancy. For example, when some
hardware fault injects errors at random, the smaller the format the fewer
errors it suffers in absolute terms.
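To make that size argument concrete, a back-of-envelope sketch under assumed numbers (the bit-error rate and the 4:1 size ratio are illustrative, not measured):

```python
# Expected corruption scales with size: fewer bytes, fewer absolute hits.
ber = 1e-9                # assumed random bit-error rate
text_bytes = 1_000_000    # assumed size of the XML serialization
binary_bytes = 250_000    # assumed size of a 4:1 binary encoding

exp_text = ber * text_bytes * 8      # expected bit errors, text form
exp_binary = ber * binary_bytes * 8  # expected bit errors, binary form

print(exp_text)    # ~0.008 expected errors
print(exp_binary)  # ~0.002 expected errors, a quarter as many
```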
> The goal of NOT XML seems to be size and speed at all costs, including
> the cost of transparency and disaster recovery. At least with real XML,
> when something goes horribly wrong with critical data, a human can
> probably fix the mistakes and recover most of the information. With a
> binary format, that's going to be much harder to do, if it's even
> possible.
There is also a "build it and they will come" aspect. One reason XML
Schemas is type-based is that this was expected to allow certain
efficiencies; standards or technologies which
can actually make these marvelous efficiencies materialize (XQuery, Binary
XML, type-based linking) need to descend from heaven at regular intervals
to improve XML Schemas' bang-per-buck. "Build the text field and they will
come!" If standards to use the PSVI or schemas are not forthcoming or
don't work, then the complexity or poor fit of XML Schemas will not be as
excusable (as I hope it will be at some time in the future). The big
players need to leverage the PSVI before it leverages them, IYKWIM.
For Fast Infoset, why not see it as ASN.1 becoming more XML-infrastructure
compatible rather than XML going binary?
I honestly don't see what is wrong with well-thought-out alternative
approaches to the same problem, in particular where they have very
different characteristics. Plurality is healthy. Protesting that an
Infoset-carrying binary format will have different properties than XML is
rather the point of the exercise: a format with exactly the same
properties would be a futile competitor rather than a complement. The
people who want binary infosets may well be bit-crazed losers who don't
understand their problems and want to stuff up our world as well, but then
again they may not: the most charitable thing is not to be a nanny but to
say "Spread your wings and fly, my eaglet child" or "Go hang yourselves":
let them make a binary infoset standard and see in practice what it is
good for. There is a strong streak of puritanism (Thou shalt have only
one way to do anything) that is counter-productive.
All engineering involves measuring and understanding the characteristics
of a technique or material, to allow repeated projects with known
performance characteristics; what is important isn't that the world
contains only perfect technologies, but that we know what their strengths
and weaknesses are, when to use them, and how to influence our local
standards bodies in positive directions to get broad and pragmatic
coverage of our different use cases.
The Binary Infoset issue is small compared to the larger one that has
crippled fundamental standards at the W3C: the chaotic development of
DTD-replacing layers (XInclude, xml:base, xml:id, XLink, XML Schemas)
without having a corresponding dependable processing sequence like that of