As Eric said, mixed content is a big one.
In document applications, order tends to matter by default.
In data applications, order tends not to matter except in specialized cases.
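
A minimal sketch of the contrast, assuming Python's standard xml.etree.ElementTree; the element names and the helpers flatten and as_record are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Document-style: mixed content, where text and child elements interleave.
doc = ET.fromstring("<p>Order <em>really</em> matters <b>here</b>.</p>")

def flatten(elem):
    """Return the text of elem in document order, including child tails."""
    return "".join(elem.itertext())

# Data-style: a record whose fields can be reordered without changing meaning.
rec_a = ET.fromstring("<person><name>Ann</name><age>41</age></person>")
rec_b = ET.fromstring("<person><age>41</age><name>Ann</name></person>")

def as_record(elem):
    """Treat child elements as name/value pairs; sibling order is discarded."""
    return {child.tag: child.text for child in elem}
```

Reordering the children of the paragraph would change what flatten returns, while the two person records compare equal once turned into dictionaries.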
In data applications, name/value pairs are probably the most convenient
"fundamental data type". In documents, lists of elements tend to be. It is
only because
documents tend not to make heavy use of name/value pairs that XML can
get away with such a weak notion of attributes (which, ironically,
data-heads are often agitating to remove!)
Because of the name/value orientation of data applications, it is
usually safe to ignore an unknown element as an "extension". But in a
document application unknown elements tend to have semantics that you
really should deal with. A publisher can't say "I've never heard of a
colophon, therefore I'll just throw it out."
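
As a rough illustration of the two attitudes toward unknown elements (a sketch only; KNOWN_FIELDS, RENDERERS, and both functions are hypothetical, and the vocabulary is made up):

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = {"name", "price"}  # hypothetical schema for a data record

def load_record(xml_text):
    """Data-style: keep known fields, silently skip unknown 'extensions'."""
    root = ET.fromstring(xml_text)
    return {c.tag: c.text for c in root if c.tag in KNOWN_FIELDS}

RENDERERS = {  # hypothetical handlers for a small document vocabulary
    "title": lambda e: e.text.upper(),
    "para": lambda e: e.text,
}

def render_document(xml_text):
    """Document-style: an unknown element is an error, not an extension."""
    root = ET.fromstring(xml_text)
    out = []
    for child in root:
        if child.tag not in RENDERERS:
            raise ValueError(f"no handling defined for <{child.tag}>")
        out.append(RENDERERS[child.tag](child))
    return out
```

Here load_record quietly drops a field it has never heard of, whereas render_document refuses to proceed when it meets something like a colophon it cannot render.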
Data-oriented applications tend to want to map XML elements to objects
(thus the emphasis on name/value pairs). Document-oriented applications
tend to use a stream processing or visitor model.
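
The two processing styles might look like this in Python (a sketch under assumptions: the Person class and both function names are invented, and iterparse stands in for any event-based parser):

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Person:  # hypothetical object type for the data-oriented view
    name: str
    age: int

def to_person(xml_text):
    """Data-style: map an element's children onto object fields."""
    root = ET.fromstring(xml_text)
    return Person(name=root.findtext("name"), age=int(root.findtext("age")))

def stream_events(xml_text):
    """Document-style: walk the markup as a stream of start/end events."""
    events = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        events.append((event, elem.tag))
    return events
```

The first function only makes sense when the element is really a bag of named fields; the second works on arbitrarily nested mixed content because it never assumes a fixed shape.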
Data-oriented systems tend to distinguish between roles
(fields/properties/attributes) and types. Documents tend to mix them all
together (is "title" a role or a type of thing?).
Data-oriented systems tend to prefer object types to be detectable
independent of context (thus namespaces) whereas document processing is
typically done top-down recursively so relying on context is natural.
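
To make the reliance on context concrete, here is a small sketch of a top-down recursive walk in which the same element name is interpreted differently depending on its parent (the render function and the book/chapter vocabulary are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

def render(elem, parent_tag=None):
    """Top-down recursive walk; the meaning of <title> depends on its parent."""
    lines = []
    if elem.tag == "title":
        label = "BOOK TITLE" if parent_tag == "book" else "CHAPTER TITLE"
        lines.append(f"{label}: {elem.text}")
    for child in elem:
        # Pass the current tag down so children know their context.
        lines.extend(render(child, elem.tag))
    return lines
```

A data-oriented system would more likely give the two kinds of title distinct, namespace-qualified names so that each could be recognized without looking at its parent.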
I am good friends with one of the inventors of YAML and I don't argue
with him when he says that YAML is better for most data-oriented
applications. I think he's probably right. But as somebody else said,
what would be the cost in toolset complexity of having to master two
If one could go back in time, one could approach the problem from
scratch with the needs of document and data heads equally represented.
It would not just be useful to combine them so we could reuse tools. It
would be useful to combine them because most documents have a
data-oriented subset (if only the "metadata" element at the top) and
many data applications have a document-oriented subset (if only rich
text fields). Another reason to combine them is that there is no clear
boundary. There is a spectrum.
But I'm sorry to say that that is not the way XML is.
And by the way, if you consider RDF:
* triples are roughly equivalent to name/value pairs (the third item
in the triple is the "parent" object)
* order does not matter by default
* types and roles are distinguished
* types and roles are context-free
* triples with unknown predicates are easily ignored
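
The points above can be sketched with plain Python tuples rather than a real RDF library. The identifiers and predicates are made up, and the subject of each triple plays the role of the "parent" object:

```python
from collections import defaultdict

# Triples as (subject, predicate, object); all names are invented examples.
triples = [
    ("book1", "title", "XML Basics"),
    ("book1", "author", "person1"),
    ("person1", "name", "Ann"),
    ("book1", "shelfmark", "QA76"),  # a predicate this application does not know
]

KNOWN_PREDICATES = {"title", "author", "name"}

def to_records(triples, known):
    """Group triples by subject into name/value records, skipping unknown predicates."""
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate in known:
            records[subject][predicate] = obj
    return dict(records)
```

Shuffling the list of triples gives the same records, and the unknown shelfmark predicate simply drops out without disturbing anything else.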
IMHO, it is precisely the impedance mismatch between the data view of the
world and XML that makes RDF look so ugly. As a data model, RDF is not
far from ideal for most of the data-oriented applications I've done.
I think that having a clean strategy for merging the two worlds is one
of the big open questions in the XML world.