xml-dev - Documents, data and markup: YAML Ain't Markup Language

To: Dare Obasanjo <dareo@microsoft.com>, xml-dev@lists.xml.org
Subject: Documents, data and markup: YAML Ain't Markup Language
From: Paul Prescod <paul@prescod.net>
Date: Fri, 06 Jun 2003 13:19:02 -0700
In-reply-to: <B885BEDCB3664E4AB1C72F1D85CB29F80648444F@RED-MSG-10.redmond.corp.microsoft.com>
References: <B885BEDCB3664E4AB1C72F1D85CB29F80648444F@RED-MSG-10.redmond.corp.microsoft.com>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.3a) Gecko/20021212

As Eric said, mixed content is a big one.

In document applications, order tends to matter by default.

In data applications, order tends not to matter except in specialized 
list contexts.
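
A minimal Python sketch of that difference, assuming a hypothetical 
"person" record. Read as name/value pairs, the two inputs are the same 
data; read as an ordered list of elements, they are not.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records with the same fields in a different order.
a = ET.fromstring("<person><name>Ann</name><age>30</age></person>")
b = ET.fromstring("<person><age>30</age><name>Ann</name></person>")

def as_record(elem):
    """Data view: child elements are unordered name/value pairs."""
    return {child.tag: child.text for child in elem}

def as_sequence(elem):
    """Document view: child elements are an ordered list."""
    return [(child.tag, child.text) for child in elem]

records_equal = as_record(a) == as_record(b)
sequences_equal = as_sequence(a) == as_sequence(b)
print(records_equal, sequences_equal)
```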

In data applications, name/value pairs are probably the most convenient 
"fundamental data type". In documents, lists of elements tend to be. It 
is only because documents tend not to make heavy use of name/value pairs 
that XML can get away with such a weak notion of attributes (which, 
ironically, data-heads are often agitating to remove!).

Because of the name/value orientation of data applications, it is 
usually safe to ignore an unknown element as an "extension". But in a 
document application unknown elements tend to have semantics that you 
really should deal with. A publisher can't say "I've never heard of a 
colophon, therefore I'll just throw it out."
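
A sketch of the two policies in Python, assuming hypothetical 
vocabularies (KNOWN_FIELDS, KNOWN_ELEMENTS) and hypothetical function 
names. The record loader skips an extension element without complaint; 
the document checker refuses to drop content it has no rule for.

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = {"name", "email"}            # hypothetical record schema
KNOWN_ELEMENTS = {"book", "title", "para"}  # hypothetical document vocabulary

def load_record(xml_text):
    """Data view: keep known fields, silently skip extensions."""
    root = ET.fromstring(xml_text)
    return {c.tag: c.text for c in root if c.tag in KNOWN_FIELDS}

def check_document(xml_text):
    """Document view: fail rather than throw away unknown content."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in KNOWN_ELEMENTS:
            raise ValueError(f"no processing rule for <{elem.tag}>")
    return root

record = load_record(
    "<contact><name>Ann</name><email>ann@example.org</email>"
    "<pager>555-0100</pager></contact>"
)

try:
    check_document(
        "<book><title>T</title><colophon>Set in Garamond</colophon></book>"
    )
    colophon_rejected = False
except ValueError:
    colophon_rejected = True
```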

Data-oriented applications tend to want to map XML elements to objects 
(thus the emphasis on name/value pairs). Document-oriented applications 
tend to use a stream processing or visitor model.
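
The two processing styles might look like this in Python. The Person 
class, the element names and the "*...*" rendering of emphasis are all 
assumptions made for illustration. The first function maps elements to 
the fields of an object; the second walks the tree in order and keeps 
the mixed content intact.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

def to_person(elem):
    """Data view: each child element becomes a field on an object."""
    return Person(name=elem.findtext("name"), age=int(elem.findtext("age")))

def to_text(elem):
    """Document view: recursive walk that preserves text between elements."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(to_text(child))
        parts.append(child.tail or "")  # text that follows the child element
    body = "".join(parts)
    return f"*{body}*" if elem.tag == "em" else body

person = to_person(
    ET.fromstring("<person><name>Ann</name><age>30</age></person>")
)
text = to_text(ET.fromstring("<p>Order <em>really</em> matters.</p>"))
```

An object mapping like to_person has no natural place to put the text 
that sits between child elements, which is why mixed content is such a 
dividing line.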

Data-oriented systems tend to distinguish between roles 
(fields/properties/attributes) and types. Documents tend to mix them all 
together (is "title" a role or a type of thing?).

Data-oriented systems tend to prefer object types to be detectable 
independent of context (thus namespaces) whereas document processing is 
typically done top-down recursively so relying on context is natural.
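
A small Python sketch of the two lookup styles. The namespace URI and 
element names are hypothetical. The first function finds a type by its 
qualified name wherever it occurs; the second interprets an unqualified 
"title" only in light of its parent.

```python
import xml.etree.ElementTree as ET

META = "http://example.org/ns/meta"  # hypothetical namespace URI

doc = ET.fromstring(
    '<book xmlns:m="' + META + '">'
    "<m:title>Data and Documents</m:title>"
    "<chapter><title>Intro</title></chapter>"
    "</book>"
)

def context_free(root):
    """Data view: the qualified name alone identifies the type."""
    return [e.text for e in root.iter("{" + META + "}title")]

def top_down(elem, parent=None, out=None):
    """Document view: meaning of 'title' depends on the enclosing element."""
    out = [] if out is None else out
    if elem.tag == "title":
        out.append((parent, elem.text))
    for child in elem:
        top_down(child, elem.tag, out)
    return out

qualified_titles = context_free(doc)
contextual_titles = top_down(doc)
```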

I am good friends with one of the inventors of YAML and I don't argue 
with him when he says that YAML is better for most data-oriented 
applications. I think he's probably right. But as somebody else said, 
what would be the cost in toolset complexity of having to master two 
different languages?

If one could go back in time, one could approach the problem from 
scratch with the needs of document and data heads equally represented. 
It would not just be useful to combine them so we could reuse tools. It 
would be useful to combine them because most documents have a 
data-oriented subset (if only the "metadata" element at the top) and 
many data applications have a document-oriented subset (if only rich 
text fields). Another reason to combine them is that there is no clear 
boundary. There is a spectrum.

But I'm sorry to say that that is not the way XML is.

And by the way, if you consider RDF:

  * triples are roughly equivalent to name/value pairs (the third item 
in the triple, the subject, is the "parent" object)
  * order does not matter by default
  * types and roles are distinguished
  * types and roles are context-free
  * triples with unknown predicates are easily ignored
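
The points above can be sketched in plain Python with tuples standing 
in for triples. The identifiers, predicate names and the KNOWN set are 
hypothetical, and the sketch assumes each property has a single value. 
The subject plays the role of the parent, the predicate is the name, 
and the object is the value.

```python
from collections import defaultdict

# (subject, predicate, object) tuples; all values are made up.
triples = [
    ("urn:example:ann", "name", "Ann"),
    ("urn:example:ann", "age", "30"),
    ("urn:example:ann", "shoeSize", "38"),  # predicate the consumer ignores
]

KNOWN = {"name", "age"}  # hypothetical set of understood predicates

def group(triples, known):
    """Collect name/value pairs per subject, skipping unknown predicates."""
    objects = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate in known:
            objects[subject][predicate] = obj
    return dict(objects)

forward = group(triples, KNOWN)
backward = group(list(reversed(triples)), KNOWN)  # order does not matter
```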

IMHO, it is precisely the impedance mismatch between the data view of the 
world and XML that makes RDF look so ugly. As a data model, RDF is not 
far from ideal for most of the data-oriented applications I've done.

I think that having a clean strategy for merging the two worlds is one 
of the big open questions in the XML world.

  Paul Prescod

Follow-Ups:
- Re: [xml-dev] Documents, data and markup: YAML Ain't Markup Language
  - From: John Cowan <jcowan@reutershealth.com>

References:
- RE: [xml-dev] YAML Ain't Markup Language
  - From: "Dare Obasanjo" <dareo@microsoft.com>
