   [Fwd: Re: [xml-dev] Documents, data and markup: YAML Ain't MarkupLanguag

  • To: "'xml-dev'" <xml-dev@lists.xml.org>, cce@clarkevans.com
  • Subject: [Fwd: Re: [xml-dev] Documents, data and markup: YAML Ain't MarkupLanguage]
  • From: Paul Prescod <paul@prescod.net>
  • Date: Mon, 09 Jun 2003 16:37:38 -0700
  • User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.3a) Gecko/20021212

Forwarded for Clark...

-------- Original Message --------
Subject: Re: [xml-dev] Documents, data and markup: YAML Ain't Markup 
Date: Mon, 9 Jun 2003 19:28:55 +0000
From: Clark C. Evans <cce@clarkevans.com>
To: Paul Prescod <paul@prescod.net>
CC: Dare Obasanjo <dareo@microsoft.com>, xml-dev@lists.xml.org

On Fri, Jun 06, 2003 at 01:19:02PM -0700, Paul Prescod wrote:
| As Eric said, mixed content is a big one.

Indeed.  I would say that mixed content is *the* line between
narrative (document processing) and operational (data processing)
information.   Mixed content is at the core of XML.  You can
choose not to use it; however, you always pay for its complexity.

I wanted a serialization language that didn't pay the high price
of attributes, mixed content, element tag vs content and other
items necessary for document processing.


The primary distinction between XML and YAML is the information
model.   YAML has two models, a graph and a serial model.  The
graph model assumes a random-access mechanism where nodes are
functions (maps or lists) and scalars.  In the serial model, this
graph is flattened by marking the first occurrence of a node,
and then signifying subsequent occurrences.   In both models
every node has a type, the default type being string, mapping,
or list.   In effect, YAML balances the needs of a computerized
random-access environment with human sequential reading needs.
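
The flattening step described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not how
any YAML implementation actually works: the function name flatten, the
event names and the anchor names (a1, a2, ...) are all made up here, and
plain dicts, lists and strings stand in for the graph's maps, lists and
scalars.

```python
def flatten(root):
    """Flatten a graph of dicts, lists and strings into a serial event list.

    Hypothetical sketch: event and anchor names are invented for illustration.
    """
    seen = {}  # id(node) -> number of times the node is reachable

    def count(node):
        if isinstance(node, (dict, list)):
            seen[id(node)] = seen.get(id(node), 0) + 1
            if seen[id(node)] > 1:
                return  # already visited; do not descend again (handles cycles)
            children = node.values() if isinstance(node, dict) else node
            for child in children:
                count(child)

    count(root)

    anchors = {}  # id(node) -> anchor name, assigned at the first occurrence
    events = []

    def emit(node):
        if not isinstance(node, (dict, list)):
            events.append(("scalar", str(node)))
            return
        key = id(node)
        if key in anchors:
            events.append(("alias", anchors[key]))  # subsequent occurrence
            return
        anchor = None
        if seen[key] > 1:
            anchor = "a%d" % (len(anchors) + 1)
            anchors[key] = anchor  # mark the first occurrence
        if isinstance(node, dict):
            events.append(("map-start", anchor))
            for k, v in node.items():
                emit(k)
                emit(v)
            events.append(("map-end", None))
        else:
            events.append(("list-start", anchor))
            for item in node:
                emit(item)
            events.append(("list-end", None))

    emit(root)
    return events


# One list shared by two keys: a graph, not a tree.
shared = ["x", "y"]
graph = {"first": shared, "second": shared}
events = flatten(graph)
```

The shared list is written out once, with an anchor, and referred to by
an alias the second time; nodes that occur only once get no anchor at all.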

YAML specifically ignores document processing requirements.  Over
time I've come to use both XML and YAML, leveraging the strengths
of each where they best fit my problems.   In particular, their
drastically different syntax lets you blend both of them together
in the same file!  I do this frequently.

YAML has a significantly different model from XML:

   - XML distinguishes between 'tags' and 'content'; in YAML
     mapping keys are scalars just like mapping values or
     list entries.   Thus, XML has a deep syntactic distinction
     between 'meta-data' and 'data'.   YAML avoids this distinction.

   - XML elements have attributes, or key/value pairs, which serve
     as a mapping.  YAML has a mapping, but unlike attributes, both
     the key and value can be structured.

   - The XML model is a tree, YAML is a graph.   In YAML syntax
     there are 'anchors' and 'aliases', but these are features of
     the syntax necessary to flatten the graph.

   - XML has namespaces, YAML nodes can have a type specifier.
     They are similar, but quite different as 'namespace' really
     does not exist in YAML land, only types.  Of course, someone
     is free to interpret sub-strings of a type specifier however
     they wish.

   - In the XML information model, syntax is king.  In YAML, we
     have two models because both humans and machines are king
     at the same time.  Albeit machines are a bit more kingish.

   - The top production of XML is a single document node; the
     top production in YAML is a sequence of nodes.
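
To make the attribute point in the list above concrete, here is a rough
Python sketch.  The element name, the keys and the fare data are invented
for illustration, and a Python dict with tuple keys is only a stand-in
for a YAML mapping (Python requires hashable keys, so a tuple plays the
role of a sequence used as a key).

```python
import xml.etree.ElementTree as ET

# XML attributes: a flat mapping from names to strings, nothing nested.
elem = ET.Element("route", {"from": "Boston", "to": "Denver"})
xml_text = ET.tostring(elem, encoding="unicode")

# A mapping in the YAML sense: keys and values may both be structured.
# Hypothetical data; the tuple key stands in for a sequence used as a key.
fares = {
    ("Boston", "Denver"): {"price": "310", "stops": ["Chicago"]},
    ("Denver", "Boston"): {"price": "295", "stops": []},
}
stops = fares[("Boston", "Denver")]["stops"]
```

The attribute form has to squeeze any structure into strings, while the
mapping form can nest a list inside a value and use a pair as a key.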

This dual model creates a few 'inconsistencies' which are easy
to explain; certain elements of the serial model just are not
in the graph model.   The most troublesome is key order.  Human
readers require specific key ordering for their data processing;
and some sequential processing applications need keys to be sent
in a particular order.   The solution here is to augment the
'graph' model with a 'style-sheet' which aids in the translation
from a graph to a serialized textual form.  Therefore,

   - XML requires a schema to extract data from the syntax,
     YAML requires a schema to serialize data to the syntax.
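
As a small sketch of the 'style-sheet' idea, assuming nothing more than
a list of preferred keys: the function name serialize, the key_order
parameter and the invoice data below are hypothetical, and only flat
mappings of strings are handled.

```python
def serialize(mapping, key_order):
    """Emit 'key: value' lines: listed keys first, remaining keys sorted.

    Hypothetical sketch; key_order plays the part of the style-sheet.
    """
    listed = [k for k in key_order if k in mapping]
    rest = sorted(k for k in mapping if k not in key_order)
    return "\n".join("%s: %s" % (k, mapping[k]) for k in listed + rest)


# The graph model says nothing about key order; the style-sheet supplies it.
invoice = {"total": "42.00", "date": "2003-06-09", "id": "17"}
text = serialize(invoice, key_order=["id", "date"])
```

The mapping itself stays unordered; the ordering lives outside the data
and is consulted only when the graph is written out as text.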

Now, one *could* use YAML to express a document, however, the author
would have to pay the price of keeping everything 'functional', that
is, thinking only in terms of sequences and mappings.   It is not
pretty... I've tried it.   Indeed our spec is written in a YAML
language for documents, but as I remember it drops down to HTML
for use in paragraphs.

That said, as much as you can argue that XML is good for data
serialization, I can argue that YAML is good for document processing.
XML is butt ugly for data processing.  YAML is butt ugly for
document processing.    And I do not think you can argue your
way out of this.   XML was designed up-front to be a document
processing mechanism.   There is no way to eliminate that legacy.

| In document applications, order tends to matter by default.
| In data applications, order tends not to matter except in
| specialized list contexts.

In data applications, I'd say that the structures fall evenly
on either side of the mapping vs sequence divide.  The sequence is
not really a 'specialized' context, it is more of a general rule.

| Name/value pairs are probably the most convenient "fundamental data type".

The fundamental data type is the function.  Both mappings and lists
are functions.   IMHO, it is really mixed content which is the pivot
point, that and having ordered keys where duplicates are allowed.
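
A rough Python illustration of what I mean, with made-up data and a
hypothetical helper named apply: a mapping is a function from keys to
values, a list is a function from positions to values, and mixed content
fits neither because its names are ordered and may repeat.

```python
mapping = {"name": "Clark", "list": "xml-dev"}  # function from keys to values
sequence = ["alpha", "beta", "gamma"]           # function from 0, 1, 2 to values


def apply(node, argument):
    """Treat a mapping or a list as a function and apply it to one argument."""
    return node[argument]


# Mixed content: order matters and names may repeat, so the closest plain
# structure is a list of (name, value) pairs rather than a mapping.
mixed = [("text", "YAML is "), ("em", "not"), ("text", " markup.")]
names = [name for name, _ in mixed]
```

Turning mixed into a dict would silently drop one of the two "text"
entries, which is the pivot point in a nutshell.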

| In documents, lists of elements tend to be. It is only because
| documents tend not to make heavy use of name/value pairs that XML can
| get away with such a weak notion of attributes (which, ironically,
| data-heads are often agitating to remove!)

Not really ironic.  The attributes do not allow for recursion, and
thus are not very useful in a data context.  ;)

| I am good friends with one of the inventors of YAML and I don't argue
| with him when he says that YAML is better for most data-oriented
| applications. I think he's probably right. But as somebody else said,
| what would be the cost in toolset complexity of having to master two
| different languages.

Not that much.  If anyone can master XML, they could master YAML in a
fraction of the time.   Mostly because YAML hasn't the toolset that XML
has.  However, the toolset will emerge, it just may take a few years.

Also, YAML was really designed from the knowledge of XML, and thus
lessons hard won by the XML community could be used by YAML without
the legacy.   Indeed, YAML owes much of its history to XML via the
SML-DEV list and our dissident analysis.

| If one could go back in time, one could approach the problem from
| scratch with the needs of document and data heads equally represented.
| It would not just be useful to combine them so we could reuse tools. It
| would be useful to combine them because most documents have a
| data-oriented subset (if only the "metadata" element at the top) and
| many data applications have a document-oriented subset (if only rich
| text fields). Another reason to combine them is that there is no clear
| boundary. There is a spectrum.

Yes!  Much of my data now mixes the two.  ;)

| But I'm sorry to say that that is not the way XML is.
| And by the way, if you consider RDF:
|  * triples are roughly equivalent to name/value pairs (the third item
| in the triple is the "parent" object)
|  * order does not matter by default
|  * types and roles are distinguished
|  * types and roles are context-free
|  * triples with unknown predicates are easily ignored
| IMHO, it is precisely the impedance mismatch between the data view of the
| world and XML that makes RDF look so ugly. As a data model, RDF is not
| far from ideal for most of the data-oriented applications I've done.
| I think that having a clean strategy for merging the two worlds is one
| of the big open questions in the XML world.

Thanks Paul.  This was very insightful.



