xml-dev - Re: Divorcing Data Model and Syntax: was Re: [xml-dev] heritage (was Re:

To: Patrick Durusau <pdurusau@emory.edu>
Subject: Re: Divorcing Data Model and Syntax: was Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)
From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 7 Oct 2002 14:27:38 +0100
Cc: XML DEV <xml-dev@lists.xml.org>
In-reply-to: <3DA18017.8010507@emory.edu>
Organization: Jeni Tennison Consulting Ltd
References: <200210071106.HAA12782@mail2.reutershealth.com><3DA16ACF.5010505@emory.edu> <76938281417.20021007124417@jenitennison.com><3DA18017.8010507@emory.edu>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Hi Patrick,

>>You're wrong on that last point. A LMNL processor isn't actually
>>defined anywhere, but I'd say that it was anything that generates a
>>LMNL data model. There are no restrictions on what the *source* of
>>that LMNL data model could be -- XML, LMNL, TexMECS, plain text, CSV
>>files etc. etc. etc.
>
> Let me see if I can get a little closer by changing your words and
> see if you think I am saying the same thing:
>
> A LMNL processor generates the LMNL data model (ranges and
> annotations) based upon places in the data (however found, imposed
> or represented) and data associated with those places?

A LMNL processor generates a LMNL data model (layers, ranges and
annotations) in whatever way it likes. I'm not sure what you mean
by places in the data and data associated with those places, but I'd
rephrase it as:

  "Most LMNL processors will build a LMNL data model by taking a
   sequence of characters (a string) and deriving a structure from
   that string. This structure will usually be based on the presence
   of 'markup' within the string, whether explicit (such as XML tags)
   or implicit (such as spaces between words). The LMNL processor may
   also associate extra information with particular pieces of the
   string."
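
As a rough illustration of that description, here is a minimal Python
sketch that treats the spaces between words as implicit markup and
derives one range per word. The Range class, the word_ranges function
and the "pos" annotation are all names invented for this example; none
of them comes from any LMNL specification or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """A named span over the text, as half-open character offsets (hypothetical shape)."""
    name: str
    start: int
    end: int
    annotations: dict = field(default_factory=dict)


def word_ranges(text):
    """Derive one 'word' range per run of non-whitespace characters."""
    ranges = []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            # Whitespace acts as implicit markup: it closes any open word.
            if start is not None:
                ranges.append(Range("word", start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        ranges.append(Range("word", start, len(text)))
    return ranges


text = "the quick fox"
words = word_ranges(text)
# Extra information associated with one particular piece of the string.
words[1].annotations["pos"] = "adjective"
spans = [(r.start, r.end, text[r.start:r.end]) for r in words]
```

The same shape would apply to explicit markup: only the rule for
deciding where a range starts and ends changes.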

> I assume that this is some in memory representation or is it output
> to the "serialization syntax" in the form of a file for further
> processing?

That's an implementation issue. The LMNL data model is an abstract
description of a way of looking at textual documents. That abstract
description is realised in the LOM (an API that is designed to work
with in-memory representations of the data model), but also in SAL (an
event-based API).

In a pipelined application, where one application is on one machine,
and another application is on another machine, the LMNL data model
needs to be serialised in order to pass it from the first application
to the next. For the purposes of exchange, there is a serialisation
syntax -- LMNL syntax -- that can completely represent the data model.

When applications are all resident on the same machine, though, they
can pass around the same in-memory representation, or use SAL in order
to communicate a data model from one application to the next.
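
To make the in-memory versus event-based distinction concrete, here is
a small Python sketch. It is not the LOM or SAL interface (neither is
shown in this message); the event names and both functions are made up
for illustration. It assumes non-empty ranges with unique names, given
as (name, start, end) tuples.

```python
def to_events(text, ranges):
    """Walk an in-memory model and yield it as a stream of events."""
    offsets = sorted({0, len(text)} | {o for _, s, e in ranges for o in (s, e)})
    prev = 0
    for off in offsets:
        if off > prev:
            yield ("text", text[prev:off])
            prev = off
        # Close ranges ending here before opening ranges starting here.
        for name, s, e in ranges:
            if e == off:
                yield ("end", name)
        for name, s, e in ranges:
            if s == off:
                yield ("start", name)


def from_events(events):
    """Rebuild the in-memory model on the receiving side of the stream."""
    text, open_ranges, ranges = "", {}, []
    for ev in events:
        if ev[0] == "text":
            text += ev[1]
        elif ev[0] == "start":
            open_ranges[ev[1]] = len(text)
        else:
            ranges.append((ev[1], open_ranges.pop(ev[1]), len(text)))
    return text, ranges


text = "to be or not"
ranges = [("phrase", 0, 5), ("line", 3, 12)]  # these two ranges overlap
events = list(to_events(text, ranges))
rebuilt = from_events(events)
```

Because the events carry no nesting requirement, the overlapping
"phrase" and "line" ranges survive the round trip unchanged.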

> So, correcting my earlier statement, the LMNL view of data is
> limited to the LMNL data model? (I realize you do not agree that is
> a limitation or not much of one.)

Yes. The reason that I don't think that this is a limitation is that
any particular layer within the LMNL data model can hold ranges and
annotations that represent any other kind of data model that can be
derived from a text document.

For example, you could have an XML Infoset layer in which the ranges
would represent structures that are relevant in the XML Infoset data
model -- you'd have ranges such as [info:element], [info:attribute]
and so on.

> On the other hand, JITTs does not have a data model. It imposes
> whatever data model (in your sense of the term) without regard to
> how the places in the data are represented in a particular
> "serialization syntax." In other words, I could impose the XML data
> model on a Postscript file, or vice versa, but either would require
> careful attention to the requirements of the output "serialization
> syntax."
>
> You can say that JITTs represents a divorce between any given
> serialization syntax and a particular data model. Yes, I like that.

Right. In the LMNL approach to the same problem, we might take a
sequence of characters (a text layer) and derive ranges over those
characters to create a syntactic layer, and then derive ranges over
those ranges to create a higher-level layer that represents a
particular data model, for example an XML Infoset layer.

You could then have a process that recognises the ranges in that XML
Infoset layer and creates from them an XML Infoset. From that XML
Infoset, you could create an XML serialisation of the original data if
you wanted.
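
The layering could be sketched in Python roughly as follows. This is a
toy under loud assumptions: the tag syntax is a simplified stand-in for
XML (no attributes, comments or entities), the "syn:" range names are
invented here, and only "info:element" echoes the naming used earlier
in this message. The syntactic layer holds ranges over characters; the
Infoset layer holds ranges over the syntactic ranges, by index.

```python
import re

# Simplified tag pattern: <name> or </name> only.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")


def syntactic_layer(text):
    """Ranges over characters: one per start tag or end tag."""
    return [
        ("syn:end-tag" if m.group(1) else "syn:start-tag",
         m.group(2), m.start(), m.end())
        for m in TAG.finditer(text)
    ]


def infoset_layer(syntax):
    """Ranges over syntactic ranges: pair each start tag with its end tag."""
    stack, elements = [], []
    for index, (kind, name, _, _) in enumerate(syntax):
        if kind == "syn:start-tag":
            stack.append((name, index))
        else:
            open_name, open_index = stack.pop()
            if open_name != name:
                raise ValueError(f"mismatched end tag: {name}")
            # The element spans from its start-tag range to its end-tag range.
            elements.append(("info:element", name, open_index, index))
    return elements


text = "<p>Hello <b>world</b></p>"   # the text layer
syntax = syntactic_layer(text)       # ranges over the text layer
elements = infoset_layer(syntax)     # ranges over the syntactic layer
```

A further process could then walk the "info:element" ranges to build
an actual Infoset, which is the step described above.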

The LMNL data model of layers, ranges and annotations provides a
framework for stepping from a serialisation to a data model. We think
that the LMNL data model is quite good, so, unsurprisingly, we've
specified what the LMNL data model would look like as a layer (that's
the reified LMNL layer). As well as creating a LMNL syntax
serialisation from the data model itself, you can create a LMNL syntax
serialisation from the reified LMNL layer.

(I need to write this up as an extension to the tutorial, which at the
moment really glosses over the whole reified LMNL layer stuff and
focuses on the LMNL syntax for creating documents.)

>>The LMNL syntax is there as a *serialisation syntax* so that LMNL
>>data models can be exchanged easily, because you can't represent
>>overlapping ranges and structured annotations in XML without
>>reifying, and reified structures are tedious to write and a whole
>>lot larger than non-reified ones.
>
> No, the data model of XML does not support these uses. In the JITTs
> world view, that has no relevance to what data model and what
> occurrences of a serialization syntax are selected (or used in data)
> to match a particular data model. If I want to output XML, I best
> use the XML data model to guide the selection and/or processing of
> whatever occurrences that need to be output in a serialization
> syntax to conform to that data model.

Absolutely. In the LMNL world, I'd phrase this as "If I want to output
XML, I best create a layer that represents the XML data model."

>>The hat is *the* coolest thing about LMNL, in my opinion (and that's
>>saying a lot, 'cos LMNL is *very* cool ;). All credit to Wendell
>>Piez for it.
>
> Wendell is the origin of the hat??? ;-) I knew he was talented in
> markup, literature and a variety of other subjects but not artistic
> expression! Cheers for Wendell!

Yes indeed -- with a little help from SVG, of course! I think that he
should be credited for the realisation that a jester's hat would be
the perfect logo for LMNL, as well as the obvious artistic merit of
the final design.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

