xml-dev - Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)

To: John Cowan <jcowan@reutershealth.com>
Subject: Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)
From: Jeni Tennison <jeni@jenitennison.com>
Date: Tue, 8 Oct 2002 12:07:33 +0100
Cc: xml-dev@lists.xml.org (XML DEV)
In-reply-to: <200210081100.HAA27650@mail2.reutershealth.com>
Organization: Jeni Tennison Consulting Ltd
References: <200210081100.HAA27650@mail2.reutershealth.com>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Hi John,

>> In other words, the markup above is assigning properties to
>> individual characters within the string; the meaning would be
>> exactly the same if the markup were distributed differently:
>> 
>>   [b}bold, {b][i}[b}bold italic,{b]{i][i} italic{i]
>
> I think the tutorial should make a point of this, that although
> <b><i>foo</i></b> and <i><b>foo</b></i> are different in XML, their
> LMNL analogues mean exactly the same thing.

Ah well, the thing is that they don't, exactly, and it all comes back
to this reified LMNL layer thing again.

When Wendell and I started developing LMNL (and when we presented at
Extreme), they did mean *exactly* the same thing. When we got together
with Gavin and started discussing the implications of this a bit more,
we discovered that having them mean exactly the same thing led to some
nasty complications. In particular, consider something like:

  [a [href}page.html{]}[img [src}image.gif{]]{a]

If you take the view that the tags are just indicating points in the
character stream, then this LMNL is *exactly* the same as each of:

  [a [href}page.html{]}[img [src}image.gif{]}{a]{img]
  [img [src}image.gif{]}[a [href}page.html{]}{img]{a]
  [img [src}image.gif{]}[a [href}page.html{]]{img]

In other words, you lose information about the relationships that the
author intends the ranges to have with each other. The author intended
the link to be around the image, but the LMNL processor could just as
easily interpret it as being a link and an image overlapping, or an
image "around" a link.
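
To make the loss concrete, here is a minimal Python sketch of the
"tags are just points in the character stream" view. It is not a LMNL
parser: each serialization is written out by hand as a list of tag
events, annotations are left out, and the function name and event
format are made up for illustration. It also assumes at most one range
of a given name is open at a time.

```python
def flat_ranges(events):
    """Collapse a tag sequence into a set of (name, start, end) ranges.

    Only character offsets survive; the order of the tags does not.
    """
    offset = 0
    open_at = {}      # range name -> offset where its start tag appeared
    ranges = set()
    for kind, value in events:
        if kind == "text":
            offset += len(value)
        elif kind == "start":
            open_at[value] = offset
        elif kind == "end":
            ranges.add((value, open_at.pop(value), offset))
        elif kind == "empty":
            ranges.add((value, offset, offset))
    return ranges


# The four serializations from the message, annotations omitted.
nested_img_in_a = [("start", "a"), ("empty", "img"), ("end", "a")]
overlap_a_first = [("start", "a"), ("start", "img"), ("end", "a"), ("end", "img")]
overlap_img_first = [("start", "img"), ("start", "a"), ("end", "img"), ("end", "a")]
nested_a_in_img = [("start", "img"), ("empty", "a"), ("end", "img")]

variants = [nested_img_in_a, overlap_a_first, overlap_img_first, nested_a_in_img]
all_equal = all(flat_ranges(v) == flat_ranges(nested_img_in_a) for v in variants)
print(all_equal)
```

Every variant collapses to the same two zero-length ranges at offset 0,
so nothing in the result says which range the author meant to be
"around" the other.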

Gavin was really helpful because he explained the whole concept of
layers being used as a way of extracting structure from text
documents, and how you could apply layers to LMNL syntax documents in
order to preserve this information while retaining the purity of the
basic LMNL data model.

So we introduced the concept of the reified LMNL layer. In the reified
LMNL layer, various syntactic aspects of the actual LMNL syntax (or
XML syntax) that you use are retained, including:

  - how the ranges are nested
  - whether you put an annotation in the start or end tag
  - what comments there are in the text
  - where entities have been used
  - which prefixes were used on qualified names
  - what names you assigned to layers

The reified LMNL layer acts as a half-way house between the syntactic
information that's important to the author, and the "pure" LMNL data
model of the document.

The intention is that applications such as schema languages or
transformation languages will be able to run over either the "pure"
LMNL data model or a reified LMNL layer. If they run over a reified
LMNL layer, they'll be able to preserve various aspects of author
intent that don't make it into the "pure" LMNL data model.
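
As a rough illustration only, the following Python sketch keeps one
piece of syntactic information (the position of each tag in the
serialization) alongside the offsets, and derives the "pure" view by
dropping it. The names `Reified`, `reify`, `pure` and `encloses` are
hypothetical, and this is not how the reified LMNL layer is actually
defined; it just shows how an application could see nesting in one
view and ignore it in the other.

```python
from typing import NamedTuple


class Reified(NamedTuple):
    name: str
    start: int       # character offset of the start of the range
    end: int         # character offset of the end of the range
    open_tag: int    # position of the start tag in the tag sequence
    close_tag: int   # position of the end tag in the tag sequence


def reify(events):
    """Build ranges that remember where their tags sat in the syntax."""
    offset = 0
    pending = {}     # range name -> (start offset, start tag position)
    out = {}
    for index, (kind, value) in enumerate(events):
        if kind == "text":
            offset += len(value)
        elif kind == "start":
            pending[value] = (offset, index)
        elif kind == "end":
            start, opened = pending.pop(value)
            out[value] = Reified(value, start, offset, opened, index)
        elif kind == "empty":
            out[value] = Reified(value, offset, offset, index, index)
    return out


def pure(reified):
    """Drop the syntactic detail, leaving only (name, start, end)."""
    return {(r.name, r.start, r.end) for r in reified.values()}


def encloses(outer, inner):
    """True if inner's tags were written between outer's tags."""
    return outer.open_tag < inner.open_tag and inner.close_tag < outer.close_tag


nested = reify([("start", "a"), ("empty", "img"), ("end", "a")])
overlap = reify([("start", "a"), ("start", "img"), ("end", "a"), ("end", "img")])

print(pure(nested) == pure(overlap))              # same "pure" ranges
print(encloses(nested["a"], nested["img"]))       # link written around image
print(encloses(overlap["a"], overlap["img"]))     # overlapping, not nested
```

An application working on `pure(...)` treats the two documents as
identical, while one working on the reified ranges can still tell that
the author wrote the link around the image.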

I guess it's a way of having our cake and eating it too.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
