OASIS Mailing List Archives





   Re: [xml-dev] heritage (was Re: [xml-dev] SGML on the Web)


Hi Tom,

> I do not see how you create documents with multiple sets of markup
> and be sure that any one set is valid against a schema (save by
> preprocessing it and then validating, but I am thinking about during
> the authoring process)

I agree that's an interesting problem. The way we're planning on
handling this in LMNL is to explicitly keep separate (in the data
model, that is, not the syntax) my markup and your markup of the same
document, keeping them in different *layers*. This means you can focus
validation on your particular layer while ignoring the other layers,
wherever they might come from.
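To make the idea of layers a little more concrete, here is a minimal Python sketch. It assumes a hypothetical data model in which every range records which layer it belongs to; the `Range` class, the layer names "mine" and "yours", and `validate_layer` are illustrations only, not part of LMNL. The "schema" is reduced to a set of allowed range names, just enough to show that validating one layer never looks at the others.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """A named span over the document text, tagged with the layer it belongs to."""
    layer: str  # who owns this markup, e.g. "mine" or "yours" (illustrative names)
    name: str   # range name, e.g. "b" or "comment"
    start: int  # offset of the first character covered (inclusive)
    end: int    # offset just past the last character covered (exclusive)

def ranges_in_layer(ranges, layer):
    """Keep only the ranges that belong to one layer."""
    return [r for r in ranges if r.layer == layer]

def validate_layer(ranges, layer, allowed_names):
    """Check one layer against a (very simple, hypothetical) schema: a set of allowed names.

    Ranges in other layers are never inspected, whatever they contain.
    """
    return [
        f"unexpected range {r.name!r} at {r.start}-{r.end}"
        for r in ranges_in_layer(ranges, layer)
        if r.name not in allowed_names
    ]

# One document carrying two people's markup in separate layers.
doc = [
    Range("mine", "b", 0, 18),
    Range("mine", "i", 6, 25),
    Range("yours", "comment", 0, 30),
]
print(validate_layer(doc, "mine", {"b", "i"}))   # [] -- the "yours" layer is ignored
print(validate_layer(doc, "yours", {"b", "i"}))  # ["unexpected range 'comment' at 0-30"]
```

Note that the `b` and `i` ranges in the "mine" layer overlap each other; layering alone says nothing about whether that is allowed, which is the point of the next paragraph.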

But we don't think that solves the problem of having overlapping
ranges. Overlapping markup doesn't just come about because you have
overlapping trees, it also comes about because in some cases the most
natural way of marking up text is with overlapping structure.

For example, in the classic:

  [b}bold, [i}bold italic,{b] italic{i]

To the user, this makes sense conceptually: "bold, bold italic,"
should be in bold, and "bold italic, italic" should be in italic.

I think that these overlaps mostly happen when the inferences licensed
by the markup are distributed, to use the terminology of
Sperberg-McQueen et al. [1]. In other words, the markup above is
assigning properties to individual characters within the string; the
meaning would be exactly the same if the markup were distributed:

  [b}bold, {b][i}[b}bold italic,{b]{i][i} italic{i]

which is why this isn't a problem in tree-based document models.
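The distribution argument can be checked mechanically. The Python sketch below assumes ranges are represented as hypothetical `(name, start, end)` character offsets (the offsets match the bold/italic example). It cuts the text at every range boundary, gives each segment its own piece of every range covering it, renders the result in the same `[name}...{name]` sketch syntax, and confirms that every character carries the same set of properties before and after. It nests `b` outside `i` in the middle segment, the reverse of the hand-written version, which is equally well-nested.

```python
def segments(text, ranges):
    """Cut the text at every range boundary.

    Each segment is covered by a fixed set of range names, returned sorted.
    """
    cuts = sorted({0, len(text)} | {s for _, s, _ in ranges} | {e for _, _, e in ranges})
    return [(lo, hi, sorted(n for n, s, e in ranges if s <= lo and hi <= e))
            for lo, hi in zip(cuts, cuts[1:])]

def distribute(text, ranges):
    """Replace each range by one piece per segment it covers; pieces never cross."""
    return [(n, lo, hi) for lo, hi, names in segments(text, ranges) for n in names]

def render(text, ranges):
    """Write the distributed form in the [name}...{name] sketch syntax."""
    out = []
    for lo, hi, names in segments(text, ranges):
        out.append("".join(f"[{n}}}" for n in names))           # open, outermost first
        out.append(text[lo:hi])
        out.append("".join(f"{{{n}]" for n in reversed(names)))  # close, innermost first
    return "".join(out)

def char_properties(text, ranges):
    """The set of range names covering each character: what the markup 'means' here."""
    return [frozenset(n for n, s, e in ranges if s <= k < e) for k in range(len(text))]

text = "bold, bold italic, italic"
overlapping = [("b", 0, 18), ("i", 6, 25)]  # [b}...{b] and [i}...{i] from the example

print(render(text, overlapping))
# [b}bold, {b][b}[i}bold italic,{i]{b][i} italic{i]

# Same properties on every character, so the two markups mean the same thing.
assert char_properties(text, overlapping) == char_properties(text, distribute(text, overlapping))
```

This only works because bold and italic are properties of characters; the comment example that follows is exactly the case where splitting a range into pieces would change its meaning.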

But overlapping also occurs when, for example, people add comments to
some text:

  [comment=jt1 [text}This should read...{]}This document
  [comment=wp3 [text}It does more than that...{]}attempts{=jt1]
  to describe{=wp3]...

  (This example demonstrates overlapping ranges paired by IDs.)

I don't think that these overlaps can be solved either by splitting into
multiple hierarchies (you'd end up with roughly as many
hierarchies as you had comments) or by rearranging the markup to give
a nice tree structure, since a comment is about the *whole* text it
covers, not about its individual characters.
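The "as many hierarchies as comments" point can be illustrated with a small Python sketch. Ranges are again hypothetical `(id, start, end)` offsets; two ranges "cross" when they overlap without one containing the other, and crossing ranges cannot share a tree. `split_into_hierarchies` is an illustrative greedy placement, simple but not guaranteed to find the fewest hierarchies in general.

```python
from itertools import combinations

def crosses(a, b):
    """True if two (id, start, end) ranges overlap without one containing the other."""
    _, s1, e1 = a
    _, s2, e2 = b
    overlap = s1 < e2 and s2 < e1
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return overlap and not nested

def crossing_pairs(ranges):
    """IDs of every pair of ranges that cannot sit in the same tree."""
    return [(a[0], b[0]) for a, b in combinations(ranges, 2) if crosses(a, b)]

def split_into_hierarchies(ranges):
    """Greedily put each range in the first hierarchy where it crosses nothing."""
    hierarchies = []
    for r in sorted(ranges, key=lambda r: (r[1], -r[2])):
        for h in hierarchies:
            if not any(crosses(r, other) for other in h):
                h.append(r)
                break
        else:  # no existing hierarchy can take it
            hierarchies.append([r])
    return hierarchies

text = "This document attempts to describe"
jt1 = ("jt1", 0, text.index("attempts") + len("attempts"))  # "This document attempts"
wp3 = ("wp3", text.index("attempts"), len(text))            # "attempts to describe"
print(crossing_pairs([jt1, wp3]))               # [('jt1', 'wp3')]
print(len(split_into_hierarchies([jt1, wp3])))  # 2

# Four comments that all cross each other need four separate hierarchies.
staggered = [(f"c{k}", k, 10 + k) for k in range(4)]
print(len(split_into_hierarchies(staggered)))   # 4
```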

So anyway, how do we validate it? Well, this is still work in
progress, but we've been discussing using a RELAX NG-based schema
language that can describe this overlap. For example, a schema for a
layer that contains multiple overlapping comment ranges might look like this:

start = overlap { comment* }

comment = range comment [ text ]
                        { annotation text [ text ] { empty },
                          (comment | comment.start | comment.end)* }

comment.start = start range comment
comment.end   = end range comment
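It may help to see what a validator for such a schema would be checking. The Python sketch below is not the proposed RELAX NG-based language and invents its own data model (a `Range` with an `annotations` dict). Under that assumed reading, it requires every range in the layer to be a `comment` with a `text` annotation, and models the `overlap { ... }` wrapper as a flag that lets crossing comments through; with the flag off it behaves like a tree-only validator and rejects them.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Range:
    """A range in one layer; `annotations` maps annotation names to their text."""
    name: str
    start: int
    end: int
    annotations: dict = field(default_factory=dict)

def crosses(a, b):
    """True if two ranges overlap without one containing the other."""
    overlap = a.start < b.end and b.start < a.end
    nested = (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end)
    return overlap and not nested

def validate_comment_layer(ranges, allow_overlap=True):
    """Hypothetical reading of `start = overlap { comment* }`.

    Every range must be a comment carrying a `text` annotation; crossing
    comments are accepted only when allow_overlap is True.
    """
    errors = []
    for r in ranges:
        if r.name != "comment":
            errors.append(f"{r.name!r} at {r.start}-{r.end} is not a comment")
        elif not isinstance(r.annotations.get("text"), str):
            errors.append(f"comment at {r.start}-{r.end} has no text annotation")
    if not allow_overlap:
        for a, b in combinations(ranges, 2):
            if crosses(a, b):
                errors.append(f"comments at {a.start}-{a.end} and {b.start}-{b.end} cross")
    return errors

layer = [
    Range("comment", 0, 22, {"text": "This should read..."}),
    Range("comment", 14, 34, {"text": "It does more than that..."}),
]
print(validate_comment_layer(layer))                       # []
print(validate_comment_layer(layer, allow_overlap=False))  # ['comments at 0-22 and 14-34 cross']
```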

We'd really welcome comments and suggestions about the whole
validation question... on LMNL-Dev (http://www.lmnl.org/list) :)



[1] http://www.idealliance.org/papers/extreme02/html/2002/CMSMcQ01/EML2002CMSMcQ01.html

Jeni Tennison



Copyright 2001 XML.org. This site is hosted by OASIS