OASIS Mailing List Archives





   Re: [xml-dev] Re: Divorcing Data Model and Syntax: was Re: [xml-dev] her



Jeni Tennison wrote:

>Hi Patrick,
>>>You're wrong on that last point. A LMNL processor isn't actually
>>>defined anywhere, but I'd say that it was anything that generates a
>>>LMNL data model. There are no restrictions on what the *source* of
>>>that LMNL data model could be -- XML, LMNL, TexMECS, plain text, CVS
>>>files etc. etc. etc.
>>Let me see if I can get a little closer by changing your words and
>>see if you think I am saying the same thing:
>>A LMNL processor generates the LMNL data model (ranges and
>>annotations) based upon places in the data (however found, imposed
>>or represented) and data associated with those places?
>A LMNL processor generates a LMNL data model (layers, ranges and
>annotations) in whatever way it likes. I'm not sure what you mean
>about places in the data and data associated with those places, but if
>I rephrase to:
>  "Most LMNL processors will build a LMNL data model by taking a
>   sequence of characters (a string) and deriving a structure from
>   that string. This structure will usually be based on the presence
>   of 'markup' within the string, whether explicit (such as XML tags)
>   or implicit (such as spaces between words). The LMNL processor may
>   also associate extra information with particular pieces of the
>   string."
Whether we use my informal statement or your more formal one, ultimately
the LMNL processor converts a serialization syntax, itself based upon a
data model, into the LMNL data model. Is that a correct statement?


>>So, correcting my earlier statement, the LMNL view of data is
>>limited to the LMNL data model? (I realize you do not agree that is
>>a limitation or not much of one.)
>Yes. The reason that I don't think that this is a limitation is that
>any particular layer within the LMNL data model can hold ranges and
>annotations that represent any other kind of data model that can be
>derived from a text document.
Well, that is the rub, isn't it? A string-based data model cannot 
represent anything that is not a string. Divorcing the data model from 
the serialization syntax is not limited to things that can be 
represented as strings. Anything that can be addressed from or by a 
serialization syntax can have a different data model imposed on it at 
the time of processing. Our examples, to be sure, have been texts. After 
all, we (Matt & I) are both involved in biblical studies, so that is 
what you would expect. ;-) There is little demand for galactic 
coordinate spaces and the like in biblical studies. ;-)


>>On the other hand, JITTs does not have a data model. It imposes
>>whatever data model (in your sense of the term) without regard to
>>how the places in the data are represented in a particular
>>"serialization syntax." In other words, I could impose the XML data
>>model on a Postscript file, or vice versa, but either would require
>>careful attention to the requirements of the output "serialization
>>You can say that JITTs represents a divorce between any given
>>serialization syntax and a particular data model. Yes, I like that.
>Right. In the LMNL approach to the same problem, we might take a
>sequence of characters (a text layer) and derive ranges over those
>characters to create a syntactic layer, and then derive ranges over
>those ranges to create a higher-level layer that represents a
>particular data model, for example an XML Infoset layer.
It is the step of creating the representation you describe that leaves 
me quite curious. Why take it? Unless there is something I need that I 
cannot represent in the original serialization syntax, why even go 
there?
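
For readers following along, the layering described in the quoted
passage (a text layer, ranges derived over its characters, then ranges
derived over those ranges) can be sketched roughly as below. This is
only an illustration under stated assumptions: the Range and Layer
classes, the function names and the sample string are all invented
here and are not taken from any LMNL specification or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """A named span over the text layer (hypothetical structure)."""
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    annotations: dict = field(default_factory=dict)


@dataclass
class Layer:
    """A named collection of ranges (hypothetical structure)."""
    name: str
    ranges: list = field(default_factory=list)


def word_layer(text):
    """Derive word ranges from implicit markup: single spaces."""
    layer = Layer("words")
    pos = 0
    for word in text.split(" "):
        if word:
            layer.ranges.append(Range("w", pos, pos + len(word)))
        pos += len(word) + 1  # step over the word and one space
    return layer


def phrase_layer(words, groups):
    """Derive higher-level ranges over existing word ranges.

    Each group is (name, first_word_index, last_word_index).
    Groups may overlap, which a single tree could not express.
    """
    layer = Layer("phrases")
    for name, first, last in groups:
        start = words.ranges[first].start
        end = words.ranges[last].end
        layer.ranges.append(Range(name, start, end))
    return layer


text = "In the beginning God created"
words = word_layer(text)
# Two phrases that share the word "beginning" (word index 2).
phrases = phrase_layer(words, [("a", 0, 2), ("b", 2, 4)])

for r in phrases.ranges:
    print(r.name, repr(text[r.start:r.end]))
```

Note that the text itself is never rewritten; each layer only records
offsets into it, so the two phrase ranges can overlap freely.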

Take the example of the dictionary I offered over the weekend. (For the 
benefit of those who missed that discussion I repeat the example.)

***repeat of example***

       (typical OED entry back to early Sumerian usage)

Now, suppose you want to build a DOM tree. With the standard XML tree, 
you get everything between <entry> and </entry> as nodes in the DOM 
tree. Correct?

As an alternative, for a "lite" searching interface to a dictionary, you 
only want: <entry><headWord>JITTs</headWord>(blob of unparsed PCDATA, 
which includes all the markup you got as nodes in the DOM tree in the 
first one)</entry>

When I find the word I want, in this case JITTs, that block is returned. 
This time, however, a tree is asserted for all the markup in the "blob," 
and the result is processed for presentation.

***end repeat of example***
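
A minimal sketch of the two readings of the same dictionary text may
help here. It is not the JITTs implementation. It assumes a tiny
invented dictionary (only <entry> and <headWord> come from the example
above; <dictionary>, <sense> and <def> are made up), and it uses a
regular expression as a stand-in for whatever mechanism locates the
places in the data, which works only because these entries are flat
and not nested.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data; real entries would be far richer.
SOURCE = (
    "<dictionary>"
    "<entry><headWord>JITTs</headWord>"
    "<sense><def>Just-In-Time-Trees</def></sense></entry>"
    "<entry><headWord>LMNL</headWord>"
    "<sense><def>Layered Markup and Annotation Language</def></sense></entry>"
    "</dictionary>"
)

# Locates each entry, its head word, and the rest as an unparsed blob.
ENTRY = re.compile(
    r"<entry><headWord>(.*?)</headWord>(.*?)</entry>", re.S
)


def full_tree(source):
    """First reading: every element becomes a node."""
    return ET.fromstring(source)


def lite_index(source):
    """Second reading: head word -> blob of markup left as a string."""
    return {m.group(1): m.group(2) for m in ENTRY.finditer(source)}


def assert_tree(head_word, blob):
    """Parse one blob only when that entry is actually wanted."""
    xml = f"<entry><headWord>{head_word}</headWord>{blob}</entry>"
    return ET.fromstring(xml)


index = lite_index(SOURCE)
entry = assert_tree("JITTs", index["JITTs"])

print(len(list(full_tree(SOURCE).iter())))  # nodes in the full tree
print(index["JITTs"])                       # still just a string
print(entry.find("sense/def").text)         # parsed on demand
```

The index holds strings, not nodes, so the markup inside each blob
costs nothing until a tree is asserted over the one entry selected.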

Now, as I understand your explanation, everything from <entry> to 
</entry> would be converted into the LMNL data model?

In other words, does (or does not) the LMNL data model recognize all the 
ranges in XML text being read into the LMNL data model?

If all I need to do is avoid interpreting markup that would simply bulk 
up my DOM tree, why do I need LMNL? All the markup is still present. 
Should I assert another tree, this one selecting only the entry I want, 
I process all the markup found there just like any other XML fragment.



Patrick Durusau
Director of Research and Development
Society of Biblical Literature



Copyright 2001 XML.org. This site is hosted by OASIS