Jeni,

Jeni Tennison wrote:
>Hi Patrick,
>
>>>You're wrong on that last point. A LMNL processor isn't actually
>>>defined anywhere, but I'd say that it was anything that generates a
>>>LMNL data model. There are no restrictions on what the *source* of
>>>that LMNL data model could be -- XML, LMNL, TexMECS, plain text, CVS
>>>files etc. etc. etc.
>>>
>>Let me see if I can get a little closer by changing your words and
>>see if you think I am saying the same thing:
>>
>>A LMNL processor generates the LMNL data model (ranges and
>>annotations) based upon places in the data (however found, imposed
>>or represented) and data associated with those places?
>>
>
>A LMNL processor generates a LMNL data model (layers, ranges and
>annotations) in whatever way it likes. I'm not sure what you mean
>about places in the data and data associated with those places, but if
>I rephrase to:
>
> "Most LMNL processors will build a LMNL data model by taking a
> sequence of characters (a string) and deriving a structure from
> that string. This structure will usually be based on the presence
> of 'markup' within the string, whether explicit (such as XML tags)
> or implicit (such as spaces between words). The LMNL processor may
> also associate extra information with particular pieces of the
> string."
>
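As a small illustration of the "implicit markup" case in the paragraph quoted above, here is a minimal Python sketch that derives word ranges from the spaces in a string. The function name `word_ranges` and the sample text are my own assumptions for illustration; nothing here is taken from a LMNL implementation.

```python
import re

def word_ranges(text):
    """Return (start, end) character offsets for each run of non-space characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

# Hypothetical sample string; the spaces act as implicit markup.
text = "In the beginning"
ranges = word_ranges(text)
words = [text[start:end] for start, end in ranges]
```

The string itself is left untouched; the structure is held entirely in the offsets.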
Ultimately, whether we use my informal statement or your more formal one,
the LMNL processor converts a serialization syntax based upon a data model
into the LMNL data model. Is that a correct statement?
<snip>
>>So, correcting my earlier statement, the LMNL view of data is
>>limited to the LMNL data model? (I realize you do not agree that is
>>a limitation or not much of one.)
>>
>
>Yes. The reason that I don't think that this is a limitation is that
>any particular layer within the LMNL data model can hold ranges and
>annotations that represent any other kind of data model that can be
>derived from a text document.
>
Well, that is the rub, isn't it? A string-based data model cannot
represent anything that is not a string. Divorcing the data model from
the serialization syntax is not limited to things that can be
represented as strings. Anything that can be addressed from or by a
serialization syntax can have a different data model imposed on it at
the time of processing. Our examples, to be sure, have been texts. Matt
and I are both involved in biblical studies, so that is what you would
expect. ;-) There is little demand for galactic coordinate spaces and
the like in biblical studies. ;-)
<snip>
>>On the other hand, JITTs does not have a data model. It imposes
>>whatever data model (in your sense of the term) without regard to
>>how the places in the data are represented in a particular
>>"serialization syntax." In other words, I could impose the XML data
>>model on a Postscript file, or vice versa, but either would require
>>careful attention to the requirements of the output "serialization
>>syntax."
>>
>>You can say that JITTs represents a divorce between any given
>>serialization syntax and a particular data model. Yes, I like that.
>>
>
>Right. In the LMNL approach to the same problem, we might take a
>sequence of characters (a text layer) and derive ranges over those
>characters to create a syntactic layer, and then derive ranges over
>those ranges to create a higher-level layer that represents a
>particular data model, for example an XML Infoset layer.
>
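To make the layering described in the quoted paragraph concrete, here is a rough Python sketch: a text layer, a syntactic layer of ranges over its characters, and a higher layer of ranges over those ranges. The class names `Range` and `Layer`, their fields, and the range names are all hypothetical and chosen for illustration; this is not the LMNL data model as specified anywhere, only one way the idea might be laid out.

```python
from dataclasses import dataclass, field

@dataclass
class Range:
    name: str
    start: int  # inclusive offset into the base layer
    end: int    # exclusive offset into the base layer
    annotations: dict = field(default_factory=dict)

@dataclass
class Layer:
    base: object  # a str (the text layer) or a list of Range (a lower layer)
    ranges: list = field(default_factory=list)

    def content(self, r):
        """Return the slice of the base layer that range r covers."""
        return self.base[r.start:r.end]

# Text layer: a plain sequence of characters.
text = "<a>hi</a>"

# Syntactic layer: ranges over the characters of the text layer.
syntactic = Layer(text, [
    Range("start-tag", 0, 3),
    Range("chars", 3, 5),
    Range("end-tag", 5, 9),
])

# Higher-level layer: ranges over the ranges of the syntactic layer.
infoset = Layer(syntactic.ranges, [
    Range("element", 0, 3, {"name": "a"}),
])
```

Here `syntactic.content(syntactic.ranges[1])` gives back the characters "hi", while `infoset.content(infoset.ranges[0])` gives back the three syntactic ranges that make up the element.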
It is the step of creating the representation you describe that
leaves me quite curious. Why? Unless there is something I cannot
represent in the original serialization syntax, why even go there?

Take the example of the dictionary I offered over the weekend. (For the
benefit of those who missed that discussion, I repeat the example.)

***repeat of example***

<entry><headWord>JITTs</headWord>
(typical OED entry back to early Sumerian usage)
</entry>

Now, you want to build a DOM tree. With the standard XML tree, you get
everything between <entry> and </entry> as nodes in the DOM tree. Correct?

As an alternative, for a "lite" searching interface to a dictionary, you
only want: <entry><headWord>JITTs</headWord>(blob of unparsed PCDATA,
which includes all the markup you got as nodes in the DOM tree in the
first one)</entry>

When I find the word I want, in this case JITTs, that block is returned,
but this time a tree is asserted for all the markup in the "blob" and
processed for presentation.

***end repeat of example***
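
The two readings of the same entry in the example above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sample entry, the `sense` and `def` element names, the regular expression, and the function names are all invented here, and the regex scan stands in for whatever mechanism would actually locate the places in the data. It is not a JITTs implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample entry; the markup inside the body is invented.
ENTRY = ('<entry><headWord>JITTs</headWord>'
         '<sense n="1"><def>placeholder definition</def></sense></entry>')

# Assumed pattern: recognize only entry and headWord, leave the rest unparsed.
ENTRY_RE = re.compile(
    r"<entry><headWord>(?P<head>[^<]*)</headWord>(?P<blob>.*?)</entry>",
    re.DOTALL,
)

def full_tree(text):
    """First reading: every piece of markup becomes a node in the tree."""
    return ET.fromstring(text)

def lite_index(text):
    """Second reading: head word mapped to an unparsed blob of markup."""
    return {m.group("head"): m.group("blob") for m in ENTRY_RE.finditer(text)}

def assert_tree(head, blob):
    """On a hit, assert a tree over the markup that was left in the blob."""
    return ET.fromstring(f"<entry><headWord>{head}</headWord>{blob}</entry>")

index = lite_index(ENTRY)                    # cheap: no nodes for the body
hit = assert_tree("JITTs", index["JITTs"])   # full tree, only for this entry
```

The serialization is never altered; only the tree asserted over it changes between the search step and the presentation step.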

Now, as I understand your explanation, everything from <entry> to
</entry> would be converted into the LMNL data model?

In other words, does (or does not) the LMNL data model recognize all the
ranges in XML text being read into the LMNL data model?

If all I need do is avoid interpreting markup that will simply bulk up
my DOM tree, why do I need LMNL? All the markup is still present should
I assert another tree, this one selecting only the entry I want, and I
process all the markup found just like any other XML fragment.
<snip>
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu