Simon,
Simon St.Laurent wrote:
>Patrick Durusau writes:
>
>>>(SSL)
>>>What's interesting to me about this discussion is the separation of
>>>the information in the XML document from the processing it will
>>>receive. Although the creators and senders of that document may
>>>have their own expectations about how that document will be
>>>processed, there is nothing intrinsic to the XML which binds it to
>>>particular processing.
>>>
>>(PD)
>>Curious that the tree syntax of XML (at least if you have
>>"well-formed" XML) is not seen as a processing requirement. You can
>>process non-"well-formed" XML documents via SAX (or your MOE), but that
>>simply ducks the question: why require the tree syntax for validity
>>in the first place? Isn't that a processing requirement as well?
>>
>>Shouldn't processing decide what markup it wants to use and how it
>>wants to use it?
>>
>
>That's an excellent question. I'd mentioned at the end that:
>
>>>Embedding markup in documents is already adding a lot of
>>>information that might from some perspectives be better considered
>>>separate from the document.
>>>
>(SSL)
>The hierarchical issues arise from the particular style of embedded
>markup that XML uses, and there's a serious trade-off there. XML is not
>as flexible for creating labeled structures as it might be precisely
>because it is typically embedded directly in documents, and because
>XML's creators found ambiguity a problem.
>
Yes, and the ambiguity solution inherited from SGML was to solve the
problem in syntax, not in the processing layer. Since the ambiguity
problem was solved by Earley in 1970 (Earley, J. (1970) An efficient
context-free parsing algorithm. Communications of the Association for
Computing Machinery, 13(2):94-102) as well as dealt with in NLP and
other disciplines by techniques such as active chart parsing and parse
forests, I fail to see any reason to continue with a solution in syntax.
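To make the point about handling ambiguity in the processing layer concrete, here is a minimal sketch of an Earley-style recognizer in Python. It is an illustration only, not code from Earley's paper: the function name, the grammar encoding (a dict mapping each nonterminal to a list of right-hand-side tuples), and the toy grammar are all my assumptions, and the sketch assumes the grammar has no empty productions. The toy grammar is deliberately ambiguous, and the recognizer accepts the input anyway.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    Assumption: `grammar` maps nonterminal -> list of tuples of symbols,
    and contains no empty (epsilon) productions. Any symbol that is not
    a key of `grammar` is treated as a terminal.
    """
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) that end at position i.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for prod in grammar[sym]:
                        item = (sym, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)
                elif i < n and tokens[i] == sym:
                    # Scan: the terminal matches the next token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )


# Hypothetical ambiguous grammar: "n + n + n" can group either way.
AMBIGUOUS = {"E": [("E", "+", "E"), ("n",)]}
```

Calling earley_recognize(AMBIGUOUS, "E", ["n", "+", "n", "+", "n"]) accepts the input even though it has two distinct parse trees. Recovering those trees as a parse forest would require recording back-pointers in the chart, which this sketch omits.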
>
>Other processing systems could use other (non-XML) forms of markup to
>avoid XML's "everything is a tree" notion, or they could use some kind
>of out-of-line markup to enable the description of multiple overlapping
>structures for the same document. XLink/XPointer is one way of doing
>that. I've also been playing with my own Out-of-line fun, Ool:
>http://simonstl.com/projects/ool/
>
The "inline" vs. "out-of-line" distinction does not bear close
examination. All "out-of-line" markup does is move the tree syntax
problem one step away and allow you to have one more tree for a
particular text. The NLP community can hardly advise speakers/writers to
move ambiguous text "out-of-line" and hence must parse it "inline" where
it occurs. I see the "out-of-line" solution as more of a hint as to the
underlying cause of the problem than a solution.
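A small sketch may show what I mean, assuming out-of-line markup is modeled as character-offset spans over an unmarked text. The sample text, the layer names, and the function crossing_pairs are hypothetical and are not taken from Ool or XLink/XPointer. Each layer nests cleanly on its own, so each is still a tree. The overlap only appears when two layers are compared.

```python
def crossing_pairs(layer_a, layer_b):
    """Return pairs of spans that partially overlap.

    Each span is (start, end, label) with `end` exclusive. A pair
    "crosses" when the spans share characters but neither contains
    the other, which is what a single tree cannot express.
    """
    crossings = []
    for a in layer_a:
        for b in layer_b:
            a0, a1, _ = a
            b0, b1, _ = b
            overlap = a0 < b1 and b0 < a1
            nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
            if overlap and not nested:
                crossings.append((a, b))
    return crossings


# Hypothetical text with two out-of-line layers over the same characters.
TEXT = "one two three four"
LINES = [(0, 7, "line1"), (8, 18, "line2")]      # "one two" / "three four"
SENTENCES = [(0, 13, "s1"), (14, 18, "s2")]      # "one two three" / "four"

CROSSINGS = crossing_pairs(LINES, SENTENCES)
```

Here the second line and the first sentence cross, so no single tree holds both layers. Moving each layer out-of-line keeps every layer a tree, but the crossing itself still has to be handled by whatever processes the layers together.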
>
>I'll be talking more about Ool (and about Ted Nelson's ideas which got
>me started that direction) at the Extreme Markup conference in Montreal
>next month. I think you'll be there, and I'll be posting the
>presentation on my site in any event.
>
Yes, I will be attending! Along with Matthew Brook O'Donnell doing:
Coming Down from the Trees: Next Step in the Evolution of Markup?
Looking forward to the Ool presentation!
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu