Many people in the text linguistics community, in which I include
myself with one of my many hats on, have found that there is great
value in hierarchical analysis which is directly and explicitly
reflected in the annotation technology we use. Hierarchy is only
implicit if you use the "milestone tag" approach you describe.
Of course, in many cases a single hierarchy is neither in principle
appropriate, nor in practice possible, as in the examples you
describe.
A number of different approaches have been developed to address this.
The one I recommend is known as "standoff markup" or "standoff
annotation". You can find descriptions of it in several papers [1], [2]
and it forms the basis for ongoing work within the Text Encoding
Initiative and ISO for standardising an XML-based approach to corpus
annotation [3].
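
To make the idea concrete, here is a minimal sketch of standoff
annotation in Python. It is only an illustration and is not the format
used in [1], [2] or [3]: the Span class, the layer names and the sample
text are all hypothetical. The base text is left untouched, and each
annotation layer points into it by character offset, so two layers can
cross each other without either having to nest inside the other.

```python
from dataclasses import dataclass

# Base text is stored once and never modified by annotation.
TEXT = "I wandered lonely as a cloud that floats on high"


@dataclass(frozen=True)
class Span:
    """Hypothetical standoff annotation: a labelled offset range in TEXT."""
    layer: str   # which analysis this belongs to, e.g. "verse" or "syntax"
    label: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset


def find_span(layer, label, fragment):
    """Build a Span by locating the first occurrence of fragment in TEXT."""
    start = TEXT.index(fragment)
    return Span(layer, label, start, start + len(fragment))


# Two independent layers over the same text (illustrative analysis only).
ANNOTATIONS = [
    find_span("verse", "line1", "I wandered lonely as a cloud"),
    find_span("verse", "line2", "that floats on high"),
    find_span("syntax", "clause", "a cloud that floats"),
]


def text_of(span):
    """Recover the annotated text from the offsets."""
    return TEXT[span.start:span.end]


def crosses(a, b):
    """True when two spans intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


if __name__ == "__main__":
    line1, line2, clause = ANNOTATIONS
    print(text_of(clause))
    print(crosses(line1, clause), crosses(line2, clause))
```

The clause span crosses both verse lines, which is exactly the case
that a single inline hierarchy cannot express without resorting to
milestones. Within each layer the spans can still form a proper
hierarchy of their own.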
ht
[1] http://www.ltg.ed.ac.uk/~ht/sgmleu97.html
[2] http://www.cs.vassar.edu/~ide/papers/ide-brew-lrec2000.ps
[3] http://www.cs.vassar.edu/~ide/papers/ACL2003-ws-LAF.pdf
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]