- From: David Megginson <david@megginson.com>
- To: "XML Developers' List" <xml-dev@ic.ac.uk>
- Date: Mon, 14 Sep 1998 10:00:24 -0400
Fernando Cabral writes:
> In order to test some characteristics of a SGML-based search
> engine, I need some XML files. I would prefer having some classics
> of the literature, preferentially those including attributes like
> emphasis, bold, italics and diacritics.
Please don't take this the wrong way, but I'm hoping that this search
will fail (at least, the part about "bold", "italics", etc.). There
are special circumstances where people would mark up presentational
information like typefaces in XML (codicology and library science are
two obvious examples), but for general-purpose use, an XML literary
text would say what something *is* rather than what it should *look
like*. For example,
BAD (usually):
<newline/>
"What a <italic>beau</italic>!" sighed Cecille.
GOOD (usually):
<p><q>What a <foreign>beau</foreign>!</q> sighed Cecille.</p>
A literary or linguistic scholar might add all sorts of extra
information:
<para><q ref="Ce0020"><s type="excl">What a <foreign
source="FR" period="s.xix" usage="m-class
u-class">beau</foreign>!</s></q> sighed <name
ref="Ce0020">Cecille</name>.</para>
Sure, it looks like hell, but the scholar can use this to generate an
index of proper names (useful for a 2,000-page Victorian novel) and an
index of foreign terms, and can execute queries like
How often does Cecille use French words in an exclamatory sentence?
Don't try this at home.
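For anyone who does want to try it, here is a minimal sketch of how such a
query and a name index might be run with Python's standard
xml.etree.ElementTree module. The sample is the scholarly markup above,
written as well-formed XML. The function names are hypothetical, and the
sketch assumes that ref on <q> identifies the speaker, that type="excl"
marks an exclamatory sentence, and that source="FR" means French; none of
that is defined by a real schema here.

```python
import xml.etree.ElementTree as ET

# The scholarly sample from the message, as well-formed XML.
SAMPLE = (
    '<para><q ref="Ce0020"><s type="excl">What a <foreign '
    'source="FR" period="s.xix" usage="m-class u-class">beau</foreign>!</s></q> '
    'sighed <name ref="Ce0020">Cecille</name>.</para>'
)


def count_foreign_in_exclamations(root, speaker_ref, language):
    """Count <foreign> terms inside exclamatory <s> elements quoted from one speaker.

    Assumes (hypothetically) that q/@ref names the speaker, s/@type="excl"
    marks an exclamation, and foreign/@source holds a language code.
    Nested <q> or <s> elements are not handled specially in this sketch.
    """
    count = 0
    for quote in root.iter("q"):
        if quote.get("ref") != speaker_ref:
            continue
        for sentence in quote.iter("s"):
            if sentence.get("type") != "excl":
                continue
            count += sum(
                1
                for term in sentence.iter("foreign")
                if term.get("source") == language
            )
    return count


def proper_names(root):
    """Return sorted, de-duplicated (ref, text) pairs for an index of names."""
    return sorted(
        {(name.get("ref"), "".join(name.itertext())) for name in root.iter("name")}
    )


root = ET.fromstring(SAMPLE)
french_exclamations = count_foreign_in_exclamations(root, "Ce0020", "FR")
names = proper_names(root)
print(french_exclamations)
print(names)
```

The same query against the "BAD" version would have nothing to work with:
<italic> records how the word looks, not that it is French or who said it.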
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/