Computer scientists don't write limericks without designing a general
architecture for limericks.
Suppose I wanted to write the limerick constraint testing system that Mike
wants. I would probably want to separate concerns among several different
modules, and I would like to make it easy to look at the results of each
step. I think that I am unlikely to persuade my limerick authors to write
their limericks in finely grained markup, so I suspect they will give me
texts like this:
There was a young lady named Bright
Whose speed was much faster than light.
She set out one day,
in a relative way,
And returned on the previous night.
With a Perl script, it is fairly easy to mark up the lines of this poem,
and I would like to do this before running my syllabification, because it
is quite likely that the syllabification engine will lose my whitespace,
which is important for identifying the lines. If I already know this is a
limerick, I could choose to divide this up into long and short lines from
the beginning:
<limerick>
<long>There was a young lady named Bright</long>
<long>Whose speed was much faster than light.</long>
<short>She set out one day,</short>
<short>in a relative way,</short>
<long>And returned on the previous night.</long>
</limerick>
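As a rough sketch of this line-marking step, here it is in Python rather than Perl. The function name `mark_up_limerick` is made up for illustration, and the sketch assumes the text is already known to be a five-line limerick with one poem line per text line:

```python
from xml.sax.saxutils import escape

POEM = """There was a young lady named Bright
Whose speed was much faster than light.
She set out one day,
in a relative way,
And returned on the previous night."""

def mark_up_limerick(text):
    """Wrap each line of a five-line limerick in <long> or <short>."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 5:
        raise ValueError(f"expected 5 lines, got {len(lines)}")
    # Assumption: lines 1, 2 and 5 are long; lines 3 and 4 are short.
    kinds = ["long", "long", "short", "short", "long"]
    body = [f"<{kind}>{escape(line)}</{kind}>"
            for kind, line in zip(kinds, lines)]
    return "\n".join(["<limerick>", *body, "</limerick>"])

print(mark_up_limerick(POEM))
```

Doing this before syllabification means the line boundaries survive even if a later step discards whitespace.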
For testing the rhyme scheme, further markup is probably not helpful. Also,
I am probably not going to keep any markup I use for testing whether a line
scans, so I will use a schema adjunct to declare the constraints on this
poem. I use standard poetry terms for the metrical feet - here is a table
of the terms, compared with their representation in Cowan Normal Form (CWF):
iamb            da-DUM
anapest         da-da-DUM
tertius paeon   da-da-DUM-da
Here is a Schema Adjunct that declares the constraints on a limerick:
<schema-adjunct targetNamespace="http://www.example.com/limerick"
                xmlns="http://www.schema-adjuncts.org/namespaces/2001/07/saf">
  <global>
    <rhymes>
      <line select="limerick/long[1]" />
      <line select="limerick/long[2]" />
      <line select="limerick/long[3]" />
    </rhymes>
    <rhymes>
      <line select="limerick/short[1]" />
      <line select="limerick/short[2]" />
    </rhymes>
  </global>
  <element context="short">
    <scans>
      <sequence>
        <choice>
          <iamb /> <!-- da dum -->
          <anapest /> <!-- da da dum -->
        </choice>
        <choice>
          <iamb />
          <anapest />
        </choice>
      </sequence>
    </scans>
  </element>
  <element context="long">
    <scans>
      <sequence>
        <choice>
          <iamb />
          <anapest />
        </choice>
        <choice>
          <iamb />
          <anapest />
        </choice>
        <choice>
          <iamb />
          <anapest />
          <tertius.paeon /> <!-- da da dum da -->
        </choice>
      </sequence>
    </scans>
  </element>
</schema-adjunct>
So far, I have written no code, so I have no software that will tell me
whether a line scans or whether a set of lines rhyme. However, I do have a
way of declaring the structure of a poem in a Schema Adjunct, and I can use
it to describe the structure of other kinds of poems as well. The specific
algorithms for testing these constraints are up to the implementations, but
I have modularized the implementation.
I have also made the implementation easier to test - I can write test
suites that take sets of words that are presumed to rhyme or not to rhyme,
and see whether my system handles them correctly. I can do the same for
scansion.
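A test suite of this kind might be sketched as follows, in Python. Everything here is hypothetical: `naive_rhymes` is a deliberately crude stand-in for a real rhyming engine (it only compares the last two letters), and `failing_cases` is an illustrative helper. The point is only that any engine with the same call shape can be run against the same word pairs:

```python
def naive_rhymes(a, b):
    """Stand-in engine: words 'rhyme' if their last two letters match."""
    return a.lower()[-2:] == b.lower()[-2:]

# Word pairs with the answer a correct engine is presumed to give.
CASES = [
    ("Bright", "light", True),
    ("light", "night", True),
    ("day", "way", True),
    ("Bright", "day", False),
    ("cough", "though", False),  # same spelling at the end, different sound
]

def failing_cases(engine, cases):
    """Return the cases where the engine disagrees with the expected answer."""
    return [(a, b, expected)
            for a, b, expected in cases
            if engine(a, b) != expected]

failures = failing_cases(naive_rhymes, CASES)
print(failures)
```

The spelling-based stand-in gets the limerick's own rhymes right but fails on "cough" and "though", which is exactly the kind of weakness such a suite is meant to expose.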
Now suppose that more than one rhyming engine exists, and more than one
scansion engine exists. Do these engines agree? If not, how do they
disagree? Are there bugs in one or both of the engines, or are their
answers both reasonable? If the answers to these questions are important to
me, a concrete representation of the output of the engines may be very
helpful. Without it, I would have to compare the source code of the
systems, or try to create exhaustive sets of tests that would give me
indications of how they work.
For instance, suppose I ask the software to test whether the following scans:
<long>There was a young lady named Bright</long>
If it says that it does not, I may not be sure whether there is a bug in my
software or an error in the line of the limerick. If there is a bug in my
software, I may not know if the bug is in the syllabification per se, in
the stress assigned to syllables, or in the comparison of the
syllabification and stress to that required of a long line in a limerick.
For testing purposes, output like the following can be very helpful indeed:
<long>
<da>There</da>
<dum>was</dum>
<da>a</da>
<da>young</da>
<dum>la</dum>
<da>dy</da>
<da>named</da>
<dum>Bright</dum>
</long>
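Given output in this form from two engines, a small comparison can show whether they disagree about syllabification or only about stress. The following Python sketch assumes both engines emit the da/dum markup shown above; the function names and the second engine's output are invented for illustration:

```python
import xml.etree.ElementTree as ET

ENGINE_A = ("<long><da>There</da><dum>was</dum><da>a</da><da>young</da>"
            "<dum>la</dum><da>dy</da><da>named</da><dum>Bright</dum></long>")
# Hypothetical second engine that stresses "young".
ENGINE_B = ("<long><da>There</da><dum>was</dum><da>a</da><dum>young</dum>"
            "<dum>la</dum><da>dy</da><da>named</da><dum>Bright</dum></long>")

def parse_scansion(xml_text):
    """Return a list of (syllable, stress) pairs from da/dum markup."""
    return [(el.text or "", el.tag) for el in ET.fromstring(xml_text)]

def compare_scansions(xml_a, xml_b):
    """Describe where two scansions of the same line disagree."""
    a, b = parse_scansion(xml_a), parse_scansion(xml_b)
    syllables_a = [syllable for syllable, _ in a]
    syllables_b = [syllable for syllable, _ in b]
    if syllables_a != syllables_b:
        # The engines split the words differently; stress is not comparable.
        return [f"syllabification differs: {syllables_a} vs {syllables_b}"]
    return [f"stress differs on {syllable!r}: {stress_a} vs {stress_b}"
            for (syllable, stress_a), (_, stress_b) in zip(a, b)
            if stress_a != stress_b]

print(compare_scansions(ENGINE_A, ENGINE_B))
```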
Not only is this useful for testing, it is also useful for defining
interfaces. For instance, I might well have a system that takes the above
representation and compares it to the declared scansion for the long line
of a limerick, as given in the above schema adjunct. This would be very
simple to write.
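To suggest how simple, here is a minimal Python sketch. It assumes the da/dum output format shown earlier, and it hard-codes the long-line constraint by transcribing the `<scans>` declaration from the schema adjunct rather than reading the adjunct itself. The names `FEET`, `LONG_LINE`, `scans` and `line_scans` are illustrative only:

```python
import xml.etree.ElementTree as ET

# Metrical feet as sequences of stress marks, per the table of terms.
FEET = {
    "iamb": ("da", "dum"),
    "anapest": ("da", "da", "dum"),
    "tertius.paeon": ("da", "da", "dum", "da"),
}

# Transcribed by hand from the <element context="long"> declaration:
# a sequence of three choices of feet.
LONG_LINE = [
    ("iamb", "anapest"),
    ("iamb", "anapest"),
    ("iamb", "anapest", "tertius.paeon"),
]

def scans(stresses, choices):
    """True if the stress marks can be split into one foot per choice."""
    if not choices:
        return not stresses  # every syllable must be consumed
    for foot in choices[0]:
        pattern = FEET[foot]
        head, rest = stresses[:len(pattern)], stresses[len(pattern):]
        if tuple(head) == pattern and scans(rest, choices[1:]):
            return True
    return False

def line_scans(xml_text, choices):
    """Read da/dum markup and test it against the declared feet."""
    stresses = [el.tag for el in ET.fromstring(xml_text)]
    return scans(stresses, choices)

LINE = ("<long><da>There</da><dum>was</dum><da>a</da><da>young</da>"
        "<dum>la</dum><da>dy</da><da>named</da><dum>Bright</dum></long>")
print(line_scans(LINE, LONG_LINE))
```

For this line the feet come out as an iamb followed by two anapests. The comparison step never looks at the words themselves, so it can be tested independently of syllabification and stress assignment.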
In general, when designing complex systems, I think it is very helpful to
think in terms of declarative, testable architectures.
Jonathan