Re: SAX LexicalHandler::comment issue
- From: "W. E. Perry" <wperry@fiduciary.com>
- To: XML DEV <xml-dev@lists.xml.org>
- Date: Mon, 09 Jul 2001 14:28:34 -0400
"Simon St.Laurent" wrote:
> Where exactly in XML 1.0 is the distinction between logical and lexical
> information drawn? I don't believe it really is, except as an unfortunate
> side-effect of describing parsing in the same document which describes
> syntax.
>
> I can't say I trust anyone who talks about _the_ logical view of an XML
> document - I don't believe any such thing exists in a general way. At
> best, there may be some consensus among data-oriented folks, but I don't
> believe there is any general consensus about what always matters and what
> always doesn't.
>
> Is XML what goes into a parser or what comes out? I used to argue for
> blurring those two, but I'm leaning more and more toward XML being the
> input, not the result of parsing.
This protean 'interpretability' is the salient characteristic of text. The
most concrete realization which XML provides--the document
instance--nonetheless remains highly abstracted from each separate
rendition, performance, or other processing which that document instance
might be given. Simon is exactly right in understanding XML as the input
to--not a particular product of--a process.
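As a rough illustration of input versus product, here is a minimal sketch using Python's standard xml.sax module. The class names Events and Lexical, the parse() helper, and the sample document are all hypothetical, invented for this example. The same instance text is parsed twice; only when a lexical handler is registered does the comment appear among the reported events.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler

DOC = b"<doc><!-- note --><p>text</p></doc>"  # hypothetical sample instance

class Events(ContentHandler):
    """Records the 'logical' events a content handler is given."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(("element", name))

class Lexical:
    """Records comments; the other lexical callbacks are ignored here."""
    def __init__(self, seen):
        self.seen = seen

    def comment(self, content):
        self.seen.append(("comment", content))

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

def parse(doc, with_lexical):
    """Parse the same bytes, optionally asking for lexical events too."""
    handler = Events()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    if with_lexical:
        parser.setProperty(property_lexical_handler, Lexical(handler.seen))
    parser.parse(io.BytesIO(doc))
    return handler.seen

without_comments = parse(DOC, with_lexical=False)
with_comments = parse(DOC, with_lexical=True)
```

Neither event list is the document; each is one processing of the same input, and which one counts as "the" logical view depends entirely on what the application chose to ask for.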
Anyone who would like a mind-bending view--with contemporary legal,
political and technical relevance--of the chasm between text and its
interpretation might look at
http://www.utm.edu/research/primes/curios/485...443.html. Much of U.S.
intellectual property law turns on the distinction between the expression
(i.e., the instance text) which traditionally may be protected and an
underlying concept or general principle which may not. In the celebrated
(infamous?) DeCSS case, it turns out that a gzipped version of the C source
code for decrypting CSS can be read, byte for byte, as a 1401-digit prime
number. The website cited posts that number as a curiosity among primes,
though of course the same instance text can clearly be read as something
very different.
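The double reading is easy to reproduce in miniature. The sketch below is only an illustration on a hypothetical six-byte sample, not the DeCSS number itself: the same bytes are read once as text and once as a single integer, and the integer converts back to the identical bytes.

```python
text = b"<doc/>"  # hypothetical sample; any byte string would do

# Read the very same bytes as one big-endian integer.
as_number = int.from_bytes(text, "big")

# Convert the integer back; nothing about the instance text was lost.
as_text_again = as_number.to_bytes(len(text), "big")
```

Which of the two readings is "the" content is decided by the reader, not by the bytes.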
These differences between text and some processed logical view of it are not
a new discovery. Nor is this a vague question of ontological form, like the
parable of the moon in the water. The textual instance as surface syntax
carries possibilities which even the most generous logical models will miss
precisely because they refuse, a priori, to give primacy to the
idiosyncratic text. A century and a half of learned philology collapsed with
Milman Parry's insight in the 1920s that Homeric metaphors and epithets
were primarily devices for fitting necessary nouns and the names of heroes,
in the grammatical cases required, into the various strictures of the
metrical feet. The vast edifice of scholarship in the aesthetics of epic
poetry has had to be rebuilt on obstinate facts of syntax which had conveyed
nothing to the most literate critics, but turned out to be the crucial tool
for learning the craft of reciting poetry in a pre-literate society. The
larger point, as Simon notes, is that there is no comprehensive logical view
of the information conveyed by specific syntax. XML is predicated on the
correct choice, the primacy of syntax. Subsequent decisions to build upon
any narrower foundation seem both arrogant and pointless.
Respectfully,
Walter Perry