I can't wait to see the XML.COM condensed
version of this thread. :-)
Is it there? We can split some fine hairs here, but
often meaning has to be discovered from clues found
elsewhere and then projected onto the text. Worse,
translations into a readily shared understanding
can vary enormously, so any original meaning may be
distorted, or not provable as original, until some
acceptable number of texts have been translated. There are
linear markings from the Mystery Hill site (American
Stonehenge) which some claim are Phoenician, a claim
that is hotly contested. Before a reading is accepted, both
the decipherers and the archaeologists have to
find mutually reinforcing but quite separate
evidence (previous examples of the text types and
artifacts attributable to some past civilization).
A signal may be non-random and still be meaningless: see the
problems of assuming some astronomical signals
were meaningful because they were regular (the sources
were rotating and emitting). Non-randomness isn't meaningful
per se. One can assume that a wedge-shaped tablet
found in a collection of such tablets is meaningful if other
evidence indicates the site is a library, then start building
up example sets until the key is discovered or a
dictionary is created that is self-consistent to a
tolerable degree. Otherwise, a Rosetta Stone is
required.
So it isn't that cut and dried. As I said in my
reply to Mike, you can be looking for math only
to discover belatedly, possibly by accident, that
they were just saying Hi: Cheops Slept Here. Once
you know about star alignments, some aspects of
pyramid layouts make sense. Unfortunately,
so do Stonehenge, Mystery Hill and a myriad
of other sites - but it can't be proved and
may not be true in every case.
"Documents written in natural languages have meaning even if you don't
speak those languages. They do carry information."
That is so, but until you learn them or someone who has tells you,
you don't know what they mean. We are quite close to the
"if a tree falls in the forest..." argument. The best I can
do is say yes, it has meaning to someone, and yes, strictly
speaking, by establishing that the non-randomness is purposeful, not
a side effect of another regular process, we can agree there
is information there. Shannon built modern communications
by saying that reproducibility, not semantics, is the key to
designing communication systems.
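To make the Shannon point concrete, here is a minimal sketch (my own
illustration, with made-up sample strings) that measures entropy in bits
per symbol. The measure only looks at how often each symbol occurs, so it
gives a number for a regular pulse train just as readily as for a
greeting, and says nothing about whether either one means anything.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy of the symbol frequencies, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Hypothetical inputs: a strictly periodic "pulsar-like" signal and a
# short message. Both are non-random; only one was written by someone.
pulse_train = "10000000" * 50
greeting = "cheops slept here"

if __name__ == "__main__":
    print(entropy_bits(pulse_train))
    print(entropy_bits(greeting))
```

A low value for the pulse train flags regularity, not intent. Deciding
that a regularity is purposeful takes evidence from outside the signal.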
That said, we of course agree about the value of tagging regardless
of whether we have the descriptions. XML is self-describing to
the extent that one understands the Rosetta Stone that is the
XML 1.0 specification, and then acquires, by some evidence, a
workable set of descriptions for the tag names. Doctor Goldfarb
often points to glossing as the original modern form of hypertext
and markup.
All other things being equal, given some XML instance, I sure
do prefer a well-documented schema or DTD to reading someone
else's code to discover what I am supposed to expect and
what to do about it. Or just Hide The XML and give me
the stinkin' compiled application to install.
len
-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:
>A label is not a name unless it is meaningful.
>Natural language is not self-describing unless
>you were taught it.
I guess it depends on what exactly you mean by "self-describing". I
think a book about the English language written in English is
self-describing in and of itself, whether anybody speaks English or
not. However, leaving that aside there's a deeper assumption I want
to cut off before it becomes too embedded in the debate.
Documents written in natural languages have meaning even if you don't
speak those languages. They do carry information. They are not random
strings of characters. I've been reading a lot about the theory and
history of cryptography lately, and it's amazing just how much
information you can pull out of ciphered text, because, in fact, it
isn't random. It's harder to read ciphered text than unciphered text,
but it's not impossible. And that's a world of difference.
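One classical way to see that ciphered text isn't random is the index of
coincidence. The sketch below is only an illustration under assumptions:
the sample sentence, the Caesar shift standing in for "a cipher", and the
function names are all chosen here, not taken from any particular book.
A simple substitution leaves the index unchanged, while uniformly random
letters score close to 1/26.

```python
import random
import string
from collections import Counter

def index_of_coincidence(text):
    """Chance that two letters picked at random from text are equal."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def caesar(text, shift):
    """Shift each letter by a fixed amount; a stand-in for a real cipher."""
    out = []
    for c in text.lower():
        if c in string.ascii_lowercase:
            out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(c)
    return "".join(out)

plain = ("Documents written in natural languages have meaning even if "
         "you do not speak those languages. They do carry information. "
         "They are not random strings of characters.")
cipher = caesar(plain, 7)

rng = random.Random(0)  # fixed seed so the comparison is repeatable
noise = "".join(rng.choice(string.ascii_lowercase) for _ in range(5000))

if __name__ == "__main__":
    print(index_of_coincidence(plain))
    print(index_of_coincidence(cipher))
    print(index_of_coincidence(noise))
```

The ciphertext is unreadable at a glance, yet its letter statistics still
give away that a natural language sits underneath.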
Reading text in a language you don't speak, but which has not been
deliberately encrypted, is a similar problem; and in fact some of the
same techniques were applied to languages like Linear B and
hieroglyphics that are used to break ciphers.
When a document is marked up, the information of the markup is there,
whether we recognize it or not. It is a property of the text itself,
not a property of our perception of the text. With appropriate work,
experience, intelligence, and luck that markup can be understood. Can
unmarked up text be understood as well? Yes, certainly; but markup
adds to the information content of the text. It makes it easier to
decipher its meaning in a very practically useful way. This is a
question of degree, and text+markup is easier to understand than text
alone.
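As a rough sketch of that claim, the Python below parses a small instance
whose tag names are deliberately opaque (they are invented for this
example and mean nothing). Without understanding a single name, a generic
XML parser still recovers how many kinds of element there are and that
the document is a list of three records with the same shape.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: the tag names are made up and carry no meaning.
doc = """<qrt>
  <zlm id="1"><bnw>Alpha</bnw><tgh>12</tgh></zlm>
  <zlm id="2"><bnw>Beta</bnw><tgh>7</tgh></zlm>
  <zlm id="3"><bnw>Gamma</bnw><tgh>31</tgh></zlm>
</qrt>"""

def outline(xml_text):
    """Return the root tag, per-tag counts, and the shapes of the
    root's children (each shape is the tuple of grandchild tags)."""
    root = ET.fromstring(xml_text)
    tag_counts = Counter(el.tag for el in root.iter())
    child_shapes = Counter(tuple(child.tag for child in el) for el in root)
    return root.tag, tag_counts, child_shapes

if __name__ == "__main__":
    root_tag, tag_counts, child_shapes = outline(doc)
    print(root_tag)
    print(dict(tag_counts))
    print(dict(child_shapes))
```

What the records are about still has to come from a schema, a DTD, or
some other description, but the structure is already there to work from.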
Language is certainly important, but it is an orthogonal issue. Given
the choice of data marked up in Ugaritic vs. the same data marked up
in English, I pick English. But given the choice of data marked up in
Ugaritic vs. the same data not marked up at all, I pick the data
marked up in Ugaritic.