Well said, Leigh.
> > -----Original Message-----
> > From: Bullard, Claude L (Len) [mailto:email@example.com]
> > Sent: 15 January 2002 20:46
> > To: 'Elliotte Rusty Harold'; 'firstname.lastname@example.org'
> > Subject: RE: [xml-dev] Xml is _not_ self describing
> > I can't wait to see the XML.COM condensed
> > version of this thread. :-)
> And me, because hopefully then I can read it and
> understand what the real issue is.
> Seriously though, I gave a talk recently, introducing Markup
> and XML to some Medical Informatics students. I outlined the
> overheads of writing custom parsers for custom formats;
> suggested that providing additional rules to structure
> data formats could improve the situation; then explained
> why CSV is fragile and limited; and then introduced
> labelled formats as the best solution.
> I also made it clear that introducing grammatical rules
> such as labelling doesn't necessarily say anything about
> the meaning of the data following those rules
> (cf: Edward Lear). That's for a higher layer.
> They seemed to accept the benefits of this, and
> understood where the limitations were.
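The contrast drawn above between positional formats such as CSV and labelled formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient record, the field names (`name`, `born`, `pulse`, `systolic`, `runcible`) and the helper functions are hypothetical and do not come from the talk. It shows a positional reader silently returning the wrong field once a column is inserted, a labelled reader surviving the same change, and a nonsense label parsing just as well as a sensible one, since the syntax says nothing about meaning.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records: version 2 gains a date-of-birth field upstream.
CSV_V1 = "Smith,72,120\n"
CSV_V2 = "Smith,1950-03-01,72,120\n"

XML_V1 = ("<patient><name>Smith</name><pulse>72</pulse>"
          "<systolic>120</systolic></patient>")
XML_V2 = ("<patient><name>Smith</name><born>1950-03-01</born>"
          "<pulse>72</pulse><systolic>120</systolic></patient>")


def pulse_from_csv(text):
    """Read the pulse by position; the reader must know it is column 1."""
    row = next(csv.reader(io.StringIO(text)))
    return row[1]


def pulse_from_xml(text):
    """Read the pulse by label; extra sibling elements are ignored."""
    return ET.fromstring(text).findtext("pulse")


# The positional reader breaks silently when a column is inserted.
csv_ok = pulse_from_csv(CSV_V1) == "72"
csv_broken = pulse_from_csv(CSV_V2) != "72"

# The labelled reader returns the same value for both versions.
xml_stable = pulse_from_xml(XML_V1) == pulse_from_xml(XML_V2) == "72"

# A nonsense label is still well-formed: the parser finds the value,
# but nothing in the syntax tells us what "runcible" means.
nonsense = ET.fromstring("<patient><runcible>72</runcible></patient>")
runcible_value = nonsense.findtext("runcible")
```

The labelled version is more robust to change, but interpreting `pulse` (or `runcible`) is still left to a higher layer such as documentation or a schema.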
> So aside from the philosophy (interesting as it is) it seems to me
> there's a fairly simple message to get across. Is there any real
> evidence that there's been a failure to communicate it, beyond
> the existing marketing-technology disconnects?
> Personally I'm not sure I've seen it. Most developers I've worked
> with just approach XML as syntax, and don't expect a whole
> lot more.
> Leigh Dodds, Research Group, Ingenta | "Pluralitas non est ponenda
> http://weblogs.userland.com/eclectic | sine necessitate"
> http://www.xml.com/pub/xmldeviant | -- William of Ockham
> > Is it there? We can split some fine hairs here, but
> > often meaning has to be discovered from clues found
> > elsewhere and then projected onto the text. Worse,
> > the translations into an understanding readily shared
> > can vary enormously such that any such original meaning
> > is distorted or not provable as original until some
> > acceptable number of texts are translated. There are
> > linear markings from the Mystery Hill site (American
> > Stonehenge) which some claim are Phoenician but are
> > hotly contested otherwise. Before accepted, both
> > the decipherers and the archaeologists have to
> > find mutually reinforcing but quite separate
> > evidence (previous examples of the text types and
> > artifacts attributable to some past civilization).
> > It may not be random but be meaningless: see the
> > problems of assuming some astronomical signals
> > were meaningful because they were regular (rotating
> > and emitting). Non-randomness isn't meaningful
> > per se. One can assume that a wedge-shaped tablet
> > found in a collection of such is meaningful if other evidence
> > indicates the site is a library, then start building
> > up example sets until the key is discovered or a
> > dictionary is created that is self-consistent to a
> > tolerable degree. Otherwise, a Rosetta Stone is
> > required.
> > So it isn't that cut and dried. As I said in my
> > reply to Mike, you can be looking for math only
> > to discover belatedly, possibly by accident, that
> > they were just saying Hi: Cheops Slept Here. Once
> > you know about star alignments, some aspects of
> > pyramid layouts make sense. Unfortunately,
> > so does Stonehenge, Mystery Hill and a myriad
> > of other sites - but it can't be proved and
> > may not be true in each or every case.
> > "Documents written in natural languages have meaning even if you don't
> > speak those languages. They do carry information."
> > That is so but until you learn them or someone who has tells you,
> > you don't know what they mean. We are quite close to the
> > "if the tree falls in the forest.." argument. The best I can
> > do is say, yes it has meaning to someone and yes, strictly
> > speaking, by establishing that the non-randomness is purposeful, not
> > a side effect of another regular process, we can agree there
> > is information there. Shannon built modern communications
> > by saying reproducibility, not semantics, is the key to
> > designing communication systems.
> > That said, we of course agree about the value of tagging regardless
> > of whether we have the descriptions. XML is self-describing to
> > the extent one understands the Rosetta Stone that is the
> > XML 1.0 specification, then acquires, by some evidence, a
> > workable set of descriptions for the tag names. Doctor Goldfarb
> > often points to glossing as the original modern form of hypertext
> > and markup.
> > All other things being equal, given some XML instance, I sure
> > do prefer a well-documented schema or DTD to reading someone
> > else's code to discover what I am supposed to expect and
> > what to do about it. Or just Hide The XML and give me
> > the stinkin' compiled application to install.
> > len
> > -----Original Message-----
> > From: Elliotte Rusty Harold [mailto:email@example.com]
> > At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:
> > >A label is not a name unless it is meaningful.
> > >Natural language is not self-describing unless
> > >you were taught it.
> > I guess it depends on what exactly you mean by "self-describing". I
> > think a book about the English language written in English is
> > self-describing in and of itself, whether anybody speaks English or
> > not. However, leaving that aside there's a deeper assumption I want
> > to cut off before it becomes too embedded in the debate.
> > Documents written in natural languages have meaning even if you don't
> > speak those languages. They do carry information. They are not random
> > strings of characters. I've been reading a lot about the theory and
> > history of cryptography lately, and it's amazing just how much
> > information you can pull out of ciphered text, because, in fact, it
> > isn't random. It's harder to read ciphered text than unciphered text,
> > but it's not impossible. And that's a world of difference.
> > Reading text in a language you don't speak, but which has not been
> > deliberately encrypted, is a similar problem; and in fact some of the
> > same techniques were applied to languages like Linear B and
> > hieroglyphics that are used to break ciphers.
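The claim that ciphered text is not random can be illustrated with a small Python sketch. It assumes a simple Caesar shift as the cipher and uses the index of coincidence, a standard cryptanalysis statistic, as the measure of non-randomness; neither choice comes from the message itself, and the helper names are hypothetical. The index is the probability that two letters drawn from the text are the same, which is about 1/26 for uniformly random letters and noticeably higher for natural language.

```python
from collections import Counter
import string

# Sample plaintext adapted from the paragraph above (lowercase, no punctuation).
PLAIN = ("documents written in natural languages have meaning even if you "
         "do not speak those languages they do carry information they are "
         "not random strings of characters")


def caesar(text, shift):
    """Shift each lowercase letter by `shift` places; leave other characters."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def index_of_coincidence(text):
    """Probability that two letters picked at random from `text` match."""
    letters = [c for c in text if c in string.ascii_lowercase]
    n = len(letters)
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


CIPHER = caesar(PLAIN, 7)

ioc_plain = index_of_coincidence(PLAIN)
ioc_cipher = index_of_coincidence(CIPHER)
uniform_baseline = 1 / 26  # expected value for uniformly random letters
```

Because a substitution of this kind only relabels the letters, `ioc_cipher` equals `ioc_plain` and both sit well above `uniform_baseline`: the statistical fingerprint of the language survives even when the text is unreadable. Real ciphers are far harder to analyse than this toy example.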
> > When a document is marked up, the information of the markup is there,
> > whether we recognize it or not. It is a property of the text itself,
> > not a property of our perception of the text. With appropriate work,
> > experience, intelligence, and luck that markup can be understood. Can
> > unmarked up text be understood as well? Yes, certainly; but markup
> > adds to the information content of the text. It makes it easier to
> > decipher its meaning in a very practically useful way. This is a
> > question of degree, and text+markup is easier to understand than text
> > alone.
> > Language is certainly important, but it is an orthogonal issue. Given
> > the choice of data marked up in Ugaritic vs. the same data marked up
> > in English, I pick English. But given the choice of data marked up in
> > Ugaritic vs. the same data not marked up at all, I pick the data
> > marked up in Ugaritic.
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> > To subscribe or unsubscribe from this list use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>