> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:firstname.lastname@example.org]
> Sent: 15 January 2002 20:46
> To: 'Elliotte Rusty Harold'; 'email@example.com'
> Subject: RE: [xml-dev] Xml is _not_ self describing
> I can't wait to see the XML.COM condensed
> version of this thread. :-)
And me, because hopefully then I can read it and
understand what the real issue is.
Seriously though, I gave a talk recently, introducing Markup
and XML to some Medical Informatics students. I outlined the
overheads of writing custom parsers for custom formats;
suggested that providing additional rules to structure
data formats could improve the situation; then explained
why CSV is fragile and limited; and then introduced
labelled formats as the best solution.
I also made it clear that introducing grammatical rules
such as labelling doesn't necessarily say anything about
the meaning of the data following those rules
(cf: Edward Lear). That's for a higher layer.
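
A minimal sketch of the fragility argument above, in Python. The patient
record, its field names, and the two helper functions are hypothetical,
made up purely for illustration. It reads the same value from a headerless
CSV line by position and from a labelled XML fragment by name, before and
after the fields are reordered.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record: name, age, weight. The second version swaps two fields.
csv_v1 = "Alice,34,61\n"
csv_v2 = "Alice,61,34\n"

xml_v1 = "<patient><name>Alice</name><age>34</age><weight_kg>61</weight_kg></patient>"
xml_v2 = "<patient><name>Alice</name><weight_kg>61</weight_kg><age>34</age></patient>"


def age_by_position(text):
    """Read the age from the second column; nothing in the data says it is an age."""
    row = next(csv.reader(io.StringIO(text)))
    return int(row[1])


def age_by_label(text):
    """Read the age from the element labelled 'age', wherever it appears."""
    return int(ET.fromstring(text).findtext("age"))


print(age_by_position(csv_v1), age_by_position(csv_v2))  # positional read silently changes
print(age_by_label(xml_v1), age_by_label(xml_v2))        # labelled read is unaffected
```

The labelled read survives the reordering, but note that the parser still
has no idea what an "age" is. That knowledge lives in the code that asks
for it, which is the higher layer mentioned above.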
They seemed to accept the benefits of this, and
understood where the limitations were.
So aside from the philosophy (interesting as it is) it seems to me
there's a fairly simple message to get across. Is there any real
evidence that there's been a failure to communicate it, beyond
the existing marketing-technology disconnects?
Personally I'm not sure I've seen it. Most developers I've worked
with just approach XML as syntax, and don't expect a whole
Leigh Dodds, Research Group, Ingenta | "Pluralitas non est ponenda
http://weblogs.userland.com/eclectic | sine necessitate"
http://www.xml.com/pub/xmldeviant | -- William of Ockham
> Is it there? We can split some fine hairs here, but
> often meaning has to be discovered from clues found
> elsewhere and then projected onto the text. Worse,
> the translations into an understanding readily shared
> can vary enormously such that any such original meaning
> is distorted or not provable as original until some
> acceptable number of texts are translated. There are
> linear markings from the Mystery Hill site (American
> Stonehenge) which some claim are Phoenician but are
> hotly contested otherwise. Before acceptance, both
> the decipherers and the archaeologists have to
> find mutually reinforcing but quite separate
> evidence (previous examples of the text types and
> artifacts attributable to some past civilization).
> It may not be random but be meaningless: see the
> problems of assuming some astronomical signals
> were meaningful because they were regular (rotating
> and emitting). Non-randomness isn't meaningful
> per se. One can assume that a wedge-shaped tablet
> found in a collection of such is meaningful if other evidence
> indicates the site is a library, then start building
> up example sets until the key is discovered or a
> dictionary is created that is self-consistent to a
> tolerable degree. Otherwise, a Rosetta Stone is
> So it isn't that cut and dried. As I said in my
> reply to Mike, you can be looking for math only
> to discover belatedly, possibly by accident, that
> they were just saying Hi: Cheops Slept Here. Once
> you know about star alignments, some aspects of
> pyramid layouts make sense. Unfortunately,
> so do Stonehenge, Mystery Hill and a myriad
> of other sites - but it can't be proved and
> may not be true in each or every case.
> "Documents written in natural languages have meaning even if you don't
> speak those languages. They do carry information."
> That is so but until you learn them or someone who has tells you,
> you don't know what they mean. We are quite close to the
> "if the tree falls in the forest.." argument. The best I can
> do is say, yes it has meaning to someone and yes, strictly
> speaking, by establishing that the non-randomness is purposeful, not
> a side effect of another regular process, we can agree there
> is information there. Shannon built modern communications
> by saying reproducibility, not semantics, is the key to
> designing communication systems.
> That said, we of course agree about the value of tagging regardless
> of whether we have the descriptions. XML is self-describing to
> the extent one understands the Rosetta Stone that is the
> XML 1.0 specification, then acquires, by some evidence, a
> workable set of descriptions for the tag names. Doctor Goldfarb
> often points to glossing as the original modern form of hypertext
> and markup.
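
A small sketch of the two layers described in the paragraph above, in
Python. The document and its nonsense tag names are invented for
illustration. Knowing only the XML 1.0 syntax, a parser can recover the
full structure of an instance; what "glorp" or "zib" mean still has to
come from somewhere else.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance with deliberately meaningless names.
doc = "<glorp><zib id='7'>42</zib><zib id='8'>17</zib><wex>ok</wex></glorp>"


def outline(elem, depth=0):
    """Return an indented list of element names, with attribute names marked by '@'."""
    attrs = "".join(" @" + name for name in sorted(elem.attrib))
    lines = ["  " * depth + elem.tag + attrs]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


for line in outline(ET.fromstring(doc)):
    print(line)
```

The outline shows nesting, repetition, and attribute names without any
schema or DTD. A documented schema would add the descriptions of the tag
names that the syntax alone cannot supply.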
> All other things being equal, given some XML instance, I sure
> do prefer a well-documented schema or DTD to reading someone
> else's code to discover what I am supposed to expect and
> what to do about it. Or just Hide The XML and give me
> the stinkin' compiled application to install.
> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:firstname.lastname@example.org]
> At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:
> >A label is not a name unless it is meaningful.
> >Natural language is not self-describing unless
> >you were taught it.
> I guess it depends on what exactly you mean by "self-describing". I
> think a book about the English language written in English is
> self-describing in and of itself, whether anybody speaks English or
> not. However, leaving that aside there's a deeper assumption I want
> to cut off before it becomes too embedded in the debate.
> Documents written in natural languages have meaning even if you don't
> speak those languages. They do carry information. They are not random
> strings of characters. I've been reading a lot about the theory and
> history of cryptography lately, and it's amazing just how much
> information you can pull out of ciphered text, because, in fact, it
> isn't random. It's harder to read ciphered text than unciphered text,
> but it's not impossible. And that's a world of difference.
> Reading text in a language you don't speak, but which has not been
> deliberately encrypted, is a similar problem; and in fact some of the
> same techniques were applied to languages like Linear B and
> hieroglyphics that are used to break ciphers.
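
To illustrate the point about ciphered text not being random, here is a
minimal Python sketch of letter-frequency analysis against a simple Caesar
shift. The sample sentence, the function names, and the assumption that
the most frequent letter stands for 'e' are all choices made for this
example; real cryptanalysis and real decipherment work are far more
involved.

```python
from collections import Counter
import string


def caesar(text, shift):
    """Shift lowercase ASCII letters by 'shift' places; leave everything else alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext):
    """Guess the shift by assuming the most frequent letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    most_common_letter = counts.most_common(1)[0][0]
    return (ord(most_common_letter) - ord("e")) % 26


plaintext = "meet me here between seven and eleven"
ciphertext = caesar(plaintext, 7)

shift = guess_shift(ciphertext)
print(ciphertext)
print(shift, caesar(ciphertext, -shift))
```

The guess works here only because the sample text has the skewed letter
distribution typical of English. That skew is exactly the non-random
structure that survives the cipher.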
> When a document is marked up, the information of the markup is there,
> whether we recognize it or not. It is a property of the text itself,
> not a property of our perception of the text. With appropriate work,
> experience, intelligence, and luck that markup can be understood. Can
> unmarked up text be understood as well? Yes, certainly; but markup
> adds to the information content of the text. It makes it easier to
> decipher its meaning in a very practically useful way. This is a
> question of degree, and text+markup is easier to understand than text
> Language is certainly important, but it is an orthogonal issue. Given
> the choice of data marked up in Ugaritic vs. the same data marked up
> in English, I pick English. But given the choice of data marked up in
> Ugaritic vs. the same data not marked up at all, I pick the data
> marked up in Ugaritic.
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>