RE: [xml-dev] Xml is _not_ self describing

To: "'xml-dev@lists.xml.org'" <xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Xml is _not_ self describing
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Tue, 15 Jan 2002 15:10:54 -0500
In-reply-to: <2C61CCE8A870D211A523080009B94E4306DF6252@HQ5>
References: <2C61CCE8A870D211A523080009B94E4306DF6252@HQ5>

At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:

>A label is not a name unless it is meaningful.
>Natural language is not self-describing unless
>you were taught it.

I guess it depends on what exactly you mean by "self-describing". I 
think a book about the English language written in English is 
self-describing in and of itself, whether anybody speaks English or 
not. However, leaving that aside there's a deeper assumption I want 
to cut off before it becomes too embedded in the debate.

Documents written in natural languages have meaning even if you don't 
speak those languages. They do carry information. They are not random 
strings of characters. I've been reading a lot about the theory and 
history of cryptography lately, and it's amazing just how much 
information you can pull out of ciphered text, because, in fact, it 
isn't random. It's harder to read ciphered text than unciphered text, 
but it's not impossible. And that's a world of difference.

Reading text in a language you don't speak, but which has not been 
deliberately encrypted, is a similar problem; in fact, some of the 
same techniques that are used to break ciphers were applied to 
decipher scripts like Linear B and Egyptian hieroglyphics.

When a document is marked up, the information in the markup is there, 
whether we recognize it or not. It is a property of the text itself, 
not a property of our perception of the text. With appropriate work, 
experience, intelligence, and luck, that markup can be understood. Can 
unmarked-up text be understood as well? Yes, certainly; but markup 
adds to the information content of the text. It makes the meaning 
easier to decipher in a very practically useful way. This is a 
question of degree, and text+markup is easier to understand than text 
alone.
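To make the point concrete: a generic XML parser can recover the markup's structure (which strings are grouped, how they nest, what attributes qualify them) without knowing anything about the vocabulary. The sketch below is only an illustration under stated assumptions. The element names (spr, shm, mlk, tbn) and their contents are invented placeholders standing in for a vocabulary the reader doesn't know, and the outline() helper is hypothetical, not any standard API.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and text are invented placeholders
# standing in for a vocabulary the reader does not understand.
DOC = """<spr>
  <shm>krt</shm>
  <mlk>hbr</mlk>
  <tbn n="3">ilm</tbn>
</spr>"""

def outline(elem, depth=0):
    """Return the element tree as indented lines, without interpreting any names."""
    attrs = " ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
    label = elem.tag + (f" [{attrs}]" if attrs else "")
    text = (elem.text or "").strip()  # whitespace-only text is dropped
    lines = ["  " * depth + label + (f": {text}" if text else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
print("\n".join(outline(root)))
```

The parser knows nothing about what "mlk" means, yet it can still report that "hbr" is one distinct, labeled unit inside "spr". The same characters with the tags stripped out would not reveal that grouping.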

Language is certainly important, but it is an orthogonal issue. Given 
the choice of data marked up in Ugaritic vs. the same data marked up 
in English, I pick English. But given the choice of data marked up in 
Ugaritic vs. the same data not marked up at all, I pick the data 
marked up in Ugaritic.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+

Follow-Ups:
- Re: [xml-dev] Xml is _not_ self describing
  - From: "Nicolas Lehuen" <nicolas.lehuen@ubicco.com>
- Re: [xml-dev] Xml is _not_ self describing
  - From: Gavin Thomas Nicol <gtn@rbii.com>

References:
- RE: [xml-dev] Xml is _not_ self describing
  - From: "Bullard, Claude L (Len)" <clbullar@ingr.com>
