   RE: [xml-dev] Xml is _not_ self describing


> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:clbullar@ingr.com]
> Sent: 15 January 2002 20:46
> To: 'Elliotte Rusty Harold'; 'xml-dev@lists.xml.org'
> Subject: RE: [xml-dev] Xml is _not_ self describing
> I can't wait to see the XML.COM condensed 
> version of this thread. :-)

And me, because hopefully then I can read it and 
understand what the real issue is.




Seriously though, I gave a talk recently, introducing Markup 
and XML to some Medical Informatics students. I outlined the 
overheads of writing custom parsers for custom formats; 
suggested that providing additional rules to structure 
data formats could improve the situation; then explained 
why CSV is fragile and limited; and then introduced 
labelled formats as the best solution.
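
To make the CSV point concrete, here is a minimal sketch in Python. It 
is not taken from the talk: the record, the field names (name, dob, 
pulse, systolic) and both helper functions are invented for 
illustration. It shows a positional CSV reader silently returning the 
wrong field once a column is inserted, while a reader that looks up a 
labelled element is unaffected.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records; the layout and field names are assumptions.
CSV_V1 = "Smith,72,120\n"              # name, pulse, systolic
CSV_V2 = "Smith,1950-03-01,72,120\n"   # a date-of-birth column was inserted

XML_V1 = ("<patient><name>Smith</name><pulse>72</pulse>"
          "<systolic>120</systolic></patient>")
XML_V2 = ("<patient><name>Smith</name><dob>1950-03-01</dob>"
          "<pulse>72</pulse><systolic>120</systolic></patient>")


def pulse_from_csv(text):
    """Positional access: assumes the pulse is always the second column."""
    row = next(csv.reader(io.StringIO(text)))
    return row[1]


def pulse_from_xml(text):
    """Labelled access: finds the <pulse> child wherever it appears."""
    return ET.fromstring(text).findtext("pulse")


if __name__ == "__main__":
    for label, record in (("v1", CSV_V1), ("v2", CSV_V2)):
        print("csv", label, pulse_from_csv(record))
    for label, record in (("v1", XML_V1), ("v2", XML_V2)):
        print("xml", label, pulse_from_xml(record))
```

Note that the label only tells the reader where the value is. Nothing 
in the markup says that "pulse" is beats per minute, or even a pulse.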

I also made it clear that introducing grammatical rules 
such as labelling doesn't necessarily say anything about 
the meaning of the data following those rules 
(cf: Edward Lear). That's for a higher layer.

They seemed to accept the benefits of this, and 
understood where the limitations were. 

So aside from the philosophy (interesting as it is) it seems to me 
there's a fairly simple message to get across. Is there any real 
evidence that there's been a failure to communicate it, beyond 
the existing marketing-technology disconnects?

Personally I'm not sure I've seen it. Most developers I've worked 
with just approach XML as syntax, and don't expect a whole 
lot more.



Leigh Dodds, Research Group, Ingenta | "Pluralitas non est ponenda
http://weblogs.userland.com/eclectic |    sine necessitate"
http://www.xml.com/pub/xmldeviant    |     -- William of Ockham

> Is it there?  We can split some fine hairs here, but 
> often meaning has to be discovered from clues found 
> elsewhere and then projected onto the text.  Worse, 
> the translations into an understanding readily shared 
> can vary enormously such that any such original meaning 
> is distorted or not provable as original until some 
> acceptable number of texts are translated.  There are 
> linear markings from the Mystery Hill site (American 
> Stonehenge) which some claim are Phoenician but are 
> hotly contested otherwise.  Before accepted, both 
> the decipherers and the archaeologists have to 
> find mutually reinforcing but quite separate 
> evidence (previous examples of the text types and 
> artifacts attributable to some past civilization). 
> It may not be random but be meaningless:  see the 
> problems of assuming some astronomical signals 
> were meaningful because they were regular (rotating 
> and emitting).  Non-randomness isn't meaningful 
> per se.  One can assume that a wedge-shaped tablet 
> found in a collection of such is meaningful if other evidence 
> indicates the site is a library, then start building 
> up example sets until the key is discovered or a 
> dictionary is created that is self-consistent to a 
> tolerable degree.  Otherwise, a Rosetta Stone is 
> required.
> So it isn't that cut and dry.  As I said in my 
> reply to Mike, you can be looking for math only 
> to discover belatedly, possibly by accident, that 
> they were just saying Hi: Cheops Slept Here.  Once 
> you know about star alignments, some aspects of 
> pyramid layouts make sense.  Unfortunately, 
> so does Stonehenge, Mystery Hill and a myriad 
> of other sites - but it can't be proved and 
> may not be true in each or every case.
> "Documents written in natural languages have meaning even if you don't 
> speak those languages. They do carry information."
> That is so but until you learn them or someone who has tells you, 
> you don't know what they mean.  We are quite close to the 
> "if the tree falls in the forest.." argument.  The best I can 
> do is say, yes it has meaning to someone and yes, strictly 
> speaking, by establishing the non-randomness is purposeful, not 
> a side effect of another regular process, we can agree there 
> is information there.  Shannon built modern communications 
> by saying reproducibility, not semantics, is the key to 
> designing communication systems.
> That said, we of course agree about the value of tagging regardless 
> of whether we have the descriptions.  XML is self-describing to 
> the extent one understands the Rosetta Stone that is the 
> XML 1.0 specification, then acquires by some evidence, a 
> workable set of descriptions for the tag names.  Doctor Goldfarb 
> often points to glossing as the original modern form of hypertext
> and markup.
> All other things being equal, given some XML instance, I sure 
> do prefer a well-documented schema or DTD to reading someone 
> else's code to discover what I am supposed to expect and 
> what to do about it.  Or just Hide The XML and give me 
> the stinkin' compiled application to install.
> len
> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:
> >A label is not a name unless it is meaningful.
> >Natural language is not self-describing unless
> >you were taught it.
> I guess it depends on what exactly you mean by "self-describing". I 
> think a book about the English language written in English is 
> self-describing in and of itself, whether anybody speaks English or 
> not. However, leaving that aside there's a deeper assumption I want 
> to cut off before it becomes too embedded in the debate.
> Documents written in natural languages have meaning even if you don't 
> speak those languages. They do carry information. They are not random 
> strings of characters. I've been reading a lot about the theory and 
> history of cryptography lately, and it's amazing just how much 
> information you can pull out of ciphered text, because, in fact, it 
> isn't random. It's harder to read ciphered text than unciphered text, 
> but it's not impossible. And that's a world of difference.
> Reading text in a language you don't speak, but which has not been 
> deliberately encrypted, is a similar problem; and in fact some of the 
> same techniques were applied to languages like Linear B and 
> hieroglyphics that are used to break ciphers.
> When a document is marked up, the information of the markup is there, 
> whether we recognize it or not. It is a property of the text itself, 
> not a property of our perception of the text. With appropriate work, 
> experience, intelligence, and luck that markup can be understood. Can 
> unmarked up text be understood as well? Yes, certainly; but markup 
> adds to the information content of the text. It makes it easier to 
> decipher its meaning in a very practically useful way. This is a 
> question of degree, and text+markup is easier to understand than text 
> alone.
> Language is certainly important, but it is an orthogonal issue.  Given 
> the choice of data marked up in Ugaritic vs. the same data marked up 
> in English, I pick English. But given the choice of data marked up in 
> Ugaritic vs. the same data not marked up at all, I pick the data 
> marked up in Ugaritic.

