   Re: [xml-dev] Statistical vs "semantic web" approaches to making sense o




I think what you're seeing here is that current approaches to the 
description of data are not very human-friendly. And, in the data 
description world, we are trying to express the very rich concept of 
'semantics' using the blunt instruments of 'metadata' and 'resource 
description'. What I mean by that is that as human beings, we can 
quickly ascribe a large, deep and varied set of semantics to any 
particular spoken sentence, item or situation. On the other hand, 
computers basically understand no semantics at all, only syntax. The 
use of metadata and RDF to describe data is an (albeit small) 
intermediate step on the way to improving chances of computers being 
able to ascribe semantics to words or concepts. At this point, humans 
have to do all of the work of description for the computer (by 
providing adequate metadata and markup). Artificial Intelligence 
techniques may eventually ease this process, but right now, computers 
are best at processing large amounts of data without much semantic 
input, very quickly, and thus statistically-based searches are likely 
to remain our best effort. Of course, over time, more subtle methods 
will emerge. Until they do, on the continuum of interaction between 
humans and computers, humans will be doing most of the teaching, and 
computers will mostly just be sitting there dumbly, waiting to be told 
how they should interpret a particular piece of data (particularly if 
the power switch is off). Which, as the 'metacrap' article points out, 
is not particularly attractive to most humans :) So, xml-dev'ers should 
probably consider themselves on the forefront of a pioneering effort to 
teach computers about semantics, an effort which, given the paucity and 
uneven quality of the tools for the job, should be applauded by all 
humankind (or not) ;)
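For the curious, the kind of statistical classification Mike mentions below 
(Bayesian filtering) is simple enough to sketch. This is a toy naive Bayes 
word-count classifier with made-up training data, just to show the shape of 
the technique; it is not how any particular spam filter is implemented:

```python
import math
from collections import Counter

# Toy naive Bayes text classifier -- a sketch of the statistical
# approach discussed in this thread, not a real product's code.
class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training docs
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # log prior + sum of Laplace-smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for w in words:
                p = (counts[w] + 1) / (total_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("cheap pills buy now limited offer", "spam")
nb.train("win money now click here", "spam")
nb.train("meeting agenda for the xml schema review", "ham")
nb.train("draft of the rdf metadata proposal attached", "ham")
print(nb.classify("buy cheap pills now"))        # spam
print(nb.classify("xml metadata review draft"))  # ham
```

The interesting point for this discussion is that the classifier ascribes no 
semantics at all; it just counts tokens, which is exactly the "syntax, not 
semantics" limitation described above.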

- JohnK

On Wednesday, Apr 23, 2003, at 21:09 US/Eastern, Mike Champion wrote:

> There was an interesting conjunction of articles on the ACM "technews" 
> page [http://www.acm.org/technews/current/homepage.html] -- one on 
> "AI" approaches to spam filtering  
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the 
> other on the Semantic Web 
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
> What struck me is that the "AI" approach (I'll guess it makes heavy 
> use of pattern matching and statistical techniques such as Bayesian 
> inference) is working with raw text that the authors are deliberately 
> trying to obfuscate the meaning of to get past "keyword" spam filters, 
> and the Semantic Web approach seems to require explicit, honest 
> markup.  Given the "metacrap" argument about semantic metadata 
> (http://www.well.com/~doctorow/metacrap.htm) I suspect that in general 
> the only way we're going to see a "Semantic Web"  is for 
> statistical/pattern matching software to create the semantic markup 
> and metadata.  That is, if such tools can make useful inferences today 
> about spam that pretends to be something else, they should be very 
> useful in making inferences tomorrow about text written by people who 
> try to say what they mean.
> This raises a question, for me anyway:  If it will take a "better 
> Google than Google" (or perhaps an "Autonomy meets RDF") that uses 
> Bayesian or similar statistical techniques to create the markup that 
> the Semantic Web will exploit, what's the point of the semantic 
> markup?  Why won't people just use the "intelligent" software 
> directly?  Wearing my "XML database guy" hat, I hope that the answer 
> is that it will be much more efficient and programmer-friendly to 
> query databases generated by the 'bots containing markup and metadata 
> to find the information one needs.  But I must admit that 5-6 years 
> ago I thought the world would need standardized, widely deployed XML 
> markup before we could get the quality of searches that Google allows 
> today using only raw HTML and the PageRank heuristic algorithm.
> So, anyone care to pick holes in my assumptions, or reasoning?  If one 
> does accept the hypothesis that it will take smart software to produce 
> the markup that the Semantic Web will exploit, what *is* the case for 
> believing that it will be ontology-based logical inference engines 
> rather than statistically-based heuristic search engines that people 
> will be using in 5-10 years?  Or is this a false dichotomy?  Or is the 
> "metacrap" argument wrong, and people really can be persuaded to 
> create honest, accurate, self-aware, etc. metadata and semantic 
> markup?
> [please note that my employer, and many colleagues at W3C, may have a 
> very different take on this and please don't blame anyone but me for 
> this blather!]
> -- 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>



