By coincidence I've been writing up a semi-refutation of Cory's 'metacrap'
piece, hopefully ready in a day or so.
Semi-refutation because, while I agree with most of his observations, they
take a blinkered, hobbled view of metadata, and as a result I believe the
general conclusions to be way off the mark.
The factor that I think has most relevance to your post (though I've not
read the links yet) is that it's not an either/or situation. I personally
believe that the web will start getting *really* useful when the explicit
(semweb) and implicit (Google) meet. A question - do you think Google takes
note of the title of documents it indexes?
> -----Original Message-----
> From: Mike Champion [mailto:firstname.lastname@example.org]
> Sent: 24 April 2003 03:10
> To: email@example.com
> Subject: [xml-dev] Statistical vs "semantic web" approaches to making
> sense of the Net
> There was an interesting conjunction of articles on the ACM "technews" page
> [http://www.acm.org/technews/current/homepage.html] -- one on "AI"
> approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the other on
> the Semantic Web
> What struck me is that the "AI" approach (I'll guess it makes heavy use of
> pattern matching and statistical techniques such as Bayesian inference) is
> working with raw text that the authors are deliberately trying to obscure
> the meaning of to get past "keyword" spam filters, and the Semantic Web
> approach seems to require explicit, honest markup. Given the "metacrap"
> argument about semantic metadata
> (http://www.well.com/~doctorow/metacrap.htm) I suspect that in general the
> only way we're going to see a "Semantic Web" is for statistical/pattern
> matching software to create the semantic markup and metadata.
> That is, if such tools can make useful inferences today about spam that
> pretends to be something else, they should be very useful in making
> inferences tomorrow about text written by people who try to say what they
> mean.
> This raises a question, for me anyway: If it will take a "better Google
> than Google" (or perhaps an "Autonomy meets RDF") that uses Bayesian or
> similar statistical techniques to create the markup that the Semantic Web
> will exploit, what's the point of the semantic markup? Why won't people
> just use the "intelligent" software directly? Wearing my "XML database
> guy" hat, I hope that the answer is that it will be much more efficient and
> programmer-friendly to query databases generated by the 'bots containing
> markup and metadata to find the information one needs. But I must admit
> that 5-6 years ago I thought the world would need standardized, widely
> deployed XML markup before we could get the quality of searches that Google
> allows today using only raw HTML and the PageRank heuristic algorithm.
> So, anyone care to pick holes in my assumptions, or reasoning? If one does
> accept the hypothesis that it will take smart software to produce the
> markup that the Semantic Web will exploit, what *is* the case for believing
> that it will be ontology-based logical inference engines rather than
> statistically-based heuristic search engines that people will be using in
> 5-10 years? Or is this a false dichotomy? Or is the "metacrap" argument
> wrong, and people really can be persuaded to create honest, accurate,
> self-aware, etc. metadata and semantic markup?
> [please note that my employer, and many colleagues at W3C, may
> have a very
> different take on this and please don't blame anyone but me for this
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>