   RE: [xml-dev] Statistical vs "semantic web" approaches to making sense o

By coincidence I've been writing up a semi-refutation of Cory's 'metacrap'
piece, hopefully ready in a day or so.
Semi-refutation because, while I agree with most of his observations, they
take a blinkered, hobbled view of metadata, and as a result I believe the
general conclusions to be way off the mark.

The factor that I think has most relevance to your post (though I've not
read the links yet) is that it's not an either/or situation. I personally
believe that the web will start getting *really* useful when the explicit
(semweb) and implicit (Google) meet. A question - do you think Google takes
note of the title of documents it indexes?


> -----Original Message-----
> From: Mike Champion [mailto:mc@xegesis.org]
> Sent: 24 April 2003 03:10
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] Statistical vs "semantic web" approaches to making
> sense of the Net
>
> There was an interesting conjunction of articles on the ACM "technews"
> page [http://www.acm.org/technews/current/homepage.html] -- one on "AI"
> approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the
> other on the Semantic Web
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
>
> What struck me is that the "AI" approach (I'll guess it makes heavy use
> of pattern matching and statistical techniques such as Bayesian
> inference) is working with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam filters,
> while the Semantic Web approach seems to require explicit, honest
> markup.  Given the "metacrap" argument about semantic metadata
> (http://www.well.com/~doctorow/metacrap.htm) I suspect that in general
> the only way we're going to see a "Semantic Web" is for
> statistical/pattern matching software to create the semantic markup and
> metadata.  That is, if such tools can make useful inferences today about
> spam that pretends to be something else, they should be very useful in
> making inferences tomorrow about text written by people who try to say
> what they mean.
>
> This raises a question, for me anyway:  If it will take a "better Google
> than Google" (or perhaps an "Autonomy meets RDF") that uses Bayesian or
> similar statistical techniques to create the markup that the Semantic
> Web will exploit, what's the point of the semantic markup?  Why won't
> people just use the "intelligent" software directly?  Wearing my "XML
> database guy" hat, I hope that the answer is that it will be much more
> efficient and programmer-friendly to query databases generated by the
> 'bots containing markup and metadata to find the information one needs.
> But I must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the quality
> of searches that Google allows today using only raw HTML and the
> PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions, or reasoning?  If one
> does accept the hypothesis that it will take smart software to produce
> the markup that the Semantic Web will exploit, what *is* the case for
> believing that it will be ontology-based logical inference engines
> rather than statistically-based heuristic search engines that people
> will be using in 5-10 years?  Or is this a false dichotomy?  Or is the
> "metacrap" argument wrong, and people really can be persuaded to create
> honest, accurate, self-aware, etc. metadata and semantic markup?
>
> [please note that my employer, and many colleagues at W3C, may have a
> very different take on this and please don't blame anyone but me for
> this blather!]


