RE: Success factors for the Web and Semantic Web
- From: Bill dehOra <BdehOra@interx.com>
- To: Miles Sabin <MSabin@interx.com>, xml-dev@lists.xml.org
- Date: Tue, 02 Jan 2001 11:14:26 +0000
> Ignoring metadata and working with the raw link topology is
> driven by the assumptions that referrers (ie. human authors) will
> more often than not make relevant links; that if several things
> are all linked to from the same place they're quite likely to be
> related in some way; and that links will cluster naturally around
> relatively distinct topics of interest rather than degenerating
> into mush. Notice that we can say all of this without once having
> to worry about what any of the stuff actually _means_.
>
> We also get something else, not for free, but at least tractably.
> To pick up the theme of another thread, link topologies are
> essentially global, whereas link metadata is typically local. Where
> the former is the result of the activities of numerous, mutually
> oblivious authors with overlapping areas of knowledge, interest
> and expertise, the latter is typically the product of individuals
> or small groups with particular, partial, interests. Making
> metadata global in any useful way requires massive coordinated
> intellectual and political effort (Simon's already raised some
> doubts about whether or not we should consider that an
> unqualified good). Global link topology just needs a warehouse
> full of servers and a ludicrous amount of bandwidth.
(And some decent algorithms)
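For concreteness, here's a toy sketch of that sort of raw-topology number
crunching: a bare power-iteration PageRank over a made-up adjacency list. The
graph and constants are invented, and this is nothing like Google's actual
implementation, just the shape of the idea:

# Toy power-iteration PageRank over an adjacency list.
# links: dict mapping a page to the list of pages it links to.
def pagerank(links, damping=0.85, iterations=50):
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Two pages pointing at C push C's score up, no semantics required.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))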
Though it may be painful to the logic/symbol crowd, the number crunchers are
definitely winning the scalability argument. The other point is that beating
a corpus to death with statistics and machine learning algorithms is known
to work reasonably well. We're hypothesizing that an annotated corpus plus
inference will work: it's very much a grand experiment.
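To make "statistics on a corpus" concrete, the dumbest version of it is
bag-of-words vectors and cosine similarity. The toy documents below are
invented, but the technique is the standard one, and note that nothing in it
knows what any word means:

import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = ["the semantic web needs metadata",
        "metadata annotations for the web",
        "citation analysis of research papers"]
vectors = [Counter(d.split()) for d in docs]
print(cosine(vectors[0], vectors[1]))  # overlapping vocabulary, higher score
print(cosine(vectors[0], vectors[2]))  # no shared words, zero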
It would be a riot though if we created all this metadata just to have it
processed statistically :) Joking aside, hybrid systems make a lot of sense:
I'd love to see Google crunch metadata instead of melonballing web pages.
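Purely as a sketch of what a hybrid might look like (the weight and the
'subject' field here are invented for illustration, this isn't anyone's
actual API), you could imagine blending a link-topology score with a crude
metadata match:

# Hypothetical hybrid ranker: link-topology score plus a metadata boost.
def hybrid_score(link_score, metadata, query_terms, weight=0.5):
    # metadata: dict of field name -> text, e.g. a Dublin Core style subject.
    subject_terms = set(metadata.get("subject", "").lower().split())
    overlap = len(subject_terms & set(query_terms)) / max(len(query_terms), 1)
    return (1 - weight) * link_score + weight * overlap

# A page with a middling link score gets boosted by a matching subject term.
print(hybrid_score(0.12, {"subject": "semantic web metadata"},
                   ["metadata", "topology"]))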
> The assumptions behind this approach seem pretty plausible a
> priori, and both Google and my long-time favourite domain-
> specific search engine, ResearchIndex (aka CiteSeer)[1] seem to
> back them up. Then again, I've always used bibliographies as my
> primary research tool, so maybe I'm biased.
Some of the over-your-shoulder agent/browser assistant type research
made a similar assumption a few years ago. That is, we already have a
semantic web: at some point some human linked some two documents together,
for a reason. We just don't know how to reverse engineer that intent very
well, hence the perceived need for metadata. Indeed any corpus can be
assumed to have semantic links, since links aren't usually randomly
distributed any more than words are. Maybe the idea of a non-semantic link
is a nonsense. This assumption may be less valid now that many pages and
links are machine generated: or maybe it's even more valid. I'll stop short
of discussing machine intentionality :)
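To show what statistically reverse-engineering link intent can look like,
here's plain co-citation counting over an invented link graph: documents that
keep getting linked to from the same pages get treated as related, which is
roughly the trick CiteSeer-style systems pull on bibliographies:

from collections import Counter
from itertools import combinations

def cocitation_counts(links):
    # links: dict mapping a page to the list of pages it links to.
    # Returns a Counter of unordered target pairs that are linked to together.
    pairs = Counter()
    for targets in links.values():
        for a, b in combinations(sorted(set(targets)), 2):
            pairs[(a, b)] += 1
    return pairs

links = {"p1": ["X", "Y"], "p2": ["X", "Y", "Z"], "p3": ["Y", "Z"]}
print(cocitation_counts(links))  # (X, Y) and (Y, Z) co-cited twice, (X, Z) once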
-Bill
-----
Bill de hÓra : InterX : bdehora@interx.com