Mike,
I think what you're seeing here is that current approaches to the
description of data are not very human-friendly. In the data
description world, we are trying to express the very rich concept of
'semantics' using the blunt instruments of 'metadata' and 'resource
description'. What I mean by that is that, as human beings, we can
quickly ascribe a large, deep and varied set of semantics to any
particular spoken sentence, item or situation. Computers, on the other
hand, basically understand no semantics at all, only syntax.

The use of metadata and RDF to describe data is an (albeit small)
intermediate step toward improving the chances of computers being able
to ascribe semantics to words or concepts. At this point, humans have
to do all of the work of description for the computer (by providing
adequate metadata and markup). Artificial Intelligence techniques may
eventually ease this process, but right now computers are best at
processing large amounts of data, without much semantic input, very
quickly, and so statistically-based searches are likely to remain our
best effort.

Of course, over time, more subtle methods will emerge. Until they do,
on the continuum of interaction between humans and computers, humans
will be doing most of the teaching, and computers will mostly just be
sitting there dumbly, waiting to be told how they should interpret a
particular piece of data (particularly if the power switch is off).
Which, as the 'metacrap' article points out, is not particularly
attractive to most humans :)

So, xml-dev'ers should probably consider themselves at the forefront
of a pioneering effort to teach computers about semantics, an effort
which, given the paucity and poor quality of the tools for the work,
should be applauded by all humankind (or not) ;)
- JohnK
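
P.S. For anyone who wants to see what 'statistics without semantics'
looks like in practice, the spam filters in the article Mike cites boil
down, very roughly, to naive Bayes word scoring. Here's a toy sketch in
Python, purely my own illustration of the technique, not anything taken
from the article and nothing like a production filter:

    import math
    from collections import Counter

    def train(spam_docs, ham_docs):
        # Count how often each word appears in spam and in legitimate mail.
        spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
        ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
        return spam_counts, ham_counts

    def spam_score(text, spam_counts, ham_counts):
        # Log-odds that `text` is spam, assuming equal priors and
        # add-one smoothing so unseen words don't zero anything out.
        vocab = set(spam_counts) | set(ham_counts)
        spam_total = sum(spam_counts.values()) + len(vocab)
        ham_total = sum(ham_counts.values()) + len(vocab)
        score = 0.0
        for w in text.lower().split():
            p_spam = (spam_counts[w] + 1) / spam_total
            p_ham = (ham_counts[w] + 1) / ham_total
            score += math.log(p_spam / p_ham)
        return score  # positive leans spam, negative leans ham

    spam_counts, ham_counts = train(
        ["buy cheap meds now", "free money click here"],
        ["meeting notes attached", "see you at lunch tomorrow"])
    print(spam_score("free meds here", spam_counts, ham_counts))

No understanding of what any of the words mean, just counting and
arithmetic, which is part of why it can still get traction on text the
spammers have mangled to dodge keyword filters.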
On Wednesday, Apr 23, 2003, at 21:09 US/Eastern, Mike Champion wrote:
>
> There was an interesting conjunction of articles on the ACM "technews"
> page [http://www.acm.org/technews/current/homepage.html] -- one on
> "AI" approaches to spam filtering
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the
> other on the Semantic Web
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.
>
> What struck me is that the "AI" approach (I'll guess it makes heavy
> use of pattern matching and statistical techniques such as Bayesian
> inference) is working with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam filters,
> and the Semantic Web approach seems to require explicit, honest
> markup. Given the "metacrap" argument about semantic metadata
> (http://www.well.com/~doctorow/metacrap.htm) I suspect that in general
> the only way we're going to see a "Semantic Web" is for
> statistical/pattern matching software to create the semantic markup
> and metadata. That is, if such tools can make useful inferences today
> about spam that pretends to be something else, they should be very
> useful in making inferences tomorrow about text written by people who
> try to say what they mean.
>
> This raises a question, for me anyway: If it will take a "better
> Google than Google" (or perhaps an "Autonomy meets RDF") that uses
> Bayesian or similar statistical techniques to create the markup that
> the Semantic Web will exploit, what's the point of the semantic
> markup? Why won't people just use the "intelligent" software
> directly? Wearing my "XML database guy" hat, I hope that the answer
> is that it will be much more efficient and programmer-friendly to
> query databases generated by the 'bots containing markup and metadata
> to find the information one needs. But I must admit that 5-6 years
> ago I thought the world would need standardized, widely deployed XML
> markup before we could get the quality of searches that Google allows
> today using only raw HTML and the PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions, or reasoning? If one
> does accept the hypothesis that it will take smart software to produce
> the markup that the Semantic Web will exploit, what *is* the case for
> believing that it will be ontology-based logical inference engines
> rather than statistically-based heuristic search engines that people
> will be using in 5-10 years? Or is this a false dichotomy? Or is the
> "metacrap" argument wrong, and people really can be persuaded to
> create honest, accurate, self-aware, etc. metadata and semantic
> markup?
>
> [please note that my employer, and many colleagues at W3C, may have a
> very different take on this and please don't blame anyone but me for
> this blather!]