Which is why we were going to markup even before the web: to seed the
classification of the published information, to build "local contexts".
Again, markup is not about identification; it is about classification.
The notion was that free text data in RTF-like systems was organized and
labeled for presentation, not for querying or reuse. The horror on the
SGMLers' faces when given HTML was not that it would be an unworkable
system; we knew it would go as all gencoding does, like a grassfire.
The horror was that it returned us to and amplified the bad old days
of data not being reusable or easily classifiable.
The idea for content-modeled SGML was that preclassifying the information using human
intelligence (authors) would help. HTML set everything back about two
decades (a success for publishers but a disaster for everything else).
RDF is a better ontology language, but once again one gets that HTML
effect of predetermined definitions over syntax at the level of
the classifying terminology. There is a definite deja vu in the semantic web.
This doesn't mean RDF doesn't work or isn't a more precise means of
classification. It is.
What would be the effect (theoretically) of ceasing to use HTML for
any records of authority (assertions for data mining candidates)
and using XML without RDF? For one thing, one lives without asserted
relationships. Ok, so Topic Maps are added. What then is the role
of RDF?
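
To make "asserted relationships" concrete, a toy sketch (the record,
identifiers, and Dublin Core properties below are only illustrative,
not a proposal for any particular vocabulary): a bare XML record
carries structure and labels, while the triples stand in for what RDF,
or a Topic Map association, would assert explicitly.

# Toy sketch only: invented record and property names.
import xml.etree.ElementTree as ET

# A bare XML record: structure and labels, but no asserted relationship
# between resources beyond element containment.
record = ET.fromstring(
    "<report id='r42'><author>Smith</author><subject>corrosion</subject></report>"
)
print(record.find("author").text)  # -> Smith

# What RDF (or a Topic Map association) adds: explicit, typed assertions
# that can be merged and queried across documents.
triples = {
    ("r42", "dc:creator", "Smith"),
    ("r42", "dc:subject", "corrosion"),
}

The XML alone tells an engine where things are; the assertions tell it
what relationships hold, which is the piece RDF or Topic Maps supply.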
I ask because although the answers are somewhat obvious, it is also
obvious that RDFing every piece of published information is too much
overhead and the ROI has to be good before people will do it. On the
other hand, the use of XML DTDs and schemas is already well accepted,
and it may be preferable to focus on developing engines that mine
based on these.
One point of the article referenced originally was that Google could
live large using the same web we have now and not have to insist on
a stratified web. Of course, some potential profits based on hoarding
the ontologies disappear, but that is an information ecosystem
catastrophe in the making anyway.
len
-----Original Message-----
From: jborden@attbi.com [mailto:jborden@attbi.com]
> >Basically it seems to me that way Google has approached the web is as a
> >giant problem in Bayesian Analysis, and that this method has been
> >relatively successful (at least more successful than other methods have
> >been).
>
> Hmmm ... then maybe ontologies could help seed the process with "prior
> probabilities" or something?
This is my exact research interest. The problem with
Bayesian/statistical/Markov chain analysis is that if the "search
space" is unbounded, the process may take an effectively infinite
amount of time to resolve. The trick would seem to be to provide a
"local context" or, as you say, an ontology seeding the process. This
would be done in an iterative fashion. For example, we can use the
"oneOf" mechanism to define a _Class_ as being composed of a given
number of _Individuals_, e.g. as determined statistically. One might
then equate two Classes, or use a classifier to find that two Classes
are equivalent: one determined by statistically derived individual
membership, the other (Class) being part of a deep hierarchy
(ontology). This might go round and round, with the output of each
statistical stage being fed into a subsequent logical classification
stage, etc. It might work. On the other hand I might just be wasting
my time.
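
(For illustration only, a toy sketch of that loop: the Individuals,
scores, and the set-overlap "classifier" below are invented stand-ins
for a real statistical stage and a real description-logic classifier.)

# Toy sketch of the statistical/logical feedback loop described above.
# Names, scores, and the Jaccard overlap are invented stand-ins.

def statistical_stage(scores, prior):
    # Stand-in for Bayesian/Markov-chain clustering: keep candidates
    # whose score, weighted by the current prior, clears a threshold.
    return {i for i, s in scores.items() if s * prior.get(i, 1.0) > 0.5}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# An "oneOf"-style enumerated Class taken from the ontology side.
ontology_class = {"ind1", "ind2", "ind3", "ind5"}

# Raw statistical scores for candidate Individuals.
scores = {"ind1": 0.9, "ind2": 0.8, "ind3": 0.4, "ind4": 0.7, "ind5": 0.6}

prior = {}
for round_no in range(3):
    derived_class = statistical_stage(scores, prior)   # statistical stage
    overlap = jaccard(derived_class, ontology_class)   # "classification" stage
    # Feed the logical result back as a prior for the next statistical
    # pass: members of the ontology Class are boosted, the rest damped.
    prior = {i: (1.0 + overlap if i in ontology_class else 1.0 - overlap)
             for i in scores}
    print(round_no, sorted(derived_class), round(overlap, 2))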