OASIS Mailing List Archives
   RE: [xml-dev] Data mining the semantic web? (was RE: [xml-dev] Semantic


Which is why we were moving to markup even before the web: to seed the 
classification of published information, to build "local contexts". 

Again, markup is not about identification; it is about classification.  
The problem was that free-text data in RTF-like systems was organized and 
labeled for presentation, not for querying or reuse.  The horror on the 
SGMLers' faces when shown HTML was not that it would be an unworkable 
system; we knew it would spread as all gencoding does, like a grassfire. 
The horror was that it returned us to, and amplified, the bad old days 
of data that is not reusable or easily classifiable. 
The idea behind content-modeled SGML was that preclassifying the information using human 
intelligence (the authors') would help.  HTML set everything back about two 
decades (a success for publishers but a disaster for everything else).

RDF is a better ontology language, but once again one gets that HTML 
effect of predetermined definitions over syntax at the level of 
the classifying terminology.  There is a definite déjà vu in the semantic web. 
This doesn't mean RDF doesn't work or isn't a more precise means of 
classification.  It is.

What would be the effect (theoretically) of ceasing to use HTML for 
any records of authority (assertions for data mining candidates) 
and using XML without RDF?  One lives without asserted relationships 
for one.  Ok, so Topic Maps are added.  What then is the role 
of RDF?

I ask because although the answers are somewhat obvious, it is also 
obvious that RDFing every piece of published information is too much overhead, 
and the ROI has to be good before people will do it.  On the other 
hand, the use of XML DTDs and schemas is already well accepted, and 
it may be preferable to focus on developing engines that mine based 
on these.

One point of the article referenced originally was that Google could
live large using the same web we have now and would not have to insist on 
a stratified web.  Of course, some potential profits based on hoarding 
ontologies disappear, but that is an information-ecosystem 
catastrophe in the making anyway.


-----Original Message-----
From: jborden@attbi.com [mailto:jborden@attbi.com]

> >Basically it seems to me that the way Google has approached the web is as a
> >giant problem in Bayesian analysis, and that this method has been
> >relatively successful (at least more successful than other methods have
> >been).
> Hmmm ... then maybe ontologies could help seed the process with "prior
> probabilities" or something?  

This is my exact research interest. The problem with Bayesian/statistical/Markov-chain analysis is that if the "search space" is unbounded, the process may take an approaching-infinite amount of time to resolve. The trick would seem to be to provide the ability for "local context" or, as you say, an ontology seeding the process.

This would be done in an iterative fashion. For example, we can use the "oneOf" mechanism to define a _Class_ as being composed of a given number of _Individuals_, e.g. as determined statistically. One might then equate two Classes, or use a classifier to find the equation of two Classes: one determined by statistically derived individual membership, the other (Class) being part of a deep hierarchy (ontology). This might go round and round, with the output of each statistical stage being fed into a subsequent logical classification stage, etc. It might work. On the other hand, I might just be wasting my time.
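The loop described above can be sketched minimally: a statistical stage yields a class as a set of individuals (the "oneOf" enumeration), a logical stage equates it with the best-matching ontology class, and that match can seed the next statistical pass. Everything here is a hypothetical illustration — the ontology, the individuals, and the use of Jaccard overlap as the equating classifier are all assumptions, not anything from the thread.

```python
# Hypothetical ontology: each class enumerated oneOf-style as a set of
# individuals. The data is invented purely for illustration.
ONTOLOGY = {
    "Mammal": {"dog", "cat", "whale"},
    "Bird": {"sparrow", "penguin", "eagle"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap measure used to equate a derived class with an ontology class."""
    return len(a & b) / len(a | b) if a | b else 0.0

def equate(derived: set, ontology: dict, threshold: float = 0.5):
    """Logical stage: find the ontology class best matching a statistically
    derived class; return None when nothing clears the threshold."""
    best, best_score = None, 0.0
    for name, members in ontology.items():
        score = jaccard(derived, members)
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None

# Stand-in for the statistical stage: pretend clustering (e.g. co-occurrence
# statistics over mined documents) produced this set of individuals.
derived_class = {"dog", "cat", "whale", "bat"}

match = equate(derived_class, ONTOLOGY)
# The match ("Mammal" here, since 3 of 4 members overlap) can then seed the
# next statistical pass as a prior, and the loop goes round again.
```

A real system would replace the clustering stand-in with an actual statistical model and the Jaccard test with a description-logic classifier, but the round-trip structure is the same.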



Copyright 2001 XML.org. This site is hosted by OASIS