OASIS Mailing List Archives




Re: Are we losing out because of grammars? (Re: Schema ambiguity detection algorithm for RELAX (1/4))

>Introduction to Algorithmic Information Theory, Nick Szabo; 
>My thanks to Jan Vegt for reminding me about kolmogorov complexity
>These are more useful than Shannon's random source measures for our


Thanks, you're a gentleman. I still like the Kolmogorov one-liner: "if an
object contains regularities, then it has a shorter description than itself".
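To make the one-liner a bit more tangible: Kolmogorov complexity itself is uncomputable, but the length of a compressed encoding gives a computable upper bound on the shortest description (plus the constant size of the decompressor). The sketch below uses zlib purely as a stand-in compressor; the function name `compressed_size` is hypothetical and the choice of zlib is my assumption, not part of the theory.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data at maximum compression.

    This is only an upper bound on description length (up to the
    constant cost of a decompressor); true Kolmogorov complexity
    cannot be computed.
    """
    return len(zlib.compress(data, 9))


# A highly regular "document": the same empty element repeated 1000 times.
regular = b"<item/>" * 1000

# Pseudo-random bytes of the same length (seeded so the run is repeatable);
# a general-purpose compressor finds essentially no regularity here.
noise = random.Random(0).randbytes(len(regular))

print("regular:", len(regular), "->", compressed_size(regular))
print("noise:  ", len(noise), "->", compressed_size(noise))
```

The regular input shrinks to a small fraction of its size, while the noisy input does not shrink at all: the regularity is exactly what permits the shorter description.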

In the meantime I've been trying to connect that to more practical levels,
in support of Rick's "two levels of grammar"; basically, I agree with Rick.
[ Although it should be broader than grammar defined as a set of rules:
  it should also apply to, say, the set of all UK placenames. ]

I've just received Abiteboul's [et al.] "Data on the Web : from Relations to
Semistructured Data and XML".
It's a very readable and exciting book. Relevant here and now is that he
talks about 'schema extraction' defined as "given one particular data
instance, finding the most specific schema for it".

I know I've bored you guys with my emphasis on the differences between
structured and semi-structured data. But let me quote Abiteboul again: "What
sets apart schemas for semistructured data from traditional schemas is the
fact that a given semistructured data instance can have more than one schema.
This raises the following intriguing possibility: Given a semistructured
data instance for which we do not have any a priori knowledge, compute
automatically some schema for it; of course, given several possible answers,
we want the schema that best describes the structure of that particular
data. We call this problem schema extraction."
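As a toy illustration of "finding the most specific schema" for one instance: the sketch below is NOT Abiteboul et al.'s algorithm (the book works with graph-structured data); it is a simplified, tree-only version I'm assuming for XML input. For every element name it records exactly the child sequences observed and prints them as alternatives in an ad-hoc grammar notation (not DTD syntax). The names `extract_schema` and `render` are hypothetical, and mixed content and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_schema(root):
    """Map each element name to the set of child-name sequences seen under it.

    Listing exactly the observed sequences, with no generalisation,
    gives the most specific content model for this single instance.
    """
    models = defaultdict(set)

    def walk(elem):
        seq = tuple(child.tag for child in elem)
        # Simplification: a leaf with non-blank text is recorded as "#text";
        # text mixed in with child elements is ignored.
        if not seq and (elem.text or "").strip():
            seq = ("#text",)
        models[elem.tag].add(seq)
        for child in elem:
            walk(child)

    walk(root)
    return dict(models)


def render(models):
    """Render the models as 'name -> alt1 | alt2' lines (ad-hoc notation)."""
    lines = []
    for name in sorted(models):
        alternatives = [", ".join(s) if s else "EMPTY" for s in sorted(models[name])]
        lines.append(f"{name} -> {' | '.join(alternatives)}")
    return "\n".join(lines)


doc = ET.fromstring(
    "<places>"
    "<place><name>Bath</name><county>Somerset</county></place>"
    "<place><name>London</name></place>"
    "</places>"
)
print(render(extract_schema(doc)))
```

The result is maximally specific: `places` is recorded as exactly two `place` children. Deciding when to trade that for a shorter, more general description such as "any number of `place`" is where the Kolmogorov-style view of regularity becomes relevant.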

This sounds cool to me.



PS: Rick, I may have some practical pointers for you later. I need some
time ...