Re: Are we losing out because of grammars? (Re: Schema ambiguity detection algorithm for RELAX (1/4))
- From: "Vegt, Jan" <Jan.Vegt@softwareag.com>
- To: "'email@example.com'" <firstname.lastname@example.org>
- Date: Wed, 31 Jan 2001 20:31:24 +0100
>Introduction to Algorithmic Information Theory, Nick Szabo;
>My thanks to Jan Vegt for reminding me about kolmogorov complexity
>These are more useful than Shannon's random source measures for our
Thanks, you're a gentleman. I still like the Kolmogorov one-liner "if an
object contains regularities then it has a shorter description than itself".
In the meantime I've been trying to connect that to more practical levels
in supporting Rick's "two levels of grammar"; basically, I agree with Rick.
[ Although it should be broader than grammar defined as a set of rules;
it should also be applicable to, say, the set of all UK placenames etc. ]
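
To make the Kolmogorov one-liner a bit more tangible, here is a minimal
sketch in Python. Kolmogorov complexity itself is not computable, so the
sketch assumes the length of a zlib-compressed copy as a crude upper bound
on "description length". The helper name compressed_size and both sample
inputs are made up for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a rough stand-in (an assumption,
    not true Kolmogorov complexity) for the length of a description."""
    return len(zlib.compress(data, 9))


# An object full of regularities: the same two bytes repeated 5000 times.
regular = b"ab" * 5000

# An object with no regularities that zlib can exploit: pseudo-random bytes
# from a fixed seed, so the run is repeatable.
rng = random.Random(0)
irregular = bytes(rng.randrange(256) for _ in range(10000))

for label, data in (("regular", regular), ("irregular", irregular)):
    print(label, len(data), "->", compressed_size(data))
```

The regular input shrinks to a small fraction of its size, while the
pseudo-random one does not shrink at all, which is the one-liner in action.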
I've just received Abiteboul et al.'s "Data on the Web: From Relations to
Semistructured Data and XML".
It's a very readable and exciting book. Relevant here and now is that he
talks about 'schema extraction' defined as "given one particular data
instance, finding the most specific schema for it".
I know I bored you guys with the emphasis on the differences between
structured and semi-structured data. But let me quote Abiteboul again: "What
sets apart schemas for semistructured data from traditional schemas is the
fact that a given semistructured data instance can have more than one schema.
This raises the following intriguing possibility: Given a semistructured
data instance for which we do not have any a priori knowledge, compute
automatically some schema for it; of course, given several possible answers,
we want the schema that best describes the structure of that particular
data. We call this problem schema extraction."
This sounds cool to me.
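
As a rough illustration of what "schema extraction" could look like, here
is a toy sketch in Python. It is not the algorithm from the book: it assumes
the instance is tree-shaped (nested dicts and lists, JSON-style) rather than
a general graph, and it simply records the exact shape it sees, which is one
naive reading of "most specific". The function name extract_schema and the
sample record are invented.

```python
def extract_schema(value):
    """Describe the structure of one data instance (toy sketch).

    dict  -> dict mapping each key to the schema of its value
    list  -> ("list", [distinct element schemas, in first-seen order])
    other -> the Python type name, e.g. "str" or "int"
    """
    if isinstance(value, dict):
        return {key: extract_schema(child) for key, child in value.items()}
    if isinstance(value, list):
        shapes = []
        for item in value:
            shape = extract_schema(item)
            if shape not in shapes:  # keep each distinct shape only once
                shapes.append(shape)
        return ("list", shapes)
    return type(value).__name__


# Invented semistructured record: the two districts do not share a shape.
instance = {
    "name": "Exampleton",
    "aliases": ["Exampletown", "Xton"],
    "districts": [
        {"name": "North"},
        {"name": "South", "postcode": "XX1"},
    ],
}

schema = extract_schema(instance)
print(schema)
```

Note that the "districts" list ends up with two alternative shapes. Merging
them into one shape with an optional "postcode" would give a different, less
specific schema for the same instance, which is exactly the "more than one
schema" point in the quote.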
PS Rick I may have some practical pointers for you later. Need some