xml-dev - Re: [xml-dev] More on Vector Models

Re: [xml-dev] More on Vector Models - Chicken and Duck talk

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] More on Vector Models - Chicken and Duck talk
From: David Lyon <david.lyon@computergrid.net>
Date: Sat, 30 Apr 2005 09:40:37 +1000
In-reply-to: <15725CF6AFE2F34DB8A5B4770B7334EE07206EBF@hq1.pcmail.ingr.com>
References: <15725CF6AFE2F34DB8A5B4770B7334EE07206EBF@hq1.pcmail.ingr.com>
User-agent: KMail/1.7.1

Hi Len,

Interesting set of concepts. Back to the old Enterprise view (Star Trek, that
is) of trying to understand brave new worlds: different worlds, cultures and
beings, from when the world was a much more innocent place (not really, of
course).

On Sat, 30 Apr 2005 4:52 am, Bullard, Claude L (Len) wrote:
> If the value of indexing is expressed as the function of the
> density of objects in addressable space so that performance
> is inversely proportional to the space density (actually, the
> address space itself), XML vocabularies increase the
> density of the space as well as introducing ambiguity
> and uncertainty through semantic loading and can actually
> hurt the performance of the system. (yes|no ?)

No. There should be no such thing as a performance bottleneck in an enterprise
xml system. This tends to only happen in larger organisations.

A few years ago I contracted to a telco and worked on integration of payphones 
into their central system.

The big surprise to me was that there were actually five databases that held
information about payphones in the land, and it simply wasn't possible to do a
"select count(*) from payphones where status = 'Active'". Just a simple thing,
but absolutely not possible.

It was possible to know how many 20c pieces were collected nationwide in a 
week, but not how many phones were in service at any one time.
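
To make the "simple thing" concrete, here is a rough sketch of what that count
would look like if the databases could be queried side by side. Everything in
it is an assumption for illustration: the `payphones` table, the `phone_id` and
`status` columns, the helper names and the demo rows are all hypothetical and
not taken from the real system.

```python
import sqlite3


def count_active_payphones(connections):
    """Count distinct active payphones across several databases.

    Assumes (hypothetically) that every database has a `payphones`
    table with `phone_id` and `status` columns.
    """
    active_ids = set()
    for conn in connections:
        rows = conn.execute(
            "SELECT phone_id FROM payphones WHERE status = 'Active'"
        )
        # A set drops phones that are recorded in more than one database.
        active_ids.update(phone_id for (phone_id,) in rows)
    return len(active_ids)


def make_demo_db(rows):
    """Build an in-memory database holding made-up payphone rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payphones (phone_id TEXT, status TEXT)")
    conn.executemany("INSERT INTO payphones VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    # Two stand-in databases that overlap on phone P1.
    billing = make_demo_db([("P1", "Active"), ("P2", "Retired")])
    maintenance = make_demo_db([("P1", "Active"), ("P3", "Active")])
    print(count_active_payphones([billing, maintenance]))
```

It only works if the databases agree on an identifier and on what "Active"
means, which is exactly the kind of agreement that tends to be missing.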

Anyway, that's just one experience I have of computers in an enterprise at a 
very large scale. It can be a mess and there is no easy way to sort it out. I 
think the majority of xml development has been governed by engineers with 
this sort of experience.

Down at the small business, things are the opposite.

Accounting systems usually store everything, and delays in processing are
usually physical: the time to run up the stairs to the office to check the
computer.

Ambiguity is always handled by a human mental process and resolved by either
speaking softly or yelling down the phone at the other party. There's also
the classic deferral strategy of "the cheque is in the mail".

So the two cultures, small organisation and large organisation, are
diametrically opposed. The larger ones are process driven whilst the smaller
ones are sales driven.

> That's why Bosworth's presentation has merit.  The problem
> however, is that it simply moves the calculation of the similarity
> metric away from the apriori schema declaration into raw
> microparsed vector results. 

Hmm... I'll have to feed this one to the computer...

> A schema is the declaration of a 
> space where occurrence indicators are a determinant of frequency
> and therefore, similarity given a rule that frequent terms are
> less important than rare terms within a document (term vectors),
> and more important across documents (document vectors).

Probably.

In Chinese, they have this expression called "Chicken and Duck talk" where the
chicken speaks in its language, and the duck in its own. They are both happy.

Whilst I never saw this in Star Trek, I think it would make for an interesting 
future episode. Actually "Chicken and Duck talk" describes what is happening 
with xml between large and small enterprises. Neither side really gets what 
the other is saying. 

I hope in the future that these different cultures can be bridged and that xml 
is the path.

Take care...

David

-- 
Computergrid : The ones with the most connections win.

References:
- More on Vector Models
  - From: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
