OASIS Mailing List Archives

   RE: [xml-dev] More on Vector Models


Not at all.  There is no implied or explicit profit 
motive to 'move on'.  There are 
problems that markup doesn't solve, but possibly 
there are also old solutions that can improve markup. 
That is why I was querying Steve DeRose.  His is a 
world class mind with lots of experience in this 
and other fields.  From time to time, the idea of 
combining vector techniques with markup comes up. 
Bosworth's presentation is another stimulus.

1.  Vector space models are old.  (See Salton et al.)  
VSM technologies incorporate a set of techniques 
that have been refined over the years to enable such 
things as normalization, increased use of probability, 
relaxed constraints on term independence, use of the 
document vectors to get relevance feedback, etc.

2. One doesn't move on to the next big thing.  One 
looks at the data environment and builds systems that 
cope with what is, as is, and then possibly pushes it 
to be otherwise.  

Again, in the record systems I see, there is far more
unstructured text data than any other kind.  So means 
to handle that more effectively are worth investigating. 
Innovation on those means is always desirable.  If one 
really wants to improve the user experience, it pays to 
experience it.   So where we are stuffing lots of 
unstructured text into varchars, being able 
to index the contents in a standard way and then mine that 
more effectively is a big improvement over LIKE '%string%'.

Vector space models are a known effective way to do that. 
Reading indicates that the techniques are now better than 
the last time I looked (e.g., the short-document problem is 
no longer a problem, visualization is really cheap or free, etc.).
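
To make the contrast with substring matching concrete, here is a minimal sketch of a vector space index over short free-text fields. It assumes plain TF-IDF weighting with cosine ranking, which is only one of the many VSM variants alluded to above. The sample records, the query, and all function names are hypothetical and written for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text, idf):
    """Build a unit-length TF-IDF vector (term -> weight) for one text."""
    tf = Counter(tokenize(text))
    vec = {t: count * idf[t] for t, count in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute smoothed IDF weights and one vector per document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def search(query, docs, vectors, idf):
    """Return (score, document) pairs, best match first."""
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(s, docs[i]) for s, i in scored if s > 0]


# Hypothetical free-text fields of the kind that end up in varchars.
records = [
    "Suspect fled on foot after the vehicle stop",
    "Vehicle stolen from parking lot overnight",
    "Noise complaint, no vehicle involved",
]
vectors, idf = build_index(records)
results = search("stolen vehicle", records, vectors, idf)
```

In this sketch a substring test for "stolen vehicle" matches none of the records, because the words never appear adjacent and in that order, while the ranked search still puts the stolen-vehicle record first.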

XML is now a part of a broader set of technologies. 
Exhausted?  Hardly.  Exclusive?  Not at all.  XML is a step  
above the level of 'bag o' words' which is the level 
where VSM thrives.   The question is, given VSM and 
'bag o' words', when should one move on to markup? 
There are some obvious answers and maybe some not 
so obvious.
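
One of the obvious answers can be sketched in a few lines: a bag of words cannot tell apart two records that use the same words in different roles, while markup can. The element names and sample records below are hypothetical, and the tag-qualified term is just one assumed way of carrying markup into an index.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def bag_of_words(xml_text):
    """Discard the markup and count the words in the text content."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).lower().split())


def tagged_terms(xml_text):
    """Count (element name, word) pairs so each word keeps its role."""
    root = ET.fromstring(xml_text)
    return Counter(
        (el.tag, word)
        for el in root.iter()
        for word in (el.text or "").lower().split()
    )


# Two hypothetical records with the same words in opposite roles.
a = "<report><victim>Smith</victim><suspect>Jones</suspect></report>"
b = "<report><victim>Jones</victim><suspect>Smith</suspect></report>"

same_bag = bag_of_words(a) == bag_of_words(b)      # roles are lost
same_tagged = tagged_terms(a) == tagged_terms(b)   # roles are kept
```

Here `same_bag` is true and `same_tagged` is false: the two records are indistinguishable as bags of words but distinct once the element names are kept with the terms.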


From: David Lyon [mailto:david.lyon@computergrid.net]

but "knowing" and seeing are two different things.

As an example, I know that I should be able to get
an electronic receipt loaded from the service station
into (an accounting system in) my mobile phone when 
I go to pay. But seeing that in practice is something 
that is yet to happen.

To label it just "data transport" removes any form
of personalisation and connection with a personal
experience. I think that is a major shortcoming.

I doubt that we have had all "the possible" personal
experiences with xml that we could ever imagine.

Just as there is coffee and there is coffee. Even the 
customer experience that one can have with a simple 
cup of coffee has evolved somewhat over the 
last 20 years.

So I would say that there is still room for change yet
over the next twenty years - even in coffee drinking
where one would think that the choices are fairly

> The subtleties are in applications.  There can be lots
> of those and there are lots of semantics, but XML is
> blithely ignorant of those.  A very high percentage of
> the discussions on this and other lists that talk about
> 'doing XML' are really about 'applying' XML.

Exactly. It's a 'customer experience' thing.

> There are overlapping areas though that should get
> our attention.  One of these is indexing and automated
> categorization.  Vector models are pretty good at both.

This is out of my field... I actually have no idea what
this is about. Maybe it's the next big thing...

> If you have the vector indices, do you need the markup?

Sounds like the big guns are moving their focus away
from xml onto more potentially profitable pastures. 

Maybe xml has been milked to the point where there 
are no longer any big and easy profits to be made.

I detect that this is really the question that you are
asking, rather than anything to do with markup itself.

(xml) Markup is an extension of the English language.

It makes sense to use it in applications such as
Accounting systems and other day-to-day systems.

So while it sounds like you might be ready to move
onto bigger and better things, I doubt that the
practical uses of things like xml will be going away
anytime soon.



Copyright 2001 XML.org. This site is hosted by OASIS