   Re: [xml-dev] More on Vector Models


On Wed, 4 May 2005 3:39 am, Bullard, Claude L (Len) wrote:
> Not at all.  There is no implied or explicit profit
> motive to 'move on'.  There are
> problems that markup doesn't solve, but possibly
> there are also old solutions that can improve markup.
> That is why I was querying Steve DeRose.  His is a
> world class mind with lots of experience in this
> and other fields.  From time to time, the idea of
> combining vector techniques with markup comes up.
> Bosworth's presentation is another stimulus.

ok, well I'm lost. Vectors are a simple mathematical
paradigm. How do they apply to xml? or is it just
a new type of marketing speak?

> 1.  Vector space models are old.  (See Salton et al).
> VSM technologies incorporate a set of techniques
> that have been refined over the years to enable such
> things as normalization, increased use of probability,
> relaxed constraints on term independence, use of the
> document vectors to get relevance feedback, etc.

I've obviously been out doing other things... :-)

> 2. One doesn't move on to the next big thing.  One
> looks at the data environment and builds systems that
> cope with what is as is and then possibly, pushes it
> to be otherwise.


> Again, in the record systems I see, there is far more
> unstructured text data than any other kind.  

What kind of record systems are they?

> XML is a step
> above the level of 'bag o' words' which is the level
> where VSM thrives.   The question is, given VSM and
> 'bag o' words', when should one move on to markup?

Good question.

Traditionally, that would have been after hundreds of 
thousands of dollars went into that IT project that didn't
quite work. And it was "implementation" time.

But if you have something like an Accounting system
that runs on XML, maybe the time is when you need
to do that next customer invoice.

As a contractor, it always seemed dumb to me that
big companies would spend lotsa money on some
sort of business 'communications' system, but when
you wanted to put in your own invoice to get paid, 
there was no easy system to do it.

And then they all got fed up, and that was the end
of the IT boom. ca-boom as they say.

As IT workers, we are now left with all these medium 
sized enterprises, as opposed to large ones, that
University graduates don't want to go and work
for because they don't give the big bucks (like we
used to be able to easily get).

But conversely, these medium sized orgs are much
less rigid than their larger counterparts. They'll give
any sort of xml technology a go provided it works.

So I think the IT world has moved on. It's a far more
interesting place than it was even in 1999 I would

I'm yet to figure out vector models, but it sounds
like what people used to refer to as "document
indexing". Maybe I haven't caught up with the
latest terminologies.
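
For what it's worth, here is a rough sketch of how I understand the
'bag o' words' idea: each document becomes a vector of term counts,
and a query is matched by the cosine of the angle between vectors.
It assumes raw counts and no weighting, so it is not Salton's full
model, and the names (term_vector, cosine, rank) and the sample
documents are made up for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Turn text into a bag-of-words vector: term -> raw count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, doc_id) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, reverse=True)


# Hypothetical sample documents, for illustration only.
docs = {
    "invoice": "customer invoice for contractor services",
    "memo": "staff memo about the office party",
}
ranking = rank("contractor invoice", docs)
```

Word order and any markup are thrown away here, which is more or
less why it looks like plain "document indexing" to me.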


Computergrid : The ones with the most connections win.

