Not at all. There is no implied or explicit profit
motive to 'move on'. There are
problems that markup doesn't solve, but possibly
there are also old solutions that can improve markup.
That is why I was querying Steve DeRose. His is a
world-class mind with lots of experience in this
and other fields. From time to time, the idea of
combining vector techniques with markup comes up.
Bosworth's presentation is another stimulus.
1. Vector space models are old. (See Salton et al.)
VSM technologies incorporate a set of techniques
that have been refined over the years to enable such
things as normalization, increased use of probability,
relaxed constraints on term independence, use of the
document vectors to get relevance feedback, etc.
2. One doesn't move on to the next big thing. One
looks at the data environment and builds systems that
cope with what is, as it is, and then possibly pushes it
to be otherwise.
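
To make the relevance feedback mentioned in point 1 concrete, here is a minimal sketch of a Rocchio-style query update. Rocchio is one well-known way to do this; it is my choice for illustration, not something named above. The function name, the sparse-dict vector representation, and the default weights are all assumptions.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector toward relevant docs and away from non-relevant ones.

    Vectors are sparse dicts mapping term -> weight (an assumed representation).
    alpha, beta and gamma are illustrative defaults, not tuned values.
    """
    updated = defaultdict(float)
    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the centroid of the relevant docs, subtract that of the others.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    # Drop terms whose weight is no longer positive.
    return {term: w for term, w in updated.items() if w > 0}

if __name__ == "__main__":
    q = {"jaguar": 1.0}
    liked = [{"jaguar": 1.0, "car": 1.0}]
    disliked = [{"jaguar": 1.0, "cat": 1.0}]
    print(rocchio(q, liked, disliked))
```

The user marks a few hits as good or bad, and the query picks up terms from the good ones ("car" here) without anyone having to type them.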
Again, in the record systems I see, there is far more
unstructured text data than any other kind. So means
to handle that more effectively are worth investigating.
Innovation on those means is always desirable. If one
really wants to improve the user experience, it pays to
experience it. So where we are stuffing lots of
unstructured text into varchars, being able
to index the contents in a standard way and then mine that
more effectively is a big improvement over Like *string*.
Vector space models are a known effective way to do that.
Reading indicates that the techniques are now better than
the last time I looked (e.g., the short doc problem is no
longer a problem, viz is really cheap or free, etc.).
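
As a rough illustration of the difference between a substring match and a vector space ranking over free-text fields, here is a small sketch using TF-IDF weights and cosine similarity. The sample records, function names, and the particular smoothed IDF formula are assumptions made for the example; a production index would differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def normalize(vec):
    # Scale a sparse vector to unit length so a dot product is a cosine.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(docs):
    """Return (idf, unit-length TF-IDF vectors) for a list of strings."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for tf in counts:
        df.update(tf.keys())
    n = len(docs)
    # Smoothed IDF (one common variant, chosen here as an assumption).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [normalize({t: f * idf[t] for t, f in tf.items()}) for tf in counts]
    return idf, vectors

def search(query, idf, vectors):
    """Return indices of matching docs, best cosine score first."""
    tf = Counter(tokenize(query))
    q = normalize({t: f * idf.get(t, 0.0) for t, f in tf.items()})
    scored = [(sum(w * v.get(t, 0.0) for t, w in q.items()), i)
              for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]

if __name__ == "__main__":
    notes = [  # hypothetical free-text fields, as might sit in a varchar column
        "Suspect fled north on Main Street in a red truck",
        "Red pickup truck reported stolen from Main Street lot",
        "Noise complaint at the lake house",
    ]
    idf, vectors = build_index(notes)
    query = "stolen red truck"
    print("substring hits:", [i for i, d in enumerate(notes) if query in d.lower()])
    print("ranked hits:   ", search(query, idf, vectors))
```

The substring test finds nothing because the exact phrase never occurs, while the ranked search returns both truck records, with the one that also mentions "stolen" first.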
XML is now a part of a broader set of technologies.
Exhausted? Hardly. Exclusive? Not at all. XML is a step
above the level of 'bag o' words' which is the level
where VSM thrives. The question is, given VSM and
'bag o' words', when should one move on to markup?
There are some obvious answers and maybe some not
From: David Lyon [mailto:email@example.com]
but "knowing" and seeing are two different things.
As an example, I know that I should be able to get
an electronic receipt loaded from the service station
into (an accounting system in) my mobile phone when
I go to pay. But seeing that in practice is something
that is yet to happen.
To label it just "data transport" removes any form
of personalisation and connection with a personal
experience. I think that is a major shortcoming.
I doubt that we have had all "the possible" personal
experiences with xml that we could ever imagine.
Just as there is coffee and there is coffee. Even the
customer experience that one can have with a simple
cup of coffee has evolved somewhat over the
last 20 years.
So I would say that there is still room for change
over the next twenty years - even in coffee drinking
where one would think that the choices are fairly
> The subtleties are in applications. There can be lots
> of those and there are lots of semantics, but XML is
> blithely ignorant of those. A very high percentage of
> the discussions on this and other lists that talk about
> 'doing XML' are really about 'applying' XML.
Exactly. It's a 'customer experience' thing.
> There are overlapping areas though that should get
> our attention. One of these is indexing and automated
> categorization. Vector models are pretty good at both.
This is out of my field... I actually have no idea what
this is about. Maybe it's the next big thing...
> If you have the vector indices, do you need the markup?
Sounds like the big guns are moving their focus away
from xml onto more potentially profitable pastures.
Maybe xml has been milked to the point where there
are no longer any big and easy profits to be made.
I detect that this is really the question that you are
asking, rather than anything to do with markup itself.
(xml) Markup is an extension of the English language.
It makes sense to use it in applications such as
Accounting systems and other day-to-day systems.
So while it sounds like you might be ready to move
on to bigger and better things, I doubt that the
practical uses of things like xml will be going away