Thanks Ken. Comments inline.
From: Ken North [mailto:kennorth@sbcglobal.net]
>> So where we do understand how the vector model
>> works for text analysis, do we understand how to apply
>> it to a *text* that includes video and audio as integral
>> parts of the *text* and can we combine these into a
>> higher level space vector term
>1. "Facilitating Video Access by Visualizing Automatic Analysis"
>http://www.fxpal.com/publications/FXPAL-PR-99-045.pdf
>"Metadata for video materials can be derived from the analysis of the audio and
>video streams. For audio, we identify features such as silence, applause, and
>speaker identity. For video, we find features such as shot boundaries,
>presentation slides, and close-ups of human faces."
This comes closest to what I would consider a good start as it uses gestural
significators. It is in combination with other vocabularies that I think
there is more bang for the buck. If similarity metrics apply across the
vocabularies (a gesture in any vocabulary gets the same vm signature),
then the cues all reinforce each other and the interpretation probability
goes up. Of course, the problem of interpretation starts as soon as we
assign metadata and begin to reason over it, and it amplifies
(self-reinforcing assumptions, or GIGO). The reason for comparison is to
detect superstition or, simply, garbage in the metadata induced by faulty
observation. Tough problem.
>2. Yahoo has recently taken the RSS approach. Video RSS provides a text
>description such as height, width, bitrate and running time:
>http://www.webservicessummit.com/Channels/WebServicesSummitAudioVideo.rss
Which is ok for knowing something about the coffee cup but not the coffee.
>3. SQL implementations such as DB2 UDB support content-based querying over rich
>types. DB2 has an Image Extender and Audio Extender with corresponding types
>(DB2IMAGE, DB2AUDIO). The Audio Extender analyses the content and stores values
>such as whether it's 16-bit audio, samples per second, playing time, the number
>of clock ticks per quarter note and so on. The Image Extender stores information
>that enables you to provide an image and search for matches based on color and
>texture (contrast, directionality, etc.).
>IBM's CueVideo software uses speech recognition technology to generate text from
>the audio tracks of videos -- which could then be fed into an engine that uses
>the vector space model and textual similarity matching described in my previous
>message:
>http://www.almaden.ibm.com/projects/data/CueVideo.pdf
Yes. A useful source. I suspect aided by the human eye, this is good. We have
similar products (Video Analyst) that also enhance images.
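For the record, the transcript-to-vector-space step Ken describes can be
sketched roughly as below. This is only a minimal illustration, not CueVideo's
method: the transcripts are made-up stand-ins for speech-recognition output,
the weighting is raw term frequency (no TF-IDF, no stemming), and the names
term_vector and cosine_similarity are mine.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical transcripts standing in for recognized audio tracks.
transcripts = {
    "clip1": "the quarterly budget review and the budget forecast",
    "clip2": "budget forecast for the next quarter",
    "clip3": "applause after the guitar solo",
}

# Rank clips by similarity of their transcript to a text query.
query = term_vector("budget forecast")
ranked = sorted(
    transcripts,
    key=lambda name: cosine_similarity(query, term_vector(transcripts[name])),
    reverse=True,
)
```

Note that the ranking only sees what the recognizer heard as words; the
applause in clip3 is just another token here, which is the coffee cup
problem again.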
>4. This paper discusses analysis of digital music using similarity matrices.
>Media Segmentation using Self-Similarity Decomposition
>http://www.fxpal.com/people/cooper/Papers/SPIE02.pdf
>"....In this example, the sequence ABC is a significantly shorter summary
>containing essentially all the information in the song."
As a songwriter and producer, I can refute that. Let's just
say that we work our bunnies off to make that wrong, but sometimes don't.
The self-similarity of music is a cosmic d'oh. "To play with only this
and that old hat is such a bore, but I sadly fear the love of the ear is
to hear what it heard before." Digital editing makes it easy to grind
out a self-similar production and it seduces one into not doing it by
eliminating serendipitous opportunities and the fecundity of breath.
>3.1. Clustering via similarity matrix decomposition
>"To cluster the segments, we factor a segment-indexed similarity matrix to find
>repeated or substantially similar groups of segments."
Which means the musician/songwriter succeeded and failed.
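To make the idea concrete, here is a rough sketch of a segment-indexed
similarity matrix and a grouping of similar segments. It is not the paper's
method: the paper factors the matrix, whereas this swaps in a plain greedy
threshold grouping, and the feature vectors, the 0.95 threshold, and the
function names are all invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def self_similarity(features):
    """Segment-indexed matrix: entry [i][j] compares segment i to segment j."""
    return [[cosine(a, b) for b in features] for a in features]

def group_segments(sim, threshold=0.95):
    """Greedy grouping: a segment joins the first group whose first
    member it resembles; otherwise it starts a new group."""
    groups = []
    for i in range(len(sim)):
        for group in groups:
            if sim[i][group[0]] >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Made-up features for five segments: verse, chorus, verse, chorus, bridge.
segments = [
    [1.0, 0.1, 0.0],
    [0.1, 1.0, 0.2],
    [0.9, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.2, 1.0],
]
sim = self_similarity(segments)
groups = group_segments(sim)  # lists of segment indices judged similar
```

Taking one representative from each group gives the kind of ABC summary the
paper talks about, and throws away exactly the small differences between the
second verse and the first that we worked to put there.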
Much of what I've learned over the years does come from my night gig, and yes,
similarity plays a big role. On the other hand, so does 'new and different'
and there is the length and tension of the performer's tightrope. We depend on
that for repeat business and it depresses us because it isn't just one thing
after another, but the same thing after another. whinge....
len