xml-dev - Re: [xml-dev] [OT] Looking for a text algorithm

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] [OT] Looking for a text algorithm
From: Miles Sabin <miles@milessabin.com>
Date: Sun, 9 Mar 2003 13:09:46 +0000
In-reply-to: <15977.64423.65785.440495@megginson.com>
References: <15977.64423.65785.440495@megginson.com>

David Megginson wrote,
> I'm looking for references to a specific kind of text algorithm --
> the algorithm should generate a number (say, 32 or 64 bits) for any
> text string of any length, similar to a hash.  However, it should be
> possible to compare the numbers for different strings to tell how
> close they are to each other.  For example, the numbers for
>
> 1. To be or not to be.
>
> 2. Two bees or not two bees.
>
> 3. I don't know whether to be or not to be.
>
> should indicate that the three strings are relatively close to each other
> (while a hash number would give no indication at all).

Umm ... define "close".

Judging from your examples it looks like you're after a closeness 
criterion derived from longest common subsequences. But I don't see how 
you could use that to usefully construct a single characteristic number 
for _any_ string of _any_ length: with only 32 or 64 bits to play with, 
many, many completely unrelated (on any criterion) strings will collide 
on the same code.
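For illustration, here is a minimal sketch of one way to score the examples above. It assumes "close" means the length of the longest common subsequence divided by the length of the longer string. `lcs_length` and `lcs_similarity` are hypothetical names, not part of any library. The sketch also counts the printable-ASCII strings of length 10 to put a number on the collision problem.

```python
# Sketch only: assumes "close" = LCS length normalised by the longer string.
# lcs_length / lcs_similarity are hypothetical helper names.

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)          # row for the previous character of a
    for ch_a in a:
        curr = [0]                     # row for the current character of a
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(a: str, b: str) -> float:
    """LCS length scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))

examples = [
    "To be or not to be.",
    "Two bees or not two bees.",
    "I don't know whether to be or not to be.",
]

# Pairwise scores: each one needs BOTH strings, not a per-string code.
for i in range(len(examples)):
    for j in range(i + 1, len(examples)):
        print(i + 1, j + 1, round(lcs_similarity(examples[i], examples[j]), 2))

# Pigeonhole: printable-ASCII (95 chars) strings of length 10 vs 64-bit codes.
strings_of_len_10 = 95 ** 10
codes_64_bit = 2 ** 64
print(strings_of_len_10 > codes_64_bit, strings_of_len_10 // codes_64_bit)
```

Note that `lcs_similarity` takes both strings at once. Nothing in it yields a fixed-size code per string, which is the gap described above. The count also shows that 95^10 (about 6.0 x 10^19) ten-character printable-ASCII strings already outnumber the 2^64 (about 1.8 x 10^19) available 64-bit codes, so collisions are unavoidable.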

Cheers,

Miles
