Spelling suggestions for common words - ispell, etc.
Group: comp.lang.perl.misc


Author: sftriman
Date: Apr 3, 2008 02:47

I am looking for a way, without defining a custom dictionary, to
get a list of suggested words for a misspelled word. Or better, "the"
most likely intended word for a misspelled word.

My base case to consider is:

dmr wjite saddle

which refers to a brand (DMR) and color (white) of a bike part
(saddle).

Ideally, dmr would return no suggestion, and wjite would return the
string "white", though I could certainly understand why "write" is
an equally good suggestion. I would be willing to define an add-on
dictionary of words to ignore, such as brands and abbreviations
known to me (DMR, for example), so that case is possible to handle.

ispell -a yields:
[...]
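
The add-on ignore list described in this post can be sketched in a few lines. This is only an illustration in Python, not anything ispell provides: the helper name words_to_check and the sample IGNORE set are hypothetical, and the matching is assumed to be case-insensitive so that "dmr" matches "DMR".

```python
# Hypothetical helper: the names and the sample ignore list are illustrative only.
def words_to_check(query, ignore):
    """Return the words of `query` that are not on the ignore list."""
    ignored = {w.lower() for w in ignore}  # compare case-insensitively
    return [w for w in query.split() if w.lower() not in ignored]


IGNORE = {"DMR"}  # brands and abbreviations known in advance

if __name__ == "__main__":
    print(words_to_check("dmr wjite saddle", IGNORE))  # ['wjite', 'saddle']
```

Only the remaining words would then be handed to the spelling checker, so dmr never produces suggestions at all.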
Re: Spelling suggestions for common words - ispell, etc.         


Author: David Filmer
Date: Apr 3, 2008 12:29

sftriman wrote:
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

Ever notice that Google does a pretty good job of that? So consider
Net::Google::Spelling:
http://search.cpan.org/~bstilwell/Net-Google-1.0.1/lib/Net/Google/Spelling.pm

--
David Filmer (http://DavidFilmer.com)
Re: Spelling suggestions for common words - ispell, etc.         


Author: Joost Diepenmaat
Date: Apr 3, 2008 12:34

sftriman writes:
> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

You may find this article interesting:
http://norvig.com/spell-correct.html

You still need a list of "good" words, of course.

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
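
A minimal sketch of the approach in the linked article, under stated assumptions: it generates every string one edit away from the input and keeps those found in a list of "good" words, preferring the most frequent. The WORD_COUNTS table and its numbers are made up for illustration (a real list would come from a large text corpus), the function names are hypothetical, and this version stops at one edit where the article goes further.

```python
import string

# Assumed toy word list with made-up counts; not real corpus data.
WORD_COUNTS = {"white": 50, "write": 40, "while": 30, "saddle": 5}


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """Return the most frequent known word within one edit, or None."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word  # already a known word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return None
    return max(candidates, key=WORD_COUNTS.get)


if __name__ == "__main__":
    for w in ("dmr", "wjite", "saddle"):
        print(w, "->", suggest(w))
```

With this toy table, "white" wins over "write" only because its made-up count is higher; the choice between them depends entirely on the word frequencies supplied.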
Re: Spelling suggestions for common words - ispell, etc.         


Author: Ben Bullock
Date: Apr 3, 2008 23:27

On Apr 3, 6:47 pm, sftriman wrote:
> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.
> from which I could easily pass on the dmr suggestions, but, scoring
> and evaluating the suggestions for wjite is harder. "white" and
> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

One thing which might help you rank the strings is the "Levenshtein
distance". This gives you the "difference" between two strings as a
number. I don't know if it is on CPAN but there is a module found
here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

The documentation is here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

Presumably the string with the smallest Levenshtein distance from the
input string would be the most likely candidate for the spelling
checker, although some very rare words might have small distances.
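
To make the ranking idea concrete, here is a small sketch in Python rather than the Perl module linked above. It computes the Levenshtein distance with the standard dynamic-programming recurrence and sorts candidates by it. The function names and the candidate list are hypothetical stand-ins for whatever the spelling checker returns.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank(word, candidates):
    """Sort candidates by distance from word, breaking ties alphabetically."""
    return sorted(candidates, key=lambda c: (levenshtein(word, c), c))


if __name__ == "__main__":
    # Hypothetical suggestion list, not actual ispell output.
    print(rank("wjite", ["wide", "write", "white", "wit"]))
    # ['white', 'write', 'wide', 'wit']
```

Both "white" and "write" are at distance 1 from "wjite", so the distance alone cannot choose between them; here the tie is broken alphabetically, and something like word frequency would be needed to do better.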
Re: Spelling suggestions for common words - ispell, etc.         


Author: Ted Zlatanov
Date: Apr 4, 2008 08:58

On Thu, 3 Apr 2008 23:27:56 -0700 (PDT) Ben Bullock wrote:
BB> On Apr 3, 6:47 pm, sftriman wrote:
>> I am looking for a way, without defining a custom dictionary, to
>> get a list of suggested words for a misspelled word. Or better, "the"
>> most likely intended word for a misspelled word.
>> from which I could easily pass on the dmr suggestions, but, scoring
>> and evaluating the suggestions for wjite is harder. "white" and
>> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.
BB> One thing which might help you rank the strings is the "Levenshtein
BB> distance". This gives you the "difference" between two strings as a
BB> number. I don't know if it is on CPAN but there is a module found
BB> here:
BB> The documentation is here:
BB> Presumably the string with the smallest Levenshtein distance from the
BB> input string would be the most likely candidate for the spelling
BB> checker, although some very rare words might have small distances.
[...]