Re: fuzzy searching of a dictionary or use of word names w/ spaces
Group: comp.lang.forth
Author: Jeff Fox
Date: Dec 5, 2006 19:39
John Passaniti wrote:
> Jeff didn't use "fuzzy logic" in the correct (or at least classical)
> way. What he seems to be interested in isn't "fuzzy logic" (which is
> more about defining membership sets in vague or imprecise terms and the
> operations that derive from that). What I think he's interested in is
> instead inexact string matching algorithms, which could mean anything
> from substring matchers, matchers that allow transpositions, matchers
> that isolate features (such as Soundex), and so on. These aren't "fuzzy
> logic" algorithms as they ultimately produce a binary true/false.
Years ago I got a very interesting description from Dr.
Montvelishsky of the fuzzy logic in Forth implementation that
he taught in the Russian university system. He had a simple but
powerful implementation that I thought might be of interest to other
people, but that would be a different thread, as it was not a Forth
system based on an unusual dictionary approach. Fuzzy logic is indeed
not just true/false; results take a value from some range. In that case it
was just the range of integers, with zero being neither true nor false
and positive and negative numbers giving a variable degree of true and
false.
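
A minimal sketch of that signed-integer representation, written in Python rather than Forth for brevity. It assumes positive values lean true and negative values lean false. The min/max/negate operators and all of the function names are assumptions chosen for illustration (they are a common fuzzy-logic convention); nothing above says which operators the original system actually used.

```python
# Signed-integer truth values: > 0 leans true, < 0 leans false, 0 is undecided.
# ASSUMPTION: AND = min, OR = max, NOT = negate. These are illustrative
# choices, not a description of Dr. Montvelishsky's implementation.

def fuzzy_not(a: int) -> int:
    """Flip the direction of belief while keeping its strength."""
    return -a

def fuzzy_and(a: int, b: int) -> int:
    """A conjunction is only as true as its weakest operand."""
    return min(a, b)

def fuzzy_or(a: int, b: int) -> int:
    """A disjunction is as true as its strongest operand."""
    return max(a, b)

def describe(a: int) -> str:
    """Render a truth value as text."""
    if a > 0:
        return f"true (strength {a})"
    if a < 0:
        return f"false (strength {-a})"
    return "undecided"

if __name__ == "__main__":
    print(describe(fuzzy_and(5, -2)))   # false (strength 2)
    print(describe(fuzzy_or(5, -2)))    # true (strength 5)
    print(describe(fuzzy_not(0)))       # undecided
```

With these choices zero stays undecided under negation, and a classical Forth flag is just the special case where only the sign is inspected.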
> The URL you provided doesn't seem to have much to do with what Jeff was
> looking for, as it appears to be about word-based optical character
> recognition systems.
Oddly enough it is pretty close, but so are the references to
Soundex and inexact string matchers. It was of interest, and one
of the things I have been following up on this year has been the
Gabor transform recognition work that I found in SIFR. But the match
to what interests me may be somewhat coincidental.
> Personally, I'm looking forward to reading the *why* of Jeff's interest,
My second programming job in California long ago involved
translation of court reporters' shorthand into formatted English
documents. We also did a realtime version used in captioning for
television.
I was new to Forth at the time and could see that what we were doing
with the inexact string matching in our dictionary searches was
different than the strict use of spaces as word separators in the
simple inner loops of Forth interpreters and Forth compilers.
Over the years I have seen Forths that experimented with expanded
use of space as a separator of words such as colorforth. But I have
not seen any Forth systems that even had fuzzy dictionary search
except SavvyPC. It was an interesting AI system that could write
applications that could handle user typos very intelligently if one
wanted to do that.
I have not seen any Forth systems that would allow one to define
words with ordinary spaces in them, such that a variable degree of
match would be used in the dictionary search instead of true/false.
"DUP" and
"DUP DROP"
could both be ordinary words in the dictionary, with the space
simply being part of the second entry's name. The system would find
the longer string and execute or compile the code associated with
it, because strings in the dictionary would have a variable degree
of match to the input stream. In the case above, the substring "DUP"
in the input "DUP DROP" would match the dictionary entry "DUP"
exactly, but with only 3 characters, while the whole sequence
"DUP DROP" would match its own entry exactly, with a match
value of 8.
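
A rough sketch of that lookup, in Python rather than Forth for brevity. It covers only the exact-match case described above, where the match value is the number of characters matched and the longest entry wins. The names `best_match` and `tokenize`, the word-boundary rule, and the score of 0 for unknown text are all assumptions made for illustration, not features of any existing Forth.

```python
def best_match(names, source, pos=0):
    """Return (name, score) for the longest dictionary name found at pos,
    or None if no name matches. The score is the number of characters matched."""
    best = None
    for name in names:
        if not source.startswith(name, pos):
            continue
        end = pos + len(name)
        # ASSUMPTION: a match must end at a space or at end of input,
        # so "DUP" does not match inside "DUPLICATE".
        if end < len(source) and source[end] != " ":
            continue
        if best is None or len(name) > best[1]:
            best = (name, len(name))
    return best

def tokenize(names, source):
    """Split source into (name, score) pairs, preferring the longest name."""
    pos, found = 0, []
    while pos < len(source):
        if source[pos] == " ":
            pos += 1            # skip separators between matches
            continue
        match = best_match(names, source, pos)
        if match is None:
            # Unknown text: take it up to the next space, with score 0.
            end = source.find(" ", pos)
            if end == -1:
                end = len(source)
            found.append((source[pos:end], 0))
            pos = end
        else:
            found.append(match)
            pos += match[1]
    return found

if __name__ == "__main__":
    words = ["DUP", "DROP", "DUP DROP", "SWAP"]
    print(tokenize(words, "DUP DROP SWAP"))  # [('DUP DROP', 8), ('SWAP', 4)]
    print(tokenize(words, "DUP SWAP"))       # [('DUP', 3), ('SWAP', 4)]
```

An inexact matcher would replace the `startswith` test with a scoring function, and at that point a tie-breaking rule and a minimum acceptable score would have to be chosen.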
I did say previously that I am not interested in fuzzy
dictionary lookup to fix programmer typos. ;-) While SavvyPC
could do this, it could be a real problem if it were done in
a dumb way or abused.
If a programmer wrote code with words that matched in a fuzzy
way, and didn't heed compiler warnings that they had code with
inexact matches, they would be asking for trouble in the future
when they got unintentional and incorrect matches that
produced bugs. The programmer needs to be in control and
be responsible for what happens.
> since any scheme to allow dictionary lookups based on inexact string
> matches would seem to me to violate one of the fundamental ideals in
> Forth: That the programmer is in control and responsible for what
> happens.
Best Wishes