Approximate String Matching
On occasion I find myself searching for something in log files or in my Bash history, but I can’t quite remember what it is that I am looking for. Come to think of it, this happens a lot.
I usually get some of the keywords wrong, and the ones I do get right may be out of order. In other words, grep just doesn’t cut it. Fortunately, there are a couple of excellent command-line utilities that specialize in approximate string matching and are well suited to this sort of task: fzf and agrep.
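For comparison, about the best plain grep can do is chain one case-insensitive grep per keyword. That fixes the word-order problem but still demands exact spelling. Here is a minimal sketch; all_keywords is a hypothetical helper, and the sample line is taken from the passage we will be hunting for later.

```shell
#!/usr/bin/env bash
# all_keywords WORD...: hypothetical helper that prints the stdin lines
# containing every WORD, in any order, case-insensitively. Each WORD adds
# one more grep -iF stage to the pipeline.
all_keywords() {
  if [ "$#" -eq 0 ]; then cat; return; fi
  local word="$1"; shift
  grep -iF -- "$word" | all_keywords "$@"
}

line='Wounds received in battle confer honour'
printf '%s\n' "$line" | all_keywords honour battle wounds   # prints the line
printf '%s\n' "$line" | all_keywords honor battle wounds    # prints nothing
```

Word order no longer matters, but a single letter of difference ("honor" vs. "honour") is enough to lose the line.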
fzf, a command-line “fuzzy finder”, is geared more toward interactive use. A common way to use it is to pipe your shell history into it with history | fzf, which allows very flexible searching for commands long forgotten.
But fzf also offers some non-interactive functionality that can come in handy in scripts. In the example below, we will download Cervantes’ “Don Quixote” and then try to find the oft-quoted line about battle wounds and honor. I can’t quite remember how it goes, so this should be a good test for fzf.
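First, we will download the book to a temporary file, doing some basic text cleanup along the way: converting the line endings to Unix format and joining some of the wrapped lines to reduce the number of line breaks. Note that sponge comes from the moreutils package and may need to be installed separately.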
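f="$(mktemp)"
# -L follows redirects (gutenberg.org may redirect plain HTTP to HTTPS)
curl -sL http://www.gutenberg.org/files/996/996.txt > "${f}"
# Strip the DOS carriage returns, then join each line that ends in a letter
# or digit with the line after it (GNU sed syntax)
sed 's/\r$//' "${f}" | \
  sed -r '/[[:alnum:]]$/N;s/\n/ /' | \
  sponge "${f}"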
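The line-joining sed expression is the least obvious part of the cleanup, so here is a small demonstration on a made-up sample with DOS line endings. It assumes GNU sed (for -r and the \r escape), strips carriage returns with sed 's/\r$//', and wraps both steps in join_wrapped, a hypothetical helper name used only for this illustration.

```shell
#!/usr/bin/env bash
# join_wrapped: hypothetical helper wrapping the two cleanup steps (GNU sed).
join_wrapped() {
  # 1) Drop the carriage return at the end of each CRLF line.
  # 2) When a line ends in a letter or digit, append the next line (N)
  #    and turn the embedded newline into a space.
  sed 's/\r$//' | sed -r '/[[:alnum:]]$/N;s/\n/ /'
}

# Made-up sample with DOS (CRLF) line endings.
sample=$'To which Don Quixote replied\r\n"Wounds received in battle\r\nconfer honour."\r\n'
printf '%s' "$sample" | join_wrapped
# To which Don Quixote replied "Wounds received in battle
# confer honour."
```

Only pairs of lines are merged: the second line also ends in a letter, but N has already consumed it, so it is not joined with the third. That is why this pass reduces, rather than eliminates, the line breaks.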
And now we can try to find the relevant line:
# grep -Ei "wounds|$" matches every line but highlights only "wounds"
fzf --filter 'battle wounds honor' < "${f}" | \
  grep -i 'wounds' | head -1 | grep --color -Ei "wounds|$"
# To which Don Quixote replied, "Wounds received in battle confer honour
# instead of taking it away; and so, friend Panza, say no more, but, as I
This is awesome. If the first hit found by fzf is not what you were looking for, just change head -1 to, say, head -10 and look through the matching lines.
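Part of why fzf copes with the misspelling is that, in its default fuzzy mode, each space-separated search term only has to appear as a subsequence of the line: the letters of “honor” occur, in order, inside “honour”. The awk sketch below is a simplified, hypothetical stand-in (fuzzy_filter is not part of fzf). It implements only the “every term is a case-insensitive subsequence” test and none of fzf’s scoring or ranking.

```shell
#!/usr/bin/env bash
# fuzzy_filter TERM...: hypothetical helper that prints the stdin lines in
# which every TERM appears as a case-insensitive subsequence (its letters
# in order, with gaps allowed). Unlike fzf, it does not score or sort.
fuzzy_filter() {
  awk -v terms="$*" '
    # Return 1 if every character of t occurs in s in the same order.
    function is_subseq(t, s,    i, pos, c, k) {
      pos = 1
      for (i = 1; i <= length(t); i++) {
        c = substr(t, i, 1)
        k = index(substr(s, pos), c)   # search only after the last hit
        if (k == 0) return 0
        pos += k
      }
      return 1
    }
    BEGIN { n = split(tolower(terms), term, " ") }
    {
      line = tolower($0)
      for (j = 1; j <= n; j++)
        if (!is_subseq(term[j], line)) next
      print
    }
  '
}

printf '%s\n' \
  'Wounds received in battle confer honour' \
  'The battle was lost' \
  'Honour and glory' |
  fuzzy_filter battle wounds honor
# Wounds received in battle confer honour
```

The reverse does not hold: searching for “honour” in a line that only contains “honor” fails, because the u has nothing to match.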
Now, agrep is quite a bit less flexible (not to mention slower), but it can also do the job:
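# TRE agrep: -i ignores case, -k treats the pattern as a literal string,
# -E 5 allows at most 5 errors
agrep -i -k -E 5 'battle wounds honour' "${f}"
# To which Don Quixote replied, "Wounds received in battle confer honour
# instead of taking it away; and so, friend Panza, say no more, but, as I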
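Note that I cheated a little by using the British spelling “honour”, which is how the word appears in this translation. Unlike fzf, which only needs the letters of each search term to appear in order, agrep counts every missing, extra, or substituted character against the error limit set by -E, so spelling discrepancies quickly use up that budget.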
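To make that error budget concrete, here is a small, hypothetical awk sketch of approximate substring matching (Sellers’ dynamic-programming variant of Levenshtein distance). For each line it reports the fewest single-character insertions, deletions, and substitutions needed for the pattern to match somewhere in that line, which is the kind of count a limit like -E caps. It is not agrep’s actual implementation, and min_errors is an illustrative name only.

```shell
#!/usr/bin/env bash
# min_errors PATTERN: hypothetical helper that prints, for each stdin line,
# the smallest case-insensitive edit distance between PATTERN and any
# substring of the line. Plain O(m*n) dynamic programming per line.
min_errors() {
  awk -v pat="$1" '
    {
      p = tolower(pat); t = tolower($0)
      m = length(p); n = length(t)
      # prev[i]: cheapest way to match the first i pattern characters,
      # ending at the previous text position; before any text, i deletions.
      for (i = 0; i <= m; i++) prev[i] = i
      best = prev[m]
      for (j = 1; j <= n; j++) {
        cur[0] = 0                      # a match may start anywhere
        c = substr(t, j, 1)
        for (i = 1; i <= m; i++) {
          sub_cost = prev[i-1] + (substr(p, i, 1) != c)  # match/substitute
          del_cost = cur[i-1] + 1       # pattern char missing from the text
          ins_cost = prev[i] + 1        # extra char in the text
          v = sub_cost
          if (del_cost < v) v = del_cost
          if (ins_cost < v) v = ins_cost
          cur[i] = v
        }
        if (cur[m] < best) best = cur[m]
        for (i = 0; i <= m; i++) prev[i] = cur[i]
      }
      print best
    }
  '
}

line='Wounds received in battle confer honour'
printf '%s\n' "$line" | min_errors 'honour'   # prints 0
printf '%s\n' "$line" | min_errors 'honor'    # prints 1 (the extra "u")
```

Every letter of spelling difference costs one unit here, so a pattern that already needs most of its allowed errors to line up with the text has little room left for a variant spelling.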