ThorneLabs

Spell Checking Many Posts with aspell and a Custom Dictionary

• Updated February 24, 2019


I have been running this blog for over 6 years at this point, and my posts have plenty of misspelled words. I wanted a quick way to run a spell checker against all my posts and find them. However, most of my posts are technical, and spell checkers cannot tell whether a technical term, command, or piece of log output is actually misspelled, so when I first ran aspell it flagged far too many words. To reduce the number of words flagged that were not actually misspelled, I first needed to generate my own dictionary to use in addition to the default aspell dictionaries.

Generate a Custom aspell Dictionary

Change into the directory that contains all of your posts and run the following command (--ignore 2 ignores any word that is two characters or fewer):

for POST in *.md
do
    # Quote the filename so posts with spaces in their names still work
    aspell list --ignore 2 < "$POST"
done | sort | uniq

This will loop through every post and print every word that aspell thinks is misspelled to your terminal. There will be plenty of duplicate words, so the above command also sorts the output and pipes the sorted output to uniq to produce a de-duplicated list of words. Save or redirect the final output to a text file. I named mine aspell-technology-dictionary.txt.

This is the manual part of the process. Open the text file in your favorite text editor and scroll through it, removing every word that you know is actually misspelled. Anything left in the file will be treated as correctly spelled later on. For example, VMWare is an incorrect spelling, but VMware is a correct spelling, so I would remove the word VMWare from the text file.
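Pairs like VMWare and VMware are easy to miss when scrolling through a long list. The following is a minimal sketch, not part of my original process, that lists words appearing in the file with more than one capitalization so they can be reviewed together. The sample words and the WORDLIST and VARIANTS names are hypothetical; point WORDLIST at your own generated file instead.

```shell
# Hypothetical sample word list; in practice this would be the file
# generated in the previous step (e.g. aspell-technology-dictionary.txt).
WORDLIST=$(mktemp)
printf '%s\n' VMWare VMware aspell kubectl Kubectl nginx > "$WORDLIST"

# Group words by their lowercase form and keep only the groups that
# contain more than one spelling, one group per line.
VARIANTS=$(awk '
    {
        key = tolower($0)
        group[key] = (key in group) ? (group[key] " " $0) : $0
        count[key]++
    }
    END {
        for (key in count)
            if (count[key] > 1)
                print group[key]
    }
' "$WORDLIST" | sort)

echo "$VARIANTS"
```

Each output line is a set of spellings that differ only by case. This only surfaces candidates; deciding which spelling is wrong is still a manual call.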

Once you are finished, scroll to the very top of the file and add the following line:

personal_ws-1.1 en 0

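aspell reads this header to recognize the file as a personal word list: en is the language, and the trailing number is a word count, which can be left at 0. Save the file.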
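If you would rather not add the header by hand, it can be prepended from the shell. This is a rough sketch under the assumption that the word list has one word per line; the sample words and the DICT, HEADER, and TMP names are hypothetical, and DICT should point at your own file.

```shell
# Hypothetical sample word list standing in for the reviewed file.
DICT=$(mktemp)
printf '%s\n' VMware kubectl nginx > "$DICT"

HEADER='personal_ws-1.1 en 0'

# Prepend the header only if the first line is not already the header,
# so the snippet is safe to run more than once.
if [ "$(head -n 1 "$DICT")" != "$HEADER" ]; then
    TMP=$(mktemp)
    { printf '%s\n' "$HEADER"; cat "$DICT"; } > "$TMP"
    mv "$TMP" "$DICT"
fi

# Show the header followed by the first word.
head -n 2 "$DICT"
```

Writing to a temporary file and moving it into place avoids reading from and writing to the same file in a single pipeline, which would truncate it.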

My Custom aspell Dictionary

If you are interested, here is my custom generated aspell dictionary.

Find Misspelled Words with aspell

Finally, use the custom dictionary with the following command:

for POST in *.md
do
    # Print the post's filename, then the words aspell flags in it
    echo "$POST"
    echo
    aspell list --add-extra-dicts=aspell-technology-dictionary.txt --ignore 2 < "$POST"
    echo
done

This will provide a list of all your posts and any words aspell thinks are misspelled. You can then manually open each post to fix misspelled words.
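To find where a flagged word appears without opening every post, grep can report the file and line number. This is only a sketch and not something aspell does for you; the sample posts and the POSTS_DIR, WORD, and MATCHES names are hypothetical, and in practice you would run the grep from your own posts directory.

```shell
# Hypothetical sample posts; in practice, use your real posts directory.
POSTS_DIR=$(mktemp -d)
printf '%s\n' 'Intro line.' 'Installing VMWare tools.' > "$POSTS_DIR/first-post.md"
printf '%s\n' 'Nothing wrong here.' > "$POSTS_DIR/second-post.md"

# WORD is a word reported by aspell (hypothetical example).
WORD='VMWare'

# -n prints line numbers, -w matches whole words only, and -F treats the
# word as a literal string rather than a regular expression.
MATCHES=$(cd "$POSTS_DIR" && grep -nwF -- "$WORD" *.md)

echo "$MATCHES"
```

Each match is printed as filename:line:text, which makes it easy to jump straight to the right spot in an editor.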

Alternatively, you can go through each post in interactive mode with the following command:

for POST in *.md
do
    aspell check --add-extra-dicts=aspell-technology-dictionary.txt --ignore 2 "$POST"
done