I have been running this blog for over 6 years at this point, and my posts have plenty of misspelled words. I wanted a quick way to run a spell checker against all of my posts and find them. However, most of my posts are technical, and spell checkers cannot tell whether a technical term, command, or piece of log output is actually misspelled, so when I ran aspell it flagged far too many words. To reduce the number of flagged words that were not actually misspelled, I first needed to generate my own dictionary to use in addition to the default one.
Generate a Custom aspell Dictionary
Change into the directory that contains all of your posts and run the following command (--ignore 2 ignores any word that is two characters or fewer):
for POST in *.md; do cat "$POST" | aspell list --ignore 2; done | sort | uniq
This will loop through every post and print every word that aspell thinks is misspelled to your terminal. There will be plenty of duplicate words, so the command also pipes the output through sort and then uniq to produce a de-duplicated list of words. Save or redirect the final output to a text file. I named mine aspell-technology-dictionary.txt.
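The collection step, including the redirect to a file, could be wrapped up roughly as follows. This is only a sketch: build_wordlist and its two arguments are hypothetical names of my own, and sort -u stands in for sort | uniq.

```shell
#!/bin/sh
# Sketch: collect every word aspell flags across all *.md posts in a directory.
# build_wordlist is a hypothetical helper name, not part of aspell.
#   $1 = directory containing the posts
#   $2 = file to write the de-duplicated word list to
build_wordlist() {
    posts_dir=$1
    out_file=$2
    for post in "$posts_dir"/*.md; do
        # Skip the unexpanded pattern when no .md files exist.
        [ -f "$post" ] || continue
        aspell list --ignore 2 < "$post"
    done | sort -u > "$out_file"    # sort -u does the job of sort | uniq
}

# Example invocation, left commented out so sourcing this has no side effects:
# build_wordlist . aspell-technology-dictionary.txt
```

Because the output file ends in .txt, it is not picked up by the *.md glob even when it lives in the same directory as the posts.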
This is the manual part of the process. Open the text file in your favorite text editor and scroll through it to remove words that you know are actually misspelled. Anything you leave in the file will be accepted as correctly spelled later on. For example, VMWare is an incorrect spelling, but VMware is a correct spelling, so I would remove the word VMWare from the text file.
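If you already know some of the words you want gone, you do not have to hunt for them by hand. A minimal sketch, assuming one word per line in the file; drop_words is a hypothetical helper name, and the match is exact and case-sensitive so that removing VMWare leaves VMware alone:

```shell
#!/bin/sh
# Sketch: remove known misspellings from the word list.
# drop_words is a hypothetical helper, not part of aspell.
#   $1      = word list file (one word per line)
#   $2...   = words to remove (exact, case-sensitive matches only)
drop_words() {
    list_file=$1
    shift
    for word in "$@"; do
        # -v inverts the match, -x matches whole lines, -F disables regex.
        # grep exits 1 when nothing is left to print, hence the || true.
        grep -vxF -- "$word" "$list_file" > "$list_file.tmp" || true
        mv "$list_file.tmp" "$list_file"
    done
}

# Example invocation:
# drop_words aspell-technology-dictionary.txt VMWare
```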
Once you are finished, scroll to the very top of the file and add the following line:
personal_ws-1.1 en 0
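aspell uses this header to parse the file as a personal word list: it gives the format version, the language (en), and a word count, which aspell treats only as a hint, so 0 is fine. Save the file.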
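Adding the header can also be scripted. The following is a sketch under the assumption that the header must be the first line of the file; add_wordlist_header is a hypothetical name, and the function does nothing if the header is already there, so it is safe to run twice:

```shell
#!/bin/sh
# Sketch: prepend the personal word list header unless it is already present.
# add_wordlist_header is a hypothetical helper, not part of aspell.
#   $1 = word list file
add_wordlist_header() {
    list_file=$1
    header='personal_ws-1.1 en 0'
    if [ "$(head -n 1 "$list_file")" != "$header" ]; then
        hdr_tmp=$(mktemp)
        { printf '%s\n' "$header"; cat "$list_file"; } > "$hdr_tmp"
        # Write back in place so the original file keeps its permissions.
        cat "$hdr_tmp" > "$list_file"
        rm -f "$hdr_tmp"
    fi
}

# Example invocation:
# add_wordlist_header aspell-technology-dictionary.txt
```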
My Custom aspell Dictionary
If you are interested, here is my custom generated aspell dictionary.
Find Misspelled Words with aspell
Finally, use the custom dictionary with the following command:
for POST in *.md; do echo "$POST"; echo; cat "$POST" | aspell list --add-extra-dicts=aspell-technology-dictionary.txt --ignore 2; echo; done
This will provide a list of all your posts and any words aspell thinks are misspelled. You can then manually open each post to fix misspelled words.
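To make the fixing part quicker, the flagged words can be traced back to the lines they appear on. This is a rough sketch rather than anything aspell provides: report_misspellings is a hypothetical name, it skips posts with nothing flagged, and it relies on grep -w, which GNU and BSD grep support but POSIX does not require.

```shell
#!/bin/sh
# Sketch: show file:line:text for every line containing a flagged word.
# report_misspellings is a hypothetical helper, not part of aspell.
#   $1    = custom dictionary file
#   $2... = posts to check
report_misspellings() {
    dict_file=$1
    shift
    for post in "$@"; do
        words=$(aspell list --add-extra-dicts="$dict_file" --ignore 2 < "$post" | sort -u)
        # Nothing flagged in this post, so print nothing for it.
        [ -n "$words" ] || continue
        printf '%s\n' "$words" | while IFS= read -r word; do
            # The extra /dev/null argument makes grep print the file name.
            grep -n -w -F -- "$word" "$post" /dev/null || true
        done
    done
}

# Example invocation:
# report_misspellings aspell-technology-dictionary.txt *.md
```

A line that contains two flagged words is printed once per word, which is noisy but keeps the sketch short.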
Alternatively, you can go through each post in interactive mode with the following command:
for POST in *.md; do aspell check --add-extra-dicts=aspell-technology-dictionary.txt --ignore 2 "$POST"; done