When I was in high school, I always had fun finding exceptions to the “I before e, except after c” rule. Back then I found over a hundred on my own just by searching around the Internet. Today I decided to take a better approach and wrote a small Python script to do the work for me. In all, it found 533 exceptions to the rule.
Yeah, that’s quite a few. If you’re interested in how I did it, continue reading. Don’t worry, I won’t be sad if you came here just for the list.
The Rule, Broken Down
- If a word contains “ei”, the “ei” must either follow a “c” (as in “receive”) or make a long-A sound (as in “neighbor”).
- If a word contains “ie”, the “ie” must not follow a “c”.
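The two conditions above can be written as a single predicate. This is a minimal sketch, not the script the post uses. Deciding whether an “ei” makes a long-A sound requires pronunciation data, so the hypothetical `has_long_a` argument stands in for that judgment and must be supplied by the caller.

```python
import re
from typing import Callable

# "ie" directly after "c" breaks the rule
CIE = re.compile(r'cie')
# "ei" that is NOT directly after "c"
EI_WITHOUT_C = re.compile(r'(?<!c)ei')


def breaks_rule(word: str, has_long_a: Callable[[str], bool]) -> bool:
    """Return True if `word` is an exception to "I before e, except after c".

    `has_long_a` is a hypothetical caller-supplied check that reports whether
    the word's "ei" makes a long-A sound (as in "neighbor").
    """
    word = word.lower()
    # Second condition: "ie" may never follow "c"
    if CIE.search(word):
        return True
    # First condition: "ei" outside "cei" is allowed only with a long-A sound
    if EI_WITHOUT_C.search(word):
        return not has_long_a(word)
    return False
```

For example, with `has_long_a` returning True only for “neighbor”, the words “weird” and “science” come back as exceptions, while “receive”, “believe”, and “neighbor” do not.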
The Tools Used
- The Python Programming Language
- Regular Expressions
- A file containing a list of English words, one per line
Writing the code is pretty simple. The script opens the dictionary file and checks each word against the rule. Any word that breaks the rule is written to an output file.
```python
import re

# Patterns for "ei" spellings that follow the rule; these get stripped out
valid_ei_regexes = [
    # "ei" after "c" is allowed by the rule itself
    re.compile(r'cei'),
    # "ei" with a long-A sound
    re.compile(r'^[zp]ei'),
    re.compile(r'(?<![cs])hei'),
    re.compile(r'rein'),
    re.compile(r'vei[nl]'),
    re.compile(r'eig[hne]'),
]

# Open the input and output files; both are closed automatically
with open("dictionary.txt", "r") as input_handle, \
        open("i_before_e.txt", "w") as output_handle:
    # Iterate through all English words, one per line
    for line in input_handle:
        word = line.strip().lower()

        # "I before e" is broken whenever "ie" follows a "c"
        if 'cie' in word:
            output_handle.write(line)
            continue

        # Strip every valid "ei" spelling from a working copy of the word,
        # so the original word (not the stripped copy) is what gets logged
        remainder = word
        for regex in valid_ei_regexes:
            remainder = regex.sub('', remainder)

        # Any "ei" left over is neither after a "c" nor a long-A sound
        if 'ei' in remainder:
            output_handle.write(line)
```
Pretty simple, eh? We let Python and regular expressions do all of the work!
Bonus: Words that have Qs that aren’t followed by Us
Scrabble players rejoice! I just compiled a list of words that contain a Q that isn’t followed by a U. I did it in the same way that I found exceptions to the “I before E, except after C” rule. This was just for fun. I don’t even play Scrabble.
There are 29 such words in total.
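The code for the bonus list isn't shown. A negative lookahead handles the check in one regex, though. The sketch below assumes the same one-word-per-line dictionary as the main script. The function names are hypothetical, and it takes any iterable of words so it can be tried without a file.

```python
import re
from typing import Iterable, List

# A "q" that is not immediately followed by a "u" (also matches a trailing "q")
Q_WITHOUT_U = re.compile(r'q(?!u)', re.IGNORECASE)


def has_q_without_u(word: str) -> bool:
    """Return True if some "q" in `word` is not followed by "u"."""
    return Q_WITHOUT_U.search(word) is not None


def words_with_q_without_u(words: Iterable[str]) -> List[str]:
    """Return the stripped words from `words` that contain a "q" without a "u"."""
    return [w.strip() for w in words if has_q_without_u(w.strip())]
```

Passing an open file such as `open("dictionary.txt")` as `words` runs the same filter over the whole dictionary. For example, “qi” and “qat” match, while “queen” does not.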