Here's a solution where correctness is defined as: `an` comes before a word that starts with a vowel sound; otherwise `a` may be used:
```python
#!/usr/bin/env python
import itertools
import re
import sys

try:
    from future_builtins import map, zip  # Python 2: lazy map/zip
except ImportError:  # Python 3 (or old Python versions)
    map, zip = map, zip
from operator import methodcaller

import nltk  # $ pip install nltk
from nltk.corpus import cmudict  # >>> nltk.download('cmudict')


def starts_with_vowel_sound(word, pronunciations=cmudict.dict()):
    # returns None (falsy) if the word is not in the dictionary
    for syllables in pronunciations.get(word, []):
        return syllables[0][-1].isdigit()  # use only the first one


def check_a_an_usage(words):
    # iterate over words pairwise (recipe from itertools)
    # note: ignore Unicode case-folding (`.casefold()`)
    a, b = itertools.tee(map(methodcaller('lower'), words))
    next(b, None)
    for a, w in zip(a, b):
        # raw string avoids the invalid-escape warning for `\w`
        if (a == 'a' or a == 'an') and re.match(r'\w+$', w):
            valid = (a == 'an') if starts_with_vowel_sound(w) else (a == 'a')
            yield valid, a, w


# note: you could use nltk to split text into paragraphs, sentences, words
pairs = ((a, w)
         for sentence in sys.stdin.readlines() if sentence.strip()
         for valid, a, w in check_a_an_usage(nltk.wordpunct_tokenize(sentence))
         if not valid)

print("Invalid indefinite article usage:")
print('\n'.join(map(" ".join, pairs)))
```
Example input (one sentence per line):

```
Validity is defined as `an` comes before a word that starts with a vowel sound, otherwise `a` may be used.
Like "a house", but "an hour" or "a European" (from @Hyperboreus's comment http://stackoverflow.com/questions/20336524/gramatically-correct-an-english-text-python#comment30353583_20336524 ).
A AcRe, an AcRe, a rhYthM, an rhYthM, a yEarlY, an yEarlY (words from @tchrist's comment http://stackoverflow.com/questions/9505714/python-how-to-prepend-the-string-ub-to-every-pronounced-vowel-in-a-string#comment12037821_9505868 )
We have found a (obviously not optimal) solution." vs. "We have found an obvious solution (from @Hyperboreus answer)
Wait, I will give you an... -- he shouted, but dropped dead before he could utter the last word. (ditto)
```
Output:

```
Invalid indefinite article usage:
a acre
an rhythm
an yearly
```
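It is not obvious why the last pair is invalid: "yearly" is spelled with an initial "y" but is pronounced with a consonant sound, so the rule asks for `a`. See "Why is it 'an yearly'?"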
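To make the vowel-sound test concrete, here is a minimal sketch that does not need nltk. It assumes a small hand-written table of ARPAbet-style transcriptions (`SAMPLE_PRONUNCIATIONS`, written for illustration and not copied from cmudict) and two hypothetical helpers, `first_phoneme_is_vowel` and `expected_article`. It relies on the same property as the script above: vowel phonemes carry a trailing stress digit, consonant phonemes do not.

```python
# Hand-written ARPAbet-style transcriptions for illustration only
# (an assumption; the script above looks these up in cmudict instead).
SAMPLE_PRONUNCIATIONS = {
    "hour": ["AW1", "ER0"],
    "house": ["HH", "AW1", "S"],
    "acre": ["EY1", "K", "ER0"],
    "yearly": ["Y", "IH1", "R", "L", "IY0"],
    "european": ["Y", "UH2", "R", "AH0", "P", "IY1", "AH0", "N"],
}


def first_phoneme_is_vowel(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return True/False for known words, None for unknown ones."""
    phonemes = pronunciations.get(word.lower())
    if not phonemes:
        return None
    # Vowel phonemes end with a stress digit (0, 1 or 2); consonants do not.
    return phonemes[0][-1].isdigit()


def expected_article(word):
    """Hypothetical helper: the article the rule asks for.

    Unknown words fall back to "a", mirroring the script above,
    where a missing dictionary entry is treated as "no vowel sound".
    """
    return "an" if first_phoneme_is_vowel(word) else "a"


for word in ["hour", "house", "acre", "yearly", "european"]:
    print(expected_article(word), word)
```

With this table, "yearly" and "european" both start with the consonant phoneme `Y`, so the sketch picks `a` for them, while "hour" starts with a vowel phoneme despite its initial "h".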
an... that's not so hard ... if he wants to actually correct the grammar, that is