How can I make the following code more efficient?
```python
from collections import defaultdict

r"""
Calculate the posterior distribution p(pos|word) using Bayes' rule:
p(pos|word) \propto p(word|pos) p(pos).
"""
word_pos_dict = defaultdict(float)  # e.g. word|pos -> 0.0012
pos_dict = defaultdict(float)       # e.g. pos -> 0.005
pos_word_dict = defaultdict(float)  # e.g. pos|word -> 0.017

for word_pos, word_pos_prob in word_pos_dict.items():
    word, pos = word_pos.split('|')
    marginal_prob = 0
    for pos_prob in pos_dict.values():
        marginal_prob += word_pos_prob * pos_prob  # Marginal prob of p(word)
    pos_word_prob = word_pos_prob * pos_dict[pos]
    pos_word = pos + '|' + word
    pos_word_dict[pos_word] = pos_word_prob / marginal_prob
```

In practice, `word_pos_dict` has 57,602 entries and `pos_dict` has 984, so the nested loop runs about 56.7 million inner iterations, which makes this calculation very slow. Is there something wrong with the implementation, the design, or the algorithm?
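One possible restructuring is sketched below. It assumes the intended marginal is p(word) = Σ_pos p(word|pos) p(pos). Note that the inner loop as written multiplies the same `word_pos_prob` by every p(pos), so it equals `word_pos_prob * sum(pos_dict.values())` rather than that sum over POS tags for a fixed word. The sketch accumulates each word's marginal in a single pass over `word_pos_dict` and then normalizes in a second pass. This is O(N) in the number of `word|pos` entries instead of O(N × M). The function name `posterior_pos_given_word` is hypothetical, and the sketch assumes keys have the form `'word|pos'` as in the question.

```python
from collections import defaultdict


def posterior_pos_given_word(word_pos_dict, pos_dict):
    """Hypothetical helper: map {'word|pos': p(word|pos)} and {pos: p(pos)}
    to {'pos|word': p(pos|word)}.

    Assumes p(word) = sum over pos of p(word|pos) * p(pos).
    """
    # Pass 1: compute the unnormalized joint p(word|pos) * p(pos) for each
    # entry and add it to that word's marginal p(word). One sweep, O(N).
    joint = {}
    marginal = defaultdict(float)
    for word_pos, word_pos_prob in word_pos_dict.items():
        word, pos = word_pos.split('|')
        # .get avoids inserting missing tags when pos_dict is a defaultdict
        j = word_pos_prob * pos_dict.get(pos, 0.0)
        joint[(word, pos)] = j
        marginal[word] += j

    # Pass 2: divide each joint value by its word's marginal (Bayes' rule).
    pos_word_dict = {}
    for (word, pos), j in joint.items():
        if marginal[word] > 0:  # skip words whose marginal is zero
            pos_word_dict[pos + '|' + word] = j / marginal[word]
    return pos_word_dict
```

For example, with the made-up inputs `{'run|VB': 0.02, 'run|NN': 0.01, 'dog|NN': 0.03}` and `{'VB': 0.4, 'NN': 0.6}`, the posteriors for `run` come out as 4/7 for `VB` and 3/7 for `NN`, which sum to 1 as a conditional distribution should.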