I have a dictionary in some code that maps a key to a word, where the key is the result of an MD5 hash. I have code that essentially wants to get the key for a word and, when it doesn't already exist, add it to the dictionary.
Here was my first implementation:
    key = int(hashlib.md5(word).hexdigest(), 16)
    if key in self.id_to_word.keys():
        assert word == self.id_to_word[key]
    else:
        self.id_to_word[key] = word
    return key

After profiling my code I found this to be EXTREMELY slow. So then I tried this, which is functionally equivalent:
    key = int(hashlib.md5(word).hexdigest(), 16)
    try:
        assert word == self.id_to_word[key]
        return key
    except KeyError:
        self.id_to_word[key] = word

This turned out to be dramatically faster. While I'm certainly happy about the performance improvement, I was wondering if someone could explain why. Is it bad practice to check for membership in a dictionary's keys() like that? Is it generating a copy of the keys every time (wasting a lot of computation)?
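For context, here is a self-contained sketch of the pattern; the class and method names are placeholders I've made up, and the .encode() call is an adjustment for Python 3, where hashlib wants bytes:

    import hashlib


    class WordIndex:
        """Placeholder class illustrating the id_to_word get-or-add pattern."""

        def __init__(self):
            self.id_to_word = {}

        def key_for(self, word):
            # Hash the word and use the integer digest as the dictionary key.
            # (.encode() is needed on Python 3; Python 2 accepts str directly.)
            key = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
            try:
                # Fast path: key already present, sanity-check it maps to the same word.
                assert word == self.id_to_word[key]
            except KeyError:
                # First time we see this word: record it.
                self.id_to_word[key] = word
            return key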
Use key in some_dict; you don't have to use .keys(). In Python 2, some_dict.keys() builds a list of all the keys, and the in statement then searches that list, an O(N) operation. key in some_dict is an O(1) hash lookup. (On Python 3, keys() returns a view and membership on it is also O(1), but key in some_dict is still the idiomatic form.)
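To see the cost concretely, here is a rough timing sketch (exact numbers will vary by machine); the first statement forces membership against a materialized list of keys, which is effectively what key in d.keys() did on Python 2:

    import timeit

    setup = "d = {i: str(i) for i in range(100000)}"

    # Membership against a list of keys: O(N) linear scan.
    print(timeit.timeit("99999 in list(d.keys())", setup=setup, number=1000))

    # Membership against the dict itself: O(1) hash lookup.
    print(timeit.timeit("99999 in d", setup=setup, number=1000))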