Performance comparison of using `in` vs try except in python

Question

I have a dictionary in some code which maps a key to a word, the key is the result of an md5 hash. I have code that essentially wants to get the key for a word, and when it doesn't already exist, add it to the dictionary

Here was my first implementation:

key = int(hashlib.md5(word).hexdigest(), 16) if key in self.id_to_word.keys(): assert word == self.id_to_word[key] else: self.id_to_word[key] = word return key

After profiling my code I found this to be EXTREMELY slow. So then I tried this, which is functionally equivalent

key = int(hashlib.md5(word).hexdigest(), 16) try: assert word == self.id_to_word[key] return key except KeyError: self.id_to_word[key] = word

This turned out to be incredibly faster. While I'm certainly happy about the performance improvement, I was wondering if someone could explain to me why. Is it bad practice to check for something in a keys() function from a dictionary like that? Is it generating copies of that every time (wasting a lot of computation)?

.keys() creates a list of all the keys, and the in statement searches all those keys, an O(N) operation. key in dict is O(1) operation — reptilicus
– reptilicus, Commented Apr 8, 2015 at 19:05
This discussion doesn't work for you: stackoverflow.com/q/1835756/349420 ? — Esparta Palma
– Esparta Palma, Commented Apr 8, 2015 at 19:08

Daniel · Accepted Answer · 2015-04-08 19:06:29Z

id_to_word.keys() creates a new list, which is linearly searched, which is much slower than a hash lookup. Remove the .keys().

The fastest way would be:

key = int(hashlib.md5(word).hexdigest(), 16) assert word == self.id_to_word.setdefault(key, word)

The only issue with this is that assert should not be used in production code. python -O (the optimize flag) will remove an assert from code. At that time, your code breaks.

Joran Beasley · Accepted Answer · 2015-04-08 19:06:37Z

key in some_dict is much faster than key in some_dict.keys()

Dict Lookup key in some_dict is O(1) complexity so its very fast

that said its still (very marginally)slower in the case where the key is in the dict than just try/except

the real answer is there is no real measurable difference between these 2 methods and do whatever feels right to you

Bakuriu · Accepted Answer · 2015-04-08 19:11:52Z

This is to be expected (in python2). The keys() method returns a list of keys. So using the in operator on the list takes linear time.

Trying to access the item is constant time, which is much faster.

Note: you can simply use key in dictionary instead of the try: ...except:.

Note that dictionaries have a setdefault method that already does what you want. Moreover if you do that operation a lot of time you should consider using collections.defaultdict instead of a plain dictionary.

reptilicus · Accepted Answer · 2015-04-08 20:56:14Z

Elaborating on my comments above:

In [4]: d = {k:k for k in xrange(1000)} In [5]: %timeit 50 in d 10000000 loops, best of 3: 68.6 ns per loop In [6]: %timeit 50 in d.keys() 100000 loops, best of 3: 6.35 µs per loop

as you can see, using d.keys() is about 100 times slower.

AChampion · Accepted Answer · 2015-04-08 19:40:31Z

As mentioned setdefault() solves your problem without an if or try block.
But "Easier to Ask for Forgiveness than Permission" [EAFP] and duck-typing is a common idiom in Python, compared to the more defensive "Look Before You Leap" [LBYL] idiom common in other languages, e.g. Java, C++.

Jeff Knuth has an interesting blog post on it Write Clean Python: Use Exceptions

Collectives™ on Stack Overflow

Performance comparison of using `in` vs try except in python

5 Answers 5

1 Comment

Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

Comments

Comments

Comments

Comments

Linked

Related