why this question ?
I was trying to answer this question: Check if All Values Exist as Keys in Dictionary with something better than a generator comprehension fed to all (python loops, even in comprehensions, slow down execution compared to implicit loops that some functions perform):
all(i in bar for i in foo) where bar is a dictionary and foo is a list by using set.issubset (with a conversion to set of foo to be able to use foo.issubset(bar)), and didn't succeed to beat the times of the all solution (unless both containers are converted to sets).
my question:
From the documentation of set:
Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset(), and issuperset() methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs').
Okay but the performance really depends on the type of argument, even if the complexity does not (The complextiy of Python issubset()):
import timeit foo = {i for i in range(1, 10000, 2)} bar = foo - {400} n=10000 x = timeit.timeit(setup="foo = {str(i) for i in range(1, 10000, 2)};bar = foo - {'400'}",stmt="bar.issubset(foo)",number=n) print("issubset(set)",x) x = timeit.timeit(setup="foo = {str(i) for i in range(1, 10000, 2)};bar = foo - {'400'};foo=list(foo)",stmt="bar.issubset(foo)",number=n) print("issubset(list)",x) x = timeit.timeit(setup="foo = {str(i):i for i in range(1, 10000, 2)};bar = set(foo) - {'400'}",stmt="bar.issubset(foo)",number=n) print("issubset(dict)",x) x = timeit.timeit(setup="foo = {str(i):i for i in range(1, 10000, 2)}.keys();bar = set(foo) - {'400'}",stmt="bar.issubset(foo)",number=n) print("issubset(dict_keys)",x) my results (Python 3.4):
issubset(set) 1.6141405847648826 issubset(list) 3.698748032058883 issubset(dict) 3.6300025109004244 issubset(dict_keys) 4.224299651223102 So if a set is passed as the argument, the result is very fast.
Using a list is much slower. I figured out that it was because of the hash that must be done on the strings is costly. So I changed my test inputs with integers like this:
foo = {i for i in range(1, 10000, 2)} bar = foo - {400} and the results were globally faster but still a huge time difference:
issubset(set) 0.5981848205989139 issubset(list) 1.7991591232742143 issubset(dict) 1.889119736960271 issubset(dict_keys) 2.2531574114632678 I also tried to change dict by dict.keys() as in python 3 the keys is said to be (https://www.python.org/dev/peps/pep-3106/) "a set-like or unordered container object".
But in that case, the result is even worse than with dict or list.
So why does passing a set beats passing a list or a dict or a dict_keys object? I don't see anything mentionned in the documentation about this.