
What is the time complexity of each of Python's set operations in Big O notation?

I am using Python's set type for an operation on a large number of items. I want to know how each operation's performance will be affected by the size of the set. For example, add and the membership test:

myset = set()
myset.add('foo')
'foo' in myset

Googling around hasn't turned up any resources, but it seems reasonable that the time complexity for Python's set implementation would have been carefully considered.

If it exists, a link to something like this would be great. If nothing like this is out there, then perhaps we can work it out?

Extra marks for finding the time complexity of all set operations.
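To make the question concrete, here is a minimal sketch of the operations being asked about, annotated with the average-case costs one would hope for from a hash-based set (the annotations are my expectation, not a confirmed spec):

```python
# Operations on Python's set type, annotated with the expected
# average-case cost if sets are backed by a hash table.
myset = set()            # create an empty set
myset.add('foo')         # expected O(1) average: insert one element
found = 'foo' in myset   # expected O(1) average: membership test
myset.discard('foo')     # expected O(1) average: remove if present
print(found)  # True
```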

2 Comments

  • While GWW's link is very informative, you can reason about the time complexity of Python's sets by understanding that they are simply a special case of Python's dictionary (keys, but no values). So, if you know the time complexity of operations on a hash map, you're pretty much there. Commented Sep 8, 2011 at 16:40
  • This link may be useful: ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt Commented May 10, 2022 at 16:41

4 Answers


According to the Python wiki's Time complexity page, set is implemented as a hash table, so you can expect lookup, insert, and delete to be O(1) on average. If the hash table's load factor gets too high, you face collisions and operations degrade toward O(n).

P.S. For some reason the wiki lists O(n) for the delete operation, which looks like a typo.

P.P.S. This holds for CPython; PyPy is a different story.
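A quick, unscientific way to sanity-check the O(1) average claim is to time membership tests against sets of very different sizes; with constant-time lookups, the per-lookup cost should stay roughly flat rather than scaling with the set. A sketch using the standard `timeit` module (the specific sizes and repeat counts are arbitrary choices of mine):

```python
import timeit

small = set(range(100))
large = set(range(1_000_000))

# Time a membership hit against each set. With O(1) average lookups,
# the 10,000x larger set should NOT be anywhere near 10,000x slower.
t_small = timeit.timeit(lambda: 50 in small, number=100_000)
t_large = timeit.timeit(lambda: 500_000 in large, number=100_000)
print(t_small, t_large)
```

Wall-clock timings are noisy, so treat this as a rough demonstration rather than a benchmark.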


8 Comments

Sets in Python also do auto sorting. So do you think inserting a new value is still O(1)?
@thakurinbox could you support your statement with a link?
Auto "ordering", not sorting.
@NareshThakur "Sets in Python also do auto sorting." - Not true. You probably just observed a special case.
A set is most likely to appear to be 'automatically ordered' if it only contains the small positive integers which are pre-allocated when the interpreter starts. Once you add negative values, larger ints, or other data types, all bets are off.

The other answers do not cover two crucial set operations: union and intersection. In the worst case, union takes O(n+m), whereas intersection takes O(min(n, m)), provided there are not many elements in the sets with the same hash. A list of time complexities of common operations can be found here: https://wiki.python.org/moin/TimeComplexity
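A small sketch of the two operations, with the complexities above as comments (the example sets are mine, chosen only for illustration):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Union must visit every element of both sets: O(len(a) + len(b)).
union = a | b          # {1, 2, 3, 4, 5}

# Intersection iterates the smaller set and probes the larger one,
# so on average it is O(min(len(a), len(b))).
common = a & b         # {3, 4}

print(union, common)
```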

1 Comment

Wow, this was a super helpful answer that just sped up my code by several orders of magnitude, thank you!

The in operation should be independent of the size of the container, i.e., O(1), given an optimal hash function. This should be nearly true for Python strings: hashing strings is always critical, Python should be clever there, and thus you can expect near-optimal results.
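One nuance worth noting: hashing a string costs O(k) in its length, but CPython (as an implementation detail) caches the computed hash on the string object, so repeated lookups of the same string object don't rehash it. A sketch:

```python
# Hashing this string is O(len(s)) the first time it is needed.
s = 'x' * 1_000_000
big = {s}

# The first membership test computes (and CPython caches) the hash;
# later tests on the same string object reuse the cached value.
print(s in big)  # True
print(s in big)  # True (no rehash of s in CPython)
```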



The set type in Python is basically implemented as a hash table.

There are many great answers above; I'm just going to add one point that is missing:

The hash table starts with a predefined size, and adding elements beyond the current capacity triggers a resize of the hash table.

For the complexity part (hash table):

Amortized:

O(1) for individual additions, because resizing happens infrequently.

Worst case:

O(n) when a resize is triggered, since rehashing every element into a larger table is a linear operation.
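You can observe this resizing behaviour by watching a set's allocated size as elements are added: `sys.getsizeof` stays flat for long stretches and jumps only at the occasional resize, which is why add is O(1) amortized despite the O(n) worst case. A rough sketch (the loop bound of 100 is an arbitrary choice):

```python
import sys

s = set()
sizes = []
for i in range(100):
    s.add(i)
    sizes.append(sys.getsizeof(s))

# Count how many additions actually grew the table: only a handful
# of the 100 adds should trigger a resize.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(resizes)
```

Exact resize points are a CPython implementation detail and may vary between versions.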

1 Comment

In other words, in the worst case, using a set to determine that all elements in a list are the same value takes O(n^2).
