Why initialize Set from dict.keys() is faster than using set.add()?

Question

Code

total_exc_time = 0 for _ in range(10): start = time.time() set_a = set() dict_a = dict() add = set_a.add for index in range(1000000): dict_a[index] = index add(index) total_exc_time += ((time.time() - start) * 1000) total_exc_time = 0 for _ in range(10): start = time.time() dict_a = dict() for index in range(1000000): dict_a[index] = index set_a = set(dict_a.keys()) total_exc_time += ((time.time() - start) * 1000)

Result

332.3066234588623 ms 283.2982540130615 ms

isn't the time complexity for the first code is O(n) and later code is 2 times O(n)?

Function calls have overhead. Any overhead multiplied by ten million will becomes more noticeable. Idk if that's it, but it's a possibility. — Carcigenicate
– Carcigenicate, Commented Nov 19, 2020 at 0:04
"the time complexity for the first code is O(n) and later code is 2 times O(n)" - that's not how asymptotic complexity works. — user2357112
– user2357112, Commented Nov 19, 2020 at 0:12
sorry, I know the have the same O(n) but, the first code looks that it has less computation than the later code. — bakumpai
– bakumpai, Commented Nov 19, 2020 at 0:18

anch2150 · Accepted Answer · 2020-11-19 00:21:26Z

1

They are equivalent. The heavy lifting here to calculate the key hash. In both, it happens twice: add to a dict and then add to set. Other procedures should be light. You can use debugger to show the C calls to verify.

In addition to function call overhead mentioned by Carcigenicate, memory management might also play a role. If the set knows the length, it could possibly avoid copying data over when the predefined space isn't enough.

edited Nov 19, 2020 at 0:21

answered Nov 19, 2020 at 0:17

anch2150

916 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

bakumpai Over a year ago

doesn't dict.keys() populate a list? and it costs O(n)?

juanpa.arrivillaga Over a year ago

@bakumpai no it doesnt. Even if it did it could still be faster to pass it to set than using .add in a loop. For starters, even if it did the same thing (not pre-allocating the entire set) it would still be faster because it is all done in compiled code, whereas your loop is at the interpreter, bytecode level, which will always be significantly slower

anch2150 Over a year ago

No. dict.keys() are views. docs.python.org/3/library/stdtypes.html#dict.keys

Carcigenicate Over a year ago

I didn't even think of the potential for reallocations/expansion. I don't think it can take that into consideration because the signature for set specifies an "iterable", which doesn't have a length, but it could do an attribute check or something regardless.

anch2150 Over a year ago

@Carcigenicate have to admit I didn't verify myself and it's hypothetical.

|

Collectives™ on Stack Overflow

Why initialize Set from dict.keys() is faster than using set.add()?

1 Answer 1

10 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

10 Comments

Related