Python issubset performance comparing with finding element in set

Question

I was doing a programming exercise on Leetcode (500 Keyboard Row) and I came across set.issubset() in discussions. Upon comparing with my own answer, here's an interesting finding:

1)

def checkWord(self, word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' row = 0 for idx, ch in enumerate(word): if idx == 0: row = 1 if ch in r1 else 2 if ch in r2 else 3 continue coming_row = 1 if ch in r1 else 2 if ch in r2 else 3 if row != coming_row: return False return True

If I submit with this, I get a runtime of 40ms

2)

def checkWord(self, word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' return set(word).issubset(r1) or set(word).issubset(r2) or set(word).issubset(r3)

If I use issubset(), I get a runtime of 36ms, which beats 94.87% of all submissions.

3) So I thought, hmmm, what's the difference with this piece? According to a question I found on this topic The complextiy of Python issubset(), the time complexity of method 2) should be O(len(word)), which I believe should be the same with this piece:

def checkWord(self, word): r1 = set('qwertyuiop') r2 = set('asdfghjkl') r3 = set('zxcvbnm') row = 0 for idx, ch in enumerate(word): if idx == 0: row = 1 if ch in r1 else 2 if ch in r2 else 3 continue coming_row = 1 if ch in r1 else 2 if ch in r2 else 3 if row != coming_row: return False return True

However, the runtime of this is 32ms and beats 100% of all py3 submissions... Why is issubset() slower than this? Thanks!

Fernando Ortega · Accepted Answer · 2018-04-11 07:51:56Z

Notice that your first version of checkWord function doesn't check if ch value is in r3 set, so you avoid one check.

Furthermore, in your second version of checkword you create a set with the word parameter's characters three times. Why don't you store the set before checking with 'issubset'? Your code will be something like this:

def checkWord3(word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' s1 = set(word) return s1.issubset(r1) or s1.issubset(r2) or s1.issubset(r3)

This is another solution using regular expressions:

r1 = re.compile("^[qwertyuiop]+$") r2 = re.compile("^[asdfghjkl]+$") r3 = re.compile("^[zxcvbnm]+$") r1.match(word) or r2.match(word) or r3.match(word)

Sraw · Accepted Answer · 2018-04-13 08:27:36Z

I have tested the following methods:

import random import string def checkWord1(word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' row = 0 for idx, ch in enumerate(word): if idx == 0: row = 1 if ch in r1 else 2 if ch in r2 else 3 continue coming_row = 1 if ch in r1 else 2 if ch in r2 else 3 if row != coming_row: return False return True def checkWord2(word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' return set(word).issubset(r1) or set(word).issubset(r2) or set(word).issubset(r3) def checkWord3(word): r1 = 'qwertyuiop' r2 = 'asdfghjkl' r3 = 'zxcvbnm' r = set(word) return r.issubset(r1) or r.issubset(r2) or r.issubset(r3) def checkWord4(word): r1 = set('qwertyuiop') r2 = set('asdfghjkl') r3 = set('zxcvbnm') row = 0 for idx, ch in enumerate(word): if idx == 0: row = 1 if ch in r1 else 2 if ch in r2 else 3 continue coming_row = 1 if ch in r1 else 2 if ch in r2 else 3 if row != coming_row: return False return True

And measure execution time:

word = ''.join(random.choice(string.ascii_lowercase) for _ in range(random.randint(0, 9))) print("One:") %timeit -n 10000 checkWord1(word) print("Two:") %timeit -n 10000 checkWord2(word) print("Three:") %timeit -n 10000 checkWord3(word) print("Four:") %timeit -n 10000 checkWord4(word)

Get results:

One: 475 ns ± 88.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Two: 708 ns ± 117 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Three: 552 ns ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Four: 1.19 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Although with different randomly generated string, the results are similar.

One is fastest, Three is faster and more stable than Two, Four is worst.

The reason I think why One is fastest is that

issubset(object) actually implicitly convert object to a set. That is costly.
This comparison is unfair. Suppose we have one word "qsdfwe", in method One, it will be check if in r1, and in the following loop, if it is found not a sub set of r1, there is also no possibility that it can be a sub set of others.

So I implement a even faster method, I think we can just focus on one row which is decided by the first char as these three rows are exclusive so that we do not check others any more.

def checkWord5(word): r = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'] first_char = word[0] row = r[0] if first_char in r[0] else r[1] if first_char in r[1] else r[2] for ch in word[1:]: if ch not in row: return False return True

Get results:

One: 728 ns ± 236 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Two: 2.08 µs ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Three: 1.43 µs ± 60 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Four: 1.45 µs ± 7.74 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Five: 374 ns ± 7.01 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Collectives™ on Stack Overflow

Python issubset performance comparing with finding element in set

2 Answers 2

Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Linked

Related