How to write cache function or equivalent in Python?

Question

Question from Daily Coding Problem 210 as reproduced below:

A Collatz sequence in mathematics can be defined as follows. Starting with any positive integer:

if n is even, the next number in the sequence is n / 2 if n is odd, the next number in the sequence is 3n + 1

It is conjectured that every such sequence eventually reaches the number 1. Test this conjecture.

Bonus: What input n <= 1000000 gives the longest sequence?

My code (note that it is incomplete):

def collatz(n): sequenceLength = 0 while (n>=1): if (n==1): break # solution is found elif (n%2==0): n = n/2 sequenceLength += 1 else: n = 3*n+1 sequenceLength += 1 return(sequenceLength) def longest_seq(limit): result = [] for i in range(1, limit+1): result += [collatz(i)] print(result) return max(result)

Question: In this case, I need to test the conjecture that I will always reach "1" for all positive integers. However, my code above assumes this which means I could potentially run an infinite loop.

What is a good/elegant method to test the conjecture? I was thinking something along the lines of a cache/array to see if values of "n" are repeated at the beginning of the while loop. If so, it suggests an infinite loop. However, I am a bit stuck on the syntax part and not clear on the examples I've seen so far. I just need a way to: - Add things to a cache/equivalent - Check whether something exists within the cache/equivalent (and this either gives me a proper return or an elegant invalid response that I can use, without crashing my program)

Much thanks.

Change n/2 to n//2. In Python 2 they're the same, but in Python 3 / will convert to floating point, while // will give you integer division, which is what you want. — Tom Karzes
– Tom Karzes, Commented Jul 1, 2019 at 7:59

vlemaistre · Accepted Answer · 2019-07-01 09:49:44Z

Since every time you encounter a specific number n_i you will do the same operation, you know that if you encounter a number that you have already seen then you will loop infinitely.

One way to solve this is to save your sequence. Then you can verify at each step that you haven't already encountered the number. Here is what it could look like :

def collatz(n): sequence = [] while (n>=1): if n in sequence: break else: sequence.append(n) if (n==1): break # solution is found elif (n%2==0): n = n/2 else: n = 3*n+1 return(sequence)

Note : You can use a set instead of an array if you want the code to run faster since arrays have longer lookup time (credit to @tobias_k). But you will lose the exact order of your sequence.

Plus one for answering the actual question, but a set would be better, as [] has O(n) lookup time.
I didn't think about set, thanks for the tip. But since we are working with sequences I think I will keep an array since it allows us to have access to the exact sequence. I added a note to my answer to add your comment since it can be useful if you want fast computations
Yes, the list has the benefit that you can return the actual Collatz Sequence, but (a) OP does not seem to need that at all, and (b) if you do so, you can just as well drop the sequenceLength variable.

Thibault D. · Accepted Answer · 2019-07-01 09:48:41Z

There is a built-in decorator for this purpose in python: lru_cache. But first you should use a recursive function to benefit from decorators.

from functools import lru_cache @lru_cache() def collatz(n): sequenceLength = 0 if n == 1: return n elif n % 2 == 0: return 1 + collatz(n // 2) else: # n % 2 == 1: return 1 + collatz(3 * n + 1)

You can check here how it works and modify it to do what you want: https://docs.python.org/3/library/functools.html

You want to check if an input has already been seen before, you can define your own decorator:

def deco(f): cache = {} @wraps(f) def wrapper(*args, **kwargs): if 'r' in kwargs and kwargs['r']: # reset the cache when first called cache.clear() try: res = cache[args] # We have already seen these parameters ! print('cache hit', *args) if res is None: raise KeyError except KeyError: cache[args] = None # temporary store a value here res = f(*args) cache[args] = res # final value stored return res return wrapper

You just need to change your definition of collatz:

@deco def collatz(n, /): # force positional argument here # ... (same code here)

And call it with a reset:

collatz(10, r=True) >>> 7

But as tobias said, this code should never trigger for collatz function

Note that this is not actually what OP is asking, if you read the last paragraph of the question...
@pLOPeGG: Thanks for your answer. I came across something like you have written. I believe it's a form of memoization, with cache being another possibility?
Memoization is another word for cache applied to speed up recursion IMO.

nicola · Accepted Answer · 2019-07-01 08:27:43Z

Recursion + caching:

cache = {1:0} def collatz(n): if n in cache: return cache[n] else: if n%2==0: m = n//2 else: m = 3*n+1 res = collatz(m) + 1 cache[n] = res return res def longest_seq(limit): result = [] for i in range(1, limit+1): result += [collatz(i)] return max(result)

Then running:

r = longest_seq(1000000) #524

Find the value corresponding to the max:

[x for x,y in cache.items() if y==r] #[837799]

Thanks Nicola. This was the second part I would have looked for!

Zefick · Accepted Answer · 2019-07-01 08:06:03Z

cache = {} def collatz(n): sequenceLength = 0 while (n>=1): if n in cache: # number already encountered return sequenceLength + cache[n] if (n==1): break # solution is found elif (n%2==0): n = n/2 sequenceLength += 1 else: n = 3*n+1 sequenceLength += 1 return sequenceLength def longest_seq(limit): result = [] for i in range(1, limit+1): c = collatz(i) result.append(c) cache[i] = c # put the answer in the cache print(result) return max(result)

Ajama · Accepted Answer · 2019-07-01 08:16:33Z

Due to collatz conjecture always ending with 1 I don't see the need to do cached equivalence check. But extra checking never hurts therefore you could just do the simplest ternary check n1 = n if n1 != n else exit() assigning the variables n1 = 0 for the first round before.

tobias_k · Accepted Answer · 2019-07-01 09:45:22Z

As already noted, you can use functools.lru_cache to add caching to pretty much any function (with hashable parameters), but in it's current form, caching won't help much with your collatz function, as you call it just once for any parameter. Thus you have a lot of cached values, but you never use any of those. Instead, you could change your function from iterative to recursive to make use of caching:

@functools.lru_cache() def collatz(n): if n <= 1: return 0 if n % 2 == 0: return 1 + collatz(n // 2) else: return 1 + collatz(3*n + 1)

This way, you can reuse a once-calculated result in a later call of collatz if you merge into a "branch" you already explored in an earlier call (there are some nice graphs e.g. in the Wikipedia article of the conjecture.

Note that depending on the size of your input, you might run into the recursion limit. To "fix" this, you could use sys.setrecursionlimit(somelargenumber) to increase the recursion limit, but of course this only works up to some point.

However, this seems not to be what you actually asked for (and also does not significantly speed up the code for finding the longest sequence up to 1,000,000). Instead, it seems like you want to check if you find a "loop" during one call of the collatz function. Here, lru_cache won't help. Instead, you should just add a set of already seen values of n and see whether the current number is already in that set. However, I would not expect this code to ever be triggered...

def collatz(n): sequenceLength = 0 seen = set() while (n>=1): if n in seen: print("BREAKING NEWS! COLLATZ CONJECTURE DISPROVEN!") break seen.add(n) # remainder of your code

Collectives™ on Stack Overflow

How to write cache function or equivalent in Python?

6 Answers 6

4 Comments

3 Comments

1 Comment

Comments

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

4 Comments

3 Comments

1 Comment

Comments

Comments

Comments

Related