156

I'm looking for a Python caching library but can't find anything so far. I need a simple dict-like interface where I can set keys and their expiration and get them back cached. Sort of something like:

cache.get(myfunction, duration=300) 

which will give me the item from the cache if it exists or call the function and store it if it doesn't or has expired. Does anyone know something like this?

6
  • i think you're missing item in your example. Commented Sep 15, 2009 at 13:45
  • Yes, this would probably need a key... And, 2.x. Commented Sep 15, 2009 at 13:50
  • 4
    within the same process or shared between processes? threaded or not? Commented Sep 15, 2009 at 14:37
  • 1
    It should be thread-safe, sorry, I should have mentioned. I don't need to share between processes. Commented Sep 18, 2009 at 10:20
  • 7
    Try DiskCache: Apache2 licensed, 100% coverage, thread-safe, process-safe, multiple eviction policies and fast (benchmarks). Commented Mar 21, 2016 at 18:13
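Given the clarifications above (thread-safe, in-process, per-key duration), the requested interface can be sketched with just the standard library; the class and parameter names here are invented for illustration:

```python
# Hypothetical sketch of the interface asked for above: a thread-safe,
# in-process cache whose get() returns the cached value, or calls the
# supplied function and stores the result with a per-key duration.
import threading
import time

class TTLCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, fn, duration=300):
        now = time.time()
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]  # still fresh: serve from cache
            # Missing or expired: recompute. Note fn runs while the lock
            # is held, which serializes computation -- fine for a sketch.
            value = fn(key)
            self._data[key] = (value, now + duration)
            return value
```

Usage would look like `cache.get('answer', my_function, duration=300)`, close to the call shape in the question.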

15 Answers

100

From Python 3.2 you can use the decorator @lru_cache from the functools library. It's a Least Recently Used cache, so there is no expiration time for the items in it, but as a fast hack it's very useful.

from functools import lru_cache

@lru_cache(maxsize=256)
def f(x):
    return x * x

for x in range(20):
    print(f(x))
for x in range(20):
    print(f(x))
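Since lru_cache itself has no expiration, a common workaround is to pass an extra "time bucket" argument so entries are keyed by a time window; the helper name and the 600-second window below are arbitrary example choices:

```python
# Workaround for lru_cache's lack of expiration: add a hidden time-bucket
# argument so cached entries are only reused within one time window.
import time
from functools import lru_cache

def ttl_hash(seconds=600):
    """Return the same value for any call within a `seconds`-long window."""
    return round(time.time() / seconds)

@lru_cache(maxsize=256)
def f(x, _ttl=None):
    return x * x

# Call as f(12, _ttl=ttl_hash()): results are reused within one window
# and recomputed once the window rolls over (the stale entry simply
# ages out of the LRU cache).
```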

5 Comments

cachetools offers a nice implementation of these, and it's compatible with Python 2 and Python 3.
big +1 for cachetools... seems pretty cool and has a couple more caching algorithms :)
This should never be suggested! Stay compatible.
@roboslone, two years (minus 4 days..) from your comment about not being thread safe, it may have changed. I have cachetools 2.0.0 and I see in the code that it uses an RLock. /usr/lib/python2.7/site-packages/cachetools/func.py
@Motty: The documentation for cachetools 4.0.0.0 says this: "Please be aware that all these classes are not thread-safe. Access to a shared cache from multiple threads must be properly synchronized, e.g. by using one of the memoizing decorators with a suitable lock object" (bold mine)
56

Take a look at Beaker:

2 Comments

Ah, I kept searching for this and all I found was a wiki that mentioned how to use it as a WSGI middleware. It looks like what I need, thank you.
See also dogpile, supposedly the new and improved Beaker.
30

You might also take a look at the Memoize decorator. You could probably get it to do what you want without too much modification.

2 Comments

That's clever. A few changes and the decorator could even expire after a set time.
You could definitely write a space-based limit to the cache in the decorator. That would be helpful if you wanted a function to, for example, generate the fibonacci sequence term by term. You want caching, but you only need the last two values - saving all of them is just space inefficient.
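The comments above suggest the decorator could be made to expire entries after a set time; a minimal sketch of such a memoize decorator (the decorator name and ttl parameter are invented) might look like:

```python
# Minimal sketch of a memoize decorator with expiry, along the lines the
# comments above suggest. Only positional, hashable arguments are handled.
import functools
import time

def memoize_with_expiry(ttl=60):
    def decorator(fn):
        cache = {}  # args -> (value, expires_at)

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = cache.get(args)
            if hit is not None and hit[1] > now:
                return hit[0]  # cached and still fresh
            value = fn(*args)
            cache[args] = (value, now + ttl)
            return value
        return wrapper
    return decorator

@memoize_with_expiry(ttl=30)
def slow_square(x):
    return x * x
```

A space limit, as the second comment proposes, could be added by evicting the oldest entry when `len(cache)` exceeds a threshold.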
16

No one has mentioned shelve yet. https://docs.python.org/2/library/shelve.html

It isn't memcached, but looks much simpler and might fit your need.
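Since shelve only provides persistent dict-like storage, expiration has to be layered on top; a sketch under that assumption (the function name and tuple layout are invented):

```python
# Sketch of using the standard-library shelve module as a simple
# persistent cache with expiry: values are stored alongside an
# expiry timestamp and recomputed when stale.
import shelve
import time

def cached_call(path, key, fn, duration=300):
    with shelve.open(path) as db:
        if key in db:
            value, expires_at = db[key]
            if expires_at > time.time():
                return value  # cached and still fresh
        value = fn(key)  # missing or expired: recompute and store
        db[key] = (value, time.time() + duration)
        return value
```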

1 Comment

I wrote a thread- and multiprocess-safe wrapper for the standard shelve module (including a helper function for caching http requests) in case that is useful for anyone: github.com/cristoper/shelfcache
15

Joblib https://joblib.readthedocs.io supports caching functions in the Memoize pattern. Mostly, the idea is to cache computationally expensive functions.

>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> square = mem.cache(np.square)
>>>
>>> a = np.vander(np.arange(3)).astype(np.float)
>>> b = square(a)
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0.,  0.,  1.],
              [ 1.,  1.,  1.],
              [ 4.,  2.,  1.]]))
___________________________________________________________square - 0...s, 0.0min
>>> c = square(a)

You can also do fancy things like using the @memory.cache decorator on functions. The documentation is here: https://joblib.readthedocs.io/en/latest/generated/joblib.Memory.html

1 Comment

As a sidenote, joblib really shines when you're working with large NumPy arrays, since it has special methods to deal with them specifically.
9

I think the python memcached API is the prevalent tool, but I haven't used it myself and am not sure whether it supports the features you need.

1 Comment

That one's the industry standard, but all I want is a simple in-memory storage mechanism that can hold 100 keys or so, and memcached is a bit overkill. Thank you for the answer, though.
9
import time

class CachedItem(object):
    def __init__(self, key, value, duration=60):
        self.key = key
        self.value = value
        self.duration = duration
        self.timeStamp = time.time()

    def __repr__(self):
        return '<CachedItem {%s:%s} expires at: %s>' % (
            self.key, self.value, time.time() + self.duration)

class CachedDict(dict):
    def get(self, key, fn, duration):
        if key not in self \
                or self[key].timeStamp + self[key].duration < time.time():
            print 'adding new value'
            o = fn(key)
            self[key] = CachedItem(key, o, duration)
        else:
            print 'loading from cache'
        return self[key].value

if __name__ == '__main__':
    fn = lambda key: 'value of %s is None' % key

    ci = CachedItem('a', 12)
    print ci

    cd = CachedDict()
    print cd.get('a', fn, 5)
    time.sleep(2)
    print cd.get('a', fn, 6)
    print cd.get('b', fn, 6)
    time.sleep(2)
    print cd.get('a', fn, 7)
    print cd.get('b', fn, 7)

3 Comments

I did something like that, but you need locks for multithreading and a size parameter to avoid it growing infinitely. Then you need some function to sort the keys by accesses to discard the least-accessed ones, etc etc...
The repr line is incorrect (should use the self.timeStamp). As well it's a poor implementation that needlessly does math for every get(). The expiry time should be calculated in the CachedItem init.
In fact, if you're only implementing the get method, this shouldn't be a dict subclass, it should be an object with an embedded dict.
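Pulling the comments' critiques together (expiry precomputed once in the constructor, a correct __repr__, and composition instead of subclassing dict), a revised sketch might look like this (written in Python 3 here):

```python
# Revised sketch incorporating the critiques in the comments above:
# the expiry time is computed once in __init__, __repr__ uses the
# stored expiry, and the cache wraps a dict rather than subclassing it.
import time

class CachedItem(object):
    def __init__(self, key, value, duration=60):
        self.key = key
        self.value = value
        self.expires_at = time.time() + duration  # computed once, not per get()

    def __repr__(self):
        return '<CachedItem {%s:%s} expires at: %s>' % (
            self.key, self.value, self.expires_at)

class CachedDict(object):
    def __init__(self):
        self._items = {}

    def get(self, key, fn, duration):
        item = self._items.get(key)
        if item is None or item.expires_at < time.time():
            item = CachedItem(key, fn(key), duration)
            self._items[key] = item
        return item.value
```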
6

Try Redis; it is one of the cleanest and easiest solutions for applications to share data in an atomic way, or if you have some web server platform. It's very easy to set up; you will need the Python Redis client: http://pypi.python.org/pypi/redis

1 Comment

It should be mentioned that it is out-of-process and needs to be accessed over TCP.
6

You can use my simple solution to the problem. It is really straightforward, nothing fancy:

class MemCache(dict):
    def __init__(self, fn):
        dict.__init__(self)
        self.__fn = fn

    def __getitem__(self, item):
        if item not in self:
            dict.__setitem__(self, item, self.__fn(item))
        return dict.__getitem__(self, item)

mc = MemCache(lambda x: x*x)

for x in xrange(10):
    print mc[x]

for x in xrange(10):
    print mc[x]

It indeed lacks expiration functionality, but you can easily extend it by specifying a particular rule in MemCache's constructor.

Hopefully the code is self-explanatory, but just to mention: the cache is passed a translation function as one of its constructor parameters, which is used in turn to generate the cached output for a given input.

Hope it helps

2 Comments

+1 for suggesting something simple. Depending on the problem, it might just be the tool for the job. P.S. You don't need the else in __getitem__ :)
Why would he not need to else in the __getitem__ ? That's where he populates the dict...
4

This project aims to provide "Caching for humans" (seems like it's fairly unknown though)

Some info from the project page:

Installation

pip install cache

Usage:

import pylibmc
from cache import Cache

backend = pylibmc.Client(["127.0.0.1"])
cache = Cache(backend)

@cache("mykey")
def some_expensive_method():
    sleep(10)
    return 42

# writes 42 to the cache
some_expensive_method()

# reads 42 from the cache
some_expensive_method()

# re-calculates and writes 42 to the cache
some_expensive_method.refresh()

# get the cached value or throw an error
# (unless default= was passed to @cache(...))
some_expensive_method.cached()

1 Comment

This requires an external memcached server and was mentioned 11 years before this answer
2

Look at gocept.cache on PyPI; it manages timeouts.

Comments

0

Look at bda.cache http://pypi.python.org/pypi/bda.cache - uses ZCA and is tested with zope and bfg.

Comments

0

ExpiringDict is another option:

https://pypi.org/project/expiringdict/

Comments

0

Besides all the tools mentioned by other users earlier, you can also use the cacheout library from PyPI.

It allows setting a cache timeout (TTL) for all keys or for a particular key, and getting the value of a particular key when needed.

Hope this helps!

Comments

-8

keyring is the best Python caching library. You can use

keyring.set_password("service", "jsonkey", json_res)
json_res = keyring.get_password("service", "jsonkey")
json_res = keyring.core.delete_password("service", "jsonkey")

2 Comments

That's a keyring library, not a caching library.
@StavrosKorokithakis Actually, I implemented caching of keys through keyring
