
I'm creating some objects from files (validators from template XSD files that draw together other XSD files, as it happens), and I'd like to recreate the objects whenever the file on disk changes.

I could create something like:

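```python
import os
import time

def getobj(fname, cache={}):
    try:
        obj, lastloaded = cache[fname]
        if lastloaded < os.path.getmtime(fname):  # file written since we loaded it
            # same stuff as in except clause
            obj = create_from_file(fname)
            cache[fname] = (obj, time.time())
    except KeyError:
        obj = create_from_file(fname)
        cache[fname] = (obj, time.time())
    return obj
```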

However, I would prefer to use someone else's tested code if it exists. Is there an existing library that does something like this?

Update: I'm using Python 2.7.1.

  • Note that rather than repeating the code from the except clause inside your if statement, you could simply raise KeyError() there. Commented Mar 24, 2012 at 19:19
  • Nice mutable default argument! Commented Mar 24, 2012 at 19:36
  • @Amber Or use an inner function, which would probably be cleaner. Commented Mar 24, 2012 at 19:41
  • Unlike @Katriel, I don't like the mutable default argument here, because I don't think mutable defaults behave intuitively. Most of the time, mutable default arguments change when you don't expect them to. In this case it is of course intended, but someone else reading the code may (a) not understand how the function works, because it is counter-intuitive that cache will ever be anything other than {}, or (b) suspect that the function will fail at some point because it uses a mutable default argument. Commented Jun 11, 2018 at 12:36
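Amber's suggestion works because a KeyError raised inside the try body is caught by the same except clause, so the rebuild code appears only once. A minimal sketch, assuming os.path.getmtime for the file's last-write time, time.time() for the load time, and a hypothetical create_from_file that simply reads the file's text:

```python
import os
import time

def create_from_file(fname):
    # Hypothetical loader; replace with the real validator construction.
    with open(fname) as f:
        return f.read()

def getobj(fname, cache={}):  # the default dict is created once and shared by all calls
    try:
        obj, lastloaded = cache[fname]
        if lastloaded < os.path.getmtime(fname):
            raise KeyError(fname)  # stale entry: jump to the rebuild code below
    except KeyError:
        obj = create_from_file(fname)
        cache[fname] = (obj, time.time())
    return obj
```

The cache={} default is kept here only to mirror the question; a module-level dict gives the same behaviour without surprising readers.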

3 Answers


Your code (including the cache logic) looks fine.

Consider moving the cache variable outside the function definition. That will make it possible to add other functions to clear or inspect the cache.
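A minimal sketch of that layout, assuming os.path.getmtime as the change check. The names getobj, clear_cache and cached_files are illustrative, and create_from_file is a hypothetical loader that reads the file's text:

```python
import os

_cache = {}  # module-level: fname -> (obj, mtime recorded when obj was built)

def create_from_file(fname):
    # Hypothetical loader; replace with the real object construction.
    with open(fname) as f:
        return f.read()

def getobj(fname):
    """Return the object for fname, rebuilding it if the file's mtime changed."""
    mtime = os.path.getmtime(fname)  # read before loading, so a write during the load triggers a later reload
    entry = _cache.get(fname)
    if entry is None or entry[1] != mtime:
        entry = (create_from_file(fname), mtime)
        _cache[fname] = entry
    return entry[0]

def clear_cache(fname=None):
    """Forget one cached file, or everything when fname is None."""
    if fname is None:
        _cache.clear()
    else:
        _cache.pop(fname, None)

def cached_files():
    """List the file names that currently have a cached object."""
    return sorted(_cache)
```

Comparing against the stored mtime with != (rather than "older than load time") also catches a file that was replaced by an older copy.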

If you want to look at code that does something similar, look at the source for the filecmp module (http://hg.python.org/cpython/file/2.7/Lib/filecmp.py). The interesting part is how the stat module is used to determine whether a file has changed. Here is the signature function:

```python
import stat

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)
```
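A sketch of how the same idea could drive the cache: record a _sig-style tuple (file type bits, size, mtime) when the object is built and rebuild when any field differs. Checking the size as well catches some edits that leave the mtime unchanged, though not all of them. The names file_signature, getobj and create_from_file are hypothetical:

```python
import os
import stat

def file_signature(fname):
    # Same fields filecmp's _sig uses: file type bits, size, and modification time.
    st = os.stat(fname)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)

def create_from_file(fname):
    # Hypothetical loader; replace with the real object construction.
    with open(fname) as f:
        return f.read()

_cache = {}  # fname -> (obj, signature recorded when obj was built)

def getobj(fname):
    sig = file_signature(fname)
    entry = _cache.get(fname)
    if entry is None or entry[1] != sig:
        entry = (create_from_file(fname), sig)
        _cache[fname] = entry
    return entry[0]
```

An edit that keeps both the size and the mtime identical still goes unnoticed; only hashing the contents would catch that.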


Three thoughts.

  1. Use try... except... else for a neater control flow.

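  2. File modification times are notoriously unreliable: in particular, they don't necessarily correspond to the most recent time the file was modified, because they can be set to arbitrary values (touch -t, os.utime) and some filesystems only store them to the nearest second or two.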

  3. Python 3.2 and later include a caching decorator, functools.lru_cache. Here's its source as of Python 3.2:

    ```python
    # Names defined or imported elsewhere in functools.py:
    from collections import OrderedDict, namedtuple
    from functools import wraps
    from threading import Lock

    _CacheInfo = namedtuple("CacheInfo", "hits misses maxsize currsize")

    def lru_cache(maxsize=100):
        """Least-recently-used cache decorator.

        If *maxsize* is set to None, the LRU features are disabled and the cache
        can grow without bound.

        Arguments to the cached function must be hashable.

        View the cache statistics named tuple (hits, misses, maxsize, currsize) with
        f.cache_info().  Clear the cache and statistics with f.cache_clear().
        Access the underlying function with f.__wrapped__.

        See:  http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used

        """
        # Users should only access the lru_cache through its public API:
        #       cache_info, cache_clear, and f.__wrapped__
        # The internals of the lru_cache are encapsulated for thread safety and
        # to allow the implementation to change (including a possible C version).

        def decorating_function(user_function,
                tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):

            hits = misses = 0
            kwd_mark = (object(),)          # separates positional and keyword args
            lock = Lock()                   # needed because ordereddicts aren't threadsafe

            if maxsize is None:
                cache = dict()              # simple cache without ordering or size limit

                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        result = cache[key]
                        hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        cache[key] = result
                        misses += 1
                    return result
            else:
                cache = OrderedDict()       # ordered least recent to most recent
                cache_popitem = cache.popitem
                cache_renew = cache.move_to_end

                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        with lock:
                            result = cache[key]
                            cache_renew(key)        # record recent use of this key
                            hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        with lock:
                            cache[key] = result     # record recent use of this key
                            misses += 1
                            if len(cache) > maxsize:
                                cache_popitem(0)    # purge least recently used cache entry
                    return result

            def cache_info():
                """Report cache statistics"""
                with lock:
                    return _CacheInfo(hits, misses, maxsize, len(cache))

            def cache_clear():
                """Clear the cache and cache statistics"""
                nonlocal hits, misses
                with lock:
                    cache.clear()
                    hits = misses = 0

            wrapper.cache_info = cache_info
            wrapper.cache_clear = cache_clear
            return wrapper

        return decorating_function
    ```

1 Comment

I never knew about the else clause. Thanks for that (and all of this).

Unless there is a specific reason to pass it as a default argument, I would make cache a global object.

4 Comments

Valid, it was more an act of whimsy while composing in the SO window.
Well, one reason is performance. The whole purpose of a cache is to improve performance, and local variable lookups (including default arguments) are modestly faster than global lookups. That said, this pattern is a great way to trip up future generations who aren't already familiar with this language quirk, and as you say, a global should be preferred for its explicitness when performance isn't of utmost importance.
@TokenMacGuy the usual idiom is def foo(cache=cache): to copy global variables into the local scope.
@TokenMacGuy I would say that the performance of a global variable lookup is pretty good compared to a file seek.
