387

I've recently become interested in algorithms and have begun exploring them by writing a naive implementation and then optimizing it in various ways.

I'm already familiar with the standard Python module for profiling runtime (for most things I've found the timeit magic function in IPython to be sufficient), but I'm also interested in memory usage so I can explore those tradeoffs as well (e.g. the cost of caching a table of previously computed values versus recomputing them as needed). Is there a module that will profile the memory usage of a given function for me?

2

10 Answers 10

211

Python 3.4 includes a new module: tracemalloc. It provides detailed statistics about which code is allocating the most memory. Here's an example that displays the top three lines allocating memory.

from collections import Counter import linecache import os import tracemalloc def display_top(snapshot, key_type='lineno', limit=3): snapshot = snapshot.filter_traces(( tracemalloc.Filter(False, "<frozen importlib._bootstrap>"), tracemalloc.Filter(False, "<unknown>"), )) top_stats = snapshot.statistics(key_type) print("Top %s lines" % limit) for index, stat in enumerate(top_stats[:limit], 1): frame = stat.traceback[0] # replace "/path/to/module/file.py" with "module/file.py" filename = os.sep.join(frame.filename.split(os.sep)[-2:]) print("#%s: %s:%s: %.1f KiB" % (index, filename, frame.lineno, stat.size / 1024)) line = linecache.getline(frame.filename, frame.lineno).strip() if line: print(' %s' % line) other = top_stats[limit:] if other: size = sum(stat.size for stat in other) print("%s other: %.1f KiB" % (len(other), size / 1024)) total = sum(stat.size for stat in top_stats) print("Total allocated size: %.1f KiB" % (total / 1024)) tracemalloc.start() counts = Counter() fname = '/usr/share/dict/american-english' with open(fname) as words: words = list(words) for word in words: prefix = word[:3] counts[prefix] += 1 print('Top prefixes:', counts.most_common(3)) snapshot = tracemalloc.take_snapshot() display_top(snapshot) 

And here are the results:

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)] Top 3 lines #1: scratches/memory_test.py:37: 6527.1 KiB words = list(words) #2: scratches/memory_test.py:39: 247.7 KiB prefix = word[:3] #3: scratches/memory_test.py:40: 193.0 KiB counts[prefix] += 1 4 other: 4.3 KiB Total allocated size: 6972.1 KiB 

When is a memory leak not a leak?

That example is great when the memory is still being held at the end of the calculation, but sometimes you have code that allocates a lot of memory and then releases it all. It's not technically a memory leak, but it's using more memory than you think it should. How can you track memory usage when it all gets released? If it's your code, you can probably add some debugging code to take snapshots while it's running. If not, you can start a background thread to monitor memory usage while the main thread runs.

Here's the previous example where the code has all been moved into the count_prefixes() function. When that function returns, all the memory is released. I also added some sleep() calls to simulate a long-running calculation.

from collections import Counter import linecache import os import tracemalloc from time import sleep def count_prefixes(): sleep(2) # Start up time. counts = Counter() fname = '/usr/share/dict/american-english' with open(fname) as words: words = list(words) for word in words: prefix = word[:3] counts[prefix] += 1 sleep(0.0001) most_common = counts.most_common(3) sleep(3) # Shut down time. return most_common def main(): tracemalloc.start() most_common = count_prefixes() print('Top prefixes:', most_common) snapshot = tracemalloc.take_snapshot() display_top(snapshot) def display_top(snapshot, key_type='lineno', limit=3): snapshot = snapshot.filter_traces(( tracemalloc.Filter(False, "<frozen importlib._bootstrap>"), tracemalloc.Filter(False, "<unknown>"), )) top_stats = snapshot.statistics(key_type) print("Top %s lines" % limit) for index, stat in enumerate(top_stats[:limit], 1): frame = stat.traceback[0] # replace "/path/to/module/file.py" with "module/file.py" filename = os.sep.join(frame.filename.split(os.sep)[-2:]) print("#%s: %s:%s: %.1f KiB" % (index, filename, frame.lineno, stat.size / 1024)) line = linecache.getline(frame.filename, frame.lineno).strip() if line: print(' %s' % line) other = top_stats[limit:] if other: size = sum(stat.size for stat in other) print("%s other: %.1f KiB" % (len(other), size / 1024)) total = sum(stat.size for stat in top_stats) print("Total allocated size: %.1f KiB" % (total / 1024)) main() 

When I run that version, the memory usage has gone from 6MB down to 4KB, because the function released all its memory when it finished.

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)] Top 3 lines #1: collections/__init__.py:537: 0.7 KiB self.update(*args, **kwds) #2: collections/__init__.py:555: 0.6 KiB return _heapq.nlargest(n, self.items(), key=_itemgetter(1)) #3: python3.6/heapq.py:569: 0.5 KiB result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)] 10 other: 2.2 KiB Total allocated size: 4.0 KiB 

Now here's a version inspired by another answer that starts a second thread to monitor memory usage.

from collections import Counter import linecache import os import tracemalloc from datetime import datetime from queue import Queue, Empty from resource import getrusage, RUSAGE_SELF from threading import Thread from time import sleep def memory_monitor(command_queue: Queue, poll_interval=1): tracemalloc.start() old_max = 0 snapshot = None while True: try: command_queue.get(timeout=poll_interval) if snapshot is not None: print(datetime.now()) display_top(snapshot) return except Empty: max_rss = getrusage(RUSAGE_SELF).ru_maxrss if max_rss > old_max: old_max = max_rss snapshot = tracemalloc.take_snapshot() print(datetime.now(), 'max RSS', max_rss) def count_prefixes(): sleep(2) # Start up time. counts = Counter() fname = '/usr/share/dict/american-english' with open(fname) as words: words = list(words) for word in words: prefix = word[:3] counts[prefix] += 1 sleep(0.0001) most_common = counts.most_common(3) sleep(3) # Shut down time. return most_common def main(): queue = Queue() poll_interval = 0.1 monitor_thread = Thread(target=memory_monitor, args=(queue, poll_interval)) monitor_thread.start() try: most_common = count_prefixes() print('Top prefixes:', most_common) finally: queue.put('stop') monitor_thread.join() def display_top(snapshot, key_type='lineno', limit=3): snapshot = snapshot.filter_traces(( tracemalloc.Filter(False, "<frozen importlib._bootstrap>"), tracemalloc.Filter(False, "<unknown>"), )) top_stats = snapshot.statistics(key_type) print("Top %s lines" % limit) for index, stat in enumerate(top_stats[:limit], 1): frame = stat.traceback[0] # replace "/path/to/module/file.py" with "module/file.py" filename = os.sep.join(frame.filename.split(os.sep)[-2:]) print("#%s: %s:%s: %.1f KiB" % (index, filename, frame.lineno, stat.size / 1024)) line = linecache.getline(frame.filename, frame.lineno).strip() if line: print(' %s' % line) other = top_stats[limit:] if other: size = sum(stat.size for stat in other) print("%s other: %.1f KiB" % (len(other), size / 1024)) total = sum(stat.size for stat in top_stats) print("Total allocated size: %.1f KiB" % (total / 1024)) main() 

The resource module lets you check the current memory usage, and save the snapshot from the peak memory usage. The queue lets the main thread tell the memory monitor thread when to print its report and shut down. When it runs, it shows the memory being used by the list() call:

2018-05-29 10:34:34.441334 max RSS 10188 2018-05-29 10:34:36.475707 max RSS 23588 2018-05-29 10:34:36.616524 max RSS 38104 2018-05-29 10:34:36.772978 max RSS 45924 2018-05-29 10:34:36.929688 max RSS 46824 2018-05-29 10:34:37.087554 max RSS 46852 Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)] 2018-05-29 10:34:56.281262 Top 3 lines #1: scratches/scratch.py:36: 6527.0 KiB words = list(words) #2: scratches/scratch.py:38: 16.4 KiB prefix = word[:3] #3: scratches/scratch.py:39: 10.1 KiB counts[prefix] += 1 19 other: 10.8 KiB Total allocated size: 6564.3 KiB 

If you're on Linux, you may find /proc/self/statm more useful than the resource module.

Sign up to request clarification or add additional context in comments.

8 Comments

This is great, but it seems to only print the snapshots during intervals when functions inside "count_prefixes()" return. In other words, if you have some long running call, e.g. long_running() inside the count_prefixes() function, the max RSS values will not be printed until long_running() returns. Or am I mistaken?
I think you're mistaken, @robguinness. memory_monitor() is running on a separate thread from count_prefixes(), so the only ways that one can affect the other are the GIL and the message queue that I pass to memory_monitor(). I suspect that when count_prefixes() calls sleep(), it encourages the thread context to switch. If your long_running() isn't actually taking very long, then the thread context might not switch until you hit the sleep() call back in count_prefixes(). If that doesn't make sense, post a new question and link to it from here.
Thanks. I will post a new question and add a link here. (I need to work up an example of the issue I am having, since I can't share the proprietary parts of the code.)
tracemalloc is really awesome, but unfortunately it only accounts for memory allocated by python, so if you have some c/c++ extension that does it own allocations, tracemalloc won't report it.
@stason I assume they have to, but I don't know the details. From the link I gave, it sounds like they have to do something specific when allocating memory in C for it to be counted.
|
158

This one has been answered already here: Python memory profiler

Basically you do something like that (cited from Guppy-PE):

>>> from guppy import hpy; h=hpy() >>> h.heap() Partition of a set of 48477 objects. Total size = 3265516 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 25773 53 1612820 49 1612820 49 str 1 11699 24 483960 15 2096780 64 tuple 2 174 0 241584 7 2338364 72 dict of module 3 3478 7 222592 7 2560956 78 types.CodeType 4 3296 7 184576 6 2745532 84 function 5 401 1 175112 5 2920644 89 dict of class 6 108 0 81888 3 3002532 92 dict (no owner) 7 114 0 79632 2 3082164 94 dict of type 8 117 0 51336 2 3133500 96 type 9 667 1 24012 1 3157512 97 __builtin__.wrapper_descriptor <76 more rows. Type e.g. '_.more' to view.> >>> h.iso(1,[],{}) Partition of a set of 3 objects. Total size = 176 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 1 33 136 77 136 77 dict (no owner) 1 1 33 28 16 164 93 list 2 1 33 12 7 176 100 int >>> x=[] >>> h.iso(x).sp 0: h.Root.i0_modules['__main__'].__dict__['x'] >>> 

6 Comments

Official guppy documentation is a bit minimial; for other resources see this example and the heapy essay.
@robguinness By downgraded you mean down-voted? That doesn't seem fair because it was valuable at one point in time. I think an edit at the top stating it is no longer valid for X reason and to see answer Y or Z instead. I think this course of action is more appropriate.
Sure, that works, too, but somehow it would be nice if the accepted and highest voted answer involved a solution that still works and is maintained.
Only available for Python 2
You can try a version for python3: github.com/zhuyifei1999/guppy3
|
48

Disclosure:

  • Applicable on Linux only
  • Reports memory used by the current process as a whole, not individual functions within

But nice because of its simplicity:

import resource def using(point=""): usage=resource.getrusage(resource.RUSAGE_SELF) return '''%s: usertime=%s systime=%s mem=%s mb '''%(point,usage[0],usage[1], usage[2]/1024.0 ) 

Just insert using("Label") where you want to see what's going on. For example

print(using("before")) wrk = ["wasting mem"] * 1000000 print(using("after")) >>> before: usertime=2.117053 systime=1.703466 mem=53.97265625 mb >>> after: usertime=2.12023 systime=1.70708 mem=60.8828125 mb 

5 Comments

"memory usage of a given function" so your approach is not helping.
By looking at usage[2] you are looking at ru_maxrss, which is only the portion of the process which is resident. This won't help much if the process has been swapped to disk, even partially.
resource is a Unix specific module that does not work under Windows.
The units of ru_maxrss (that is, usage[2]) are kB, not pages so there is no need to multiply that number by resource.getpagesize().
This printed out nothing for me.
45

If you only want to look at the memory usage of an object, (answer to other question)

There is a module called Pympler which contains the asizeof module.

Use as follows:

from pympler import asizeof asizeof.asizeof(my_object) 

Unlike sys.getsizeof, it works for your self-created objects.

>>> asizeof.asizeof(tuple('bcd')) 200 >>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'}) 400 >>> asizeof.asizeof({}) 280 >>> asizeof.asizeof({'foo':'bar'}) 360 >>> asizeof.asizeof('foo') 40 >>> asizeof.asizeof(Bar()) 352 >>> asizeof.asizeof(Bar().__dict__) 280 
>>> help(asizeof.asizeof) Help on function asizeof in module pympler.asizeof: asizeof(*objs, **opts) Return the combined size in bytes of all objects passed as positional arguments. 

11 Comments

Is this asizeof related to RSS?
@mousecoder: Which RSS at en.wikipedia.org/wiki/RSS_(disambiguation)? Web feeds? How?
@serv-inc Resident set size, though I can only find one mention of it in Pympler's source and that mention doesn't seem directly tied to asizeof
@mousecoder the memory reported by asizeof can contribute to RSS, yes. I'm not sure what else you mean by "related to".
@serv-inc its possible it may be very case specific. but for my usecase measuring one large multidimensional dictionary, i found tracemalloc solution below a magnitude faster
|
21

Below is a simple function decorator which allows to track how much memory the process consumed before the function call, after the function call, and what is the difference:

import time import os import psutil def elapsed_since(start): return time.strftime("%H:%M:%S", time.gmtime(time.time() - start)) def get_process_memory(): process = psutil.Process(os.getpid()) mem_info = process.memory_info() return mem_info.rss def profile(func): def wrapper(*args, **kwargs): mem_before = get_process_memory() start = time.time() result = func(*args, **kwargs) elapsed_time = elapsed_since(start) mem_after = get_process_memory() print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format( func.__name__, mem_before, mem_after, mem_after - mem_before, elapsed_time)) return result return wrapper 

Here is my blog which describes all the details. (archived link)

4 Comments

it should be process.memory_info().rss not process.get_memory_info().rss, at least in ubuntu and python 3.6. related stackoverflow.com/questions/41012058/psutil-error-on-macos
You're right as to 3.x. My customer is using Python 2.7, not the newest version.
is this in bytes, KB , MB , what?
I hope the return value will be in bytes, source: psutil.readthedocs.io/en/latest/….
17

Since the accepted answer and also the next highest voted answer have, in my opinion, some problems, I'd like to offer one more answer that is based closely on Ihor B.'s answer with some small but important modifications.

This solution allows you to run profiling on either by wrapping a function call with the profile function and calling it, or by decorating your function/method with the @profile decorator.

The first technique is useful when you want to profile some third-party code without messing with its source, whereas the second technique is a bit "cleaner" and works better when you are don't mind modifying the source of the function/method you want to profile.

I've also modified the output, so that you get RSS, VMS, and shared memory. I don't care much about the "before" and "after" values, but only the delta, so I removed those (if you're comparing to Ihor B.'s answer).

Profiling code

# profile.py import time import os import psutil import inspect def elapsed_since(start): #return time.strftime("%H:%M:%S", time.gmtime(time.time() - start)) elapsed = time.time() - start if elapsed < 1: return str(round(elapsed*1000,2)) + "ms" if elapsed < 60: return str(round(elapsed, 2)) + "s" if elapsed < 3600: return str(round(elapsed/60, 2)) + "min" else: return str(round(elapsed / 3600, 2)) + "hrs" def get_process_memory(): process = psutil.Process(os.getpid()) mi = process.memory_info() return mi.rss, mi.vms, mi.shared def format_bytes(bytes): if abs(bytes) < 1000: return str(bytes)+"B" elif abs(bytes) < 1e6: return str(round(bytes/1e3,2)) + "kB" elif abs(bytes) < 1e9: return str(round(bytes / 1e6, 2)) + "MB" else: return str(round(bytes / 1e9, 2)) + "GB" def profile(func, *args, **kwargs): def wrapper(*args, **kwargs): rss_before, vms_before, shared_before = get_process_memory() start = time.time() result = func(*args, **kwargs) elapsed_time = elapsed_since(start) rss_after, vms_after, shared_after = get_process_memory() print("Profiling: {:>20} RSS: {:>8} | VMS: {:>8} | SHR {" ":>8} | time: {:>8}" .format("<" + func.__name__ + ">", format_bytes(rss_after - rss_before), format_bytes(vms_after - vms_before), format_bytes(shared_after - shared_before), elapsed_time)) return result if inspect.isfunction(func): return wrapper elif inspect.ismethod(func): return wrapper(*args,**kwargs) 

Example usage, assuming the above code is saved as profile.py:

from profile import profile from time import sleep from sklearn import datasets # Just an example of 3rd party function call # Method 1 run_profiling = profile(datasets.load_digits) data = run_profiling() # Method 2 @profile def my_function(): # do some stuff a_list = [] for i in range(1,100000): a_list.append(i) return a_list res = my_function() 

This should result in output similar to the below:

Profiling: <load_digits> RSS: 5.07MB | VMS: 4.91MB | SHR 73.73kB | time: 89.99ms Profiling: <my_function> RSS: 1.06MB | VMS: 1.35MB | SHR 0B | time: 8.43ms 

A couple of important final notes:

  1. Keep in mind, this method of profiling is only going to be approximate, since lots of other stuff might be happening on the machine. Due to garbage collection and other factors, the deltas might even be zero.
  2. For some unknown reason, very short function calls (e.g. 1 or 2 ms) show up with zero memory usage. I suspect this is some limitation of the hardware/OS (tested on basic laptop with Linux) on how often memory statistics are updated.
  3. To keep the examples simple, I didn't use any function arguments, but they should work as one would expect, i.e. profile(my_function, arg) to profile my_function(arg)

1 Comment

AttributeError: 'pmem' object has no attribute 'shared', python3.9
8

A simple example to calculate the memory usage of a block of codes / function using memory_profile, while returning result of the function:

import memory_profiler as mp def fun(n): tmp = [] for i in range(n): tmp.extend(list(range(i*i))) return "XXXXX" 

calculate memory usage before running the code then calculate max usage during the code:

start_mem = mp.memory_usage(max_usage=True) res = mp.memory_usage(proc=(fun, [100]), max_usage=True, retval=True) print('start mem', start_mem) print('max mem', res[0][0]) print('used mem', res[0][0]-start_mem) print('fun output', res[1]) 

calculate usage in sampling points while running function:

res = mp.memory_usage((fun, [100]), interval=.001, retval=True) print('min mem', min(res[0])) print('max mem', max(res[0])) print('used mem', max(res[0])-min(res[0])) print('fun output', res[1]) 

Credits: @skeept

Comments

5

Different use cases require different tools.

Web applications suffer from memory leaks, and so you want tools that are good at catching that sort of thing. memory-profiler is a fine tool here, you can see that a particular line of code is responsible for increased memory usage.

For data processing, you want peak memory because the issue isn't leaks, the issue is just allocating lots of memory. Imagine if you have a single line of code that allocates a temporary array of 10GB and then immediately drops it; I've made mistakes like this. memory-profiler will never catch this, because the memory usage at start and end of line is the same. So you need a very different kind of profiler.

For the latter use case, relevant tools include Memray and Fil, both open source, and Sciagraph (commercial, but has free plan and also does CPU profiling).

Comments

3

maybe it help:
<see additional>

pip install gprof2dot sudo apt-get install graphviz gprof2dot -f pstats profile_for_func1_001 | dot -Tpng -o profile.png def profileit(name): """ @profileit("profile_for_func1_001") """ def inner(func): def wrapper(*args, **kwargs): prof = cProfile.Profile() retval = prof.runcall(func, *args, **kwargs) # Note use of name from outer scope prof.dump_stats(name) return retval return wrapper return inner @profileit("profile_for_func1_001") def func1(...) 

1 Comment

This is not memory profiling.
2

Check out and try Scalene. It is a CPU, GPU, and memory profiler. The source code is being actively maintained and the repo has more than 10K stars on GitHub.
The comparison between different profilers taken from their repo: Profilers comparison table

Check more details and options in the repository description.

P.S. I am not affiliated with the library team nor with its distribution.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.