Skip to content

ossobv/pycoredump

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python wrapper around GDB to examine core dumps

Deadlocks in C programs are a pain to debug. But if you know your way around GDB, you may get more out of a core dump than you think.

Deadlock example

For the following example, we'll create a core dump of a program that intentionally deadlocks: examples/deadlock

That example application spawns eight threads of which five threads use locks and two of those lock each other -- because of a locking inversion.

The application has an alarm set, so it will generate SIGABRT core file after a few seconds. That's nice because now we have something to debug.

Go into the examples/ dir and type make deadlock.core. It will self-test the deadlock application and then create the file deadlock.core using gdb.

$ cd examples/ $ make deadlock.core cc -Wall -g -O0 -c -o deadlock.o deadlock.c cc -Wall -g -O0 -pthread deadlock.o -o deadlock ./deadlock nolock ...

If all went well, you now have a deadlock.core file:

$ ls -l deadlock.core -rw-rw-r-- 1 walter walter 17448904 sep 1 21:04 deadlock.core

Manual GDB debugging

If you open the core file with GDB, you should manually be able to find out which threads lock each other; but it's tedious work.

For example:

$ gdb ./deadlock deadlock.core (gdb) info threads  Id Target Id Frame  9 Thread 0x2aaaaaaef040 (LWP 27444) 0x00002aaaaaf24cc9 in __GI_raise  (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56  8 Thread 0x2aaaab4b3700 (LWP 27448) __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135  7 Thread 0x2aaaab6b4700 (LWP 27449) __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135  6 Thread 0x2aaaab8b5700 (LWP 27450) 0x00002aaaaafaef3d in nanosleep  () at ../sysdeps/unix/syscall-template.S:81  5 Thread 0x2aaaabab6700 (LWP 27451) __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135  4 Thread 0x2aaaabcb7700 (LWP 27452) __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135  3 Thread 0x2aaaabeb8700 (LWP 27453) 0x00002aaaaafaef3d in nanosleep  () at ../sysdeps/unix/syscall-template.S:81  2 Thread 0x2aaaac0b9700 (LWP 27454) 0x00002aaaaafaef3d in nanosleep  () at ../sysdeps/unix/syscall-template.S:81 * 1 Thread 0x2aaaac2ba700 (LWP 27455) __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

Okay, 9 threads. And now?

  1. Check only the threads that are waiting for a lock: those in __lll_lock_wait.
  2. For those threads, check which locks they're waiting for, and who is holding those locks.

Somewhat like this:

(gdb) thread 8 [Switching to thread 8 (Thread 0x2aaaab4b3700 (LWP 27448))] #0 __lll_lock_wait ()  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135  135 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S (gdb) frame 2 #2 0x00002aaaaacda480 in __GI___pthread_mutex_lock (  mutex=0x602120 <speciallock>)  at ../nptl/pthread_mutex_lock.c:79  79 ../nptl/pthread_mutex_lock.c: No such file or directory. (gdb) print *(pthread_mutex_t*)mutex $1 = {__data = {__lock = 2, __count = 0, __owner = 27452,  __nusers = 1, __kind = 0, __spins = 0, __elision = 0,  __list = {__prev = 0x0, __next = 0x0}},  __size = "\002...", '\000' <repeats 26 times>, __align = 2}

There, the __owner shows that the mutex that thread 8 is waiting is waiting for, is (light weight) process id (LWP) 27452, which corresponds to thread 4.

Repeat this step for all threads that are waiting for a lock.

Tedious, huh?

Automated using pycoredump

In the same example directory, now type make debug-deadlock. It will call GDB from the pycoredump library instead.

$ make debug-deadlock PYTHONPATH=`pwd`/../ python debug-deadlock.py \  ./deadlock ./deadlock.core -- all waiting threads -- <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>  waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)> <GdbThread(thno=7, thid=0x2aaaab6b4700, procid=27449, func=__lll_lock_wait)>  waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)> <GdbThread(thno=5, thid=0x2aaaabab6700, procid=27451, func=__lll_lock_wait)>  waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)> <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)>  waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)> <GdbThread(thno=1, thid=0x2aaaac2ba700, procid=27455, func=__lll_lock_wait)>  waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>

Nice, only the threads that are waiting for a lock; instead of a listing of all the threads. And as a bonus, directly a reference to the threads they're waiting for.

And it gets better, they can be filtered by relevant threads only: thread 8 and thread 4. And because we get the backtrace info directly, we can see that one of those is indeed the culprit, the one with func=inverted in frame 3.

-- relevant threads -- <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)> <GdbBacktrace(  <GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>  <GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>  <GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>  <GdbFrame(no=3, func=normal, file=deadlock.c:24>  <GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>  <GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111> )>  waits for <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)> <GdbThread(thno=4, thid=0x2aaaabcb7700, procid=27452, func=__lll_lock_wait)> <GdbBacktrace(  <GdbFrame(no=0, func=__lll_lock_wait, file=../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135>  <GdbFrame(no=1, func=_L_lock_909, file=/lib/x86_64-linux-gnu/libpthread.so.0>  <GdbFrame(no=2, func=__GI___pthread_mutex_lock, file=../nptl/pthread_mutex_lock.c:79>  <GdbFrame(no=3, func=inverted, file=deadlock.c:36>  <GdbFrame(no=4, func=start_thread, file=pthread_create.c:312>  <GdbFrame(no=5, func=clone, file=../sysdeps/unix/sysv/linux/x86_64/clone.S:111> )>  waits for <GdbThread(thno=8, thid=0x2aaaab4b3700, procid=27448, func=__lll_lock_wait)>

Better, yes?

Here it clearly shows that they're waiting on each other: a classic example of a deadlock.

That's generated by just the 30 odd lines from debug-deadlock.py using the pycoredump library.

Or, for another sample invocation, try this:

sudo env PYTHONPATH=$(pwd) python examples/debug-deadlock.py \ /usr/sbin/asterisk \ /var/spool/asterisk/core.ua4-ams-2018-11-29T16:09:42+0100 

Conclusion

Obviously the pycoredump library could use a lot of extra features, and cleanup, but this simple example demonstrates how it can make your deadlock debugging life easier.

I've used the manual technique to find the cause of a deadlock in the Asterisk PBX software. That software supports a thread-debugging mode, but that's so slow that it's unusable in production systems. Luckily, much of the needed info is there in the regular core dump; you just have to find it.

And now, finding it may have become a little bit simpler.

About

Python wrapper around GDB to examine core dumps.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors