I am using Kubernetes in Google Cloud (GKE).

I have an application that is hoarding memory, and I need to take a process dump as indicated here. Kubernetes is going to kill the pod when it reaches its 512 MB RAM limit.

So I connect to the pod:

# kubectl exec -it stuff-7d8c5598ff-2kchk /bin/bash 

And run:

# apt-get update && apt-get install procps && apt-get install gdb 

Find the process I want:

root@stuff-7d8c5598ff-2kchk:/app# ps aux
USER        PID %CPU %MEM     VSZ    RSS TTY  STAT START   TIME COMMAND
root          1  4.6  2.8 5318004 440268 ?    SLsl Oct11 532:18 dotnet stuff.Web.dll
root     114576  0.0  0.0   18212   3192 ?    Ss   17:23   0:00 /bin/bash
root     114583  0.0  0.0   36640   2844 ?    R+   17:23   0:00 ps aux

But when I try to dump...

root@stuff-7d8c5598ff-2kchk:/app# gcore 1
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.1

I tried several solutions like this one, and they always end with the same result:

root@stuff-7d8c5598ff-2kchk:/app# echo 0 > /proc/sys/kernel/yama/ptrace_scope
bash: /proc/sys/kernel/yama/ptrace_scope: Read-only file system
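Writing to that file fails, but reading it usually still works, and the current value says which ptrace policy applies. Here is a minimal sketch for interpreting it. The function name `describe_ptrace_scope` is made up for illustration, and the descriptions are my summary of the Yama documentation, not output from any tool.

```shell
#!/bin/sh
# describe_ptrace_scope VALUE (hypothetical helper): translate a Yama
# ptrace_scope value into a short description of the policy.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: a process may trace others running under the same uid" ;;
        1) echo "restricted: only descendants may be traced unless the tracer has CAP_SYS_PTRACE" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "no attach: ptrace attach is disabled entirely" ;;
        *) echo "unknown: the file is missing or Yama is not enabled" ;;
    esac
}

# Reading should work even though /proc/sys is mounted read-only.
scope=$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo "n/a")
echo "ptrace_scope=$scope -> $(describe_ptrace_scope "$scope")"
```

For values 1 and 2 the policy can be satisfied by giving the tracer CAP_SYS_PTRACE, so changing the sysctl itself may not be necessary.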

I cannot find a way to connect to the pod and get past this ptrace restriction. I found that docker has a --privileged switch, but I cannot find anything similar for kubectl.

UPDATE: I found how to enable ptrace, by adding the SYS_PTRACE capability in the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: <your-pod>
spec:
  shareProcessNamespace: true
  containers:
  - name: containerB
    image: <your-debugger-image>
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
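One way to confirm that the capability actually reached the shell you exec into is to decode the CapEff mask in /proc/self/status. This is only a sketch: `has_sys_ptrace` is a name I made up, and it relies on CAP_SYS_PTRACE being capability number 19 in linux/capability.h.

```shell
#!/bin/sh
# CAP_SYS_PTRACE is capability number 19 (linux/capability.h).
CAP_SYS_PTRACE_BIT=19

# has_sys_ptrace HEXMASK (hypothetical helper): succeeds when bit 19 is set
# in a hex capability mask such as the CapEff field of /proc/<pid>/status.
has_sys_ptrace() {
    mask=$((0x$1))
    [ $(( (mask >> CAP_SYS_PTRACE_BIT) & 1 )) -eq 1 ]
}

# Read the effective capability mask of the current shell.
capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
if [ -n "$capeff" ] && has_sys_ptrace "$capeff"; then
    echo "CAP_SYS_PTRACE is in the effective set ($capeff)"
else
    echo "CAP_SYS_PTRACE is missing, or /proc/self/status could not be read"
fi
```

If this reports the capability as missing after the pod was redeployed, the securityContext was probably applied to a different container than the one running gcore.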

Get the process dump:

root@stuff-6cd8848797-klrwr:/app# gcore 1
[New LWP 9]
[New LWP 10]
[New LWP 13]
[New LWP 14]
[New LWP 15]
[New LWP 16]
[New LWP 17]
[New LWP 18]
[New LWP 19]
[New LWP 20]
[New LWP 22]
[New LWP 24]
[New LWP 25]
[New LWP 27]
[New LWP 74]
[New LWP 100]
[New LWP 753]
[New LWP 756]
[New LWP 765]
[New LWP 772]
[New LWP 814]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
warning: target file /proc/1/cmdline contained unexpected null characters
Saved corefile core.1

Funny thing: I cannot find lldb-3.6, so I installed lldb-3.8 instead:

root@stuff-6cd8848797-klrwr:/app# apt-get update && apt-get install lldb-3.6
Hit:1 http://security.debian.org/debian-security stretch/updates InRelease
Ign:2 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Hit:3 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease
Hit:4 http://cdn-fastly.deb.debian.org/debian stretch Release
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'python-lldb-3.6' for regex 'lldb-3.6'
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Find the SOS plugin:

root@stuff-6cd8848797-klrwr:/app# find /usr -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so

Run lldb...

root@stuff-6cd8848797-klrwr:/app# lldb `which dotnet` -c core.1
(lldb) target create "/usr/bin/dotnet" --core "core.1"

But it gets stuck forever; the (lldb) prompt never comes back.

  • If you have access to the host machine, you can use nsenter to run the command from the host. I don't know how GKE works in that regard, though. Commented Oct 19, 2018 at 21:51
  • @vlad I am still getting the "ptrace: Operation not permitted." error even after using the securityContext. Any guess, why? Commented Oct 11, 2019 at 19:14

1 Answer

I had a similar issue. Try installing the correct version of LLDB. The SOS plugin shipped with a given dotnet version is linked against a specific version of LLDB. For example, dotnet 2.0.5 is linked with LLDB 3.6, and 2.1.5 is linked with LLDB 3.9. This document might also be helpful: Debugging CoreCLR

Note that not every version of LLDB is available on every OS. For example, LLDB 3.6 is unavailable on Debian but available on Ubuntu.
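To make the pairing concrete, here is a rough sketch that reads the runtime version from the directory holding libsosplugin.so and looks up the matching LLDB version. Both function names are hypothetical, and the table contains only the two pairings mentioned above; any other runtime version is reported as unknown rather than guessed.

```shell
#!/bin/sh
# runtime_version_from_sos PATH (hypothetical helper): the runtime version is
# the name of the directory that contains libsosplugin.so, e.g.
# .../Microsoft.NETCore.App/2.1.5/libsosplugin.so -> 2.1.5
runtime_version_from_sos() {
    basename "$(dirname "$1")"
}

# lldb_for_runtime VERSION (hypothetical helper): only the pairings quoted in
# this answer are listed; everything else is deliberately "unknown".
lldb_for_runtime() {
    case "$1" in
        2.0.5) echo "3.6" ;;
        2.1.5) echo "3.9" ;;
        *)     echo "unknown" ;;
    esac
}

# Locate the plugin the same way as in the question and report the pairing.
sos=$(find /usr -name libsosplugin.so 2>/dev/null | head -n 1)
if [ -n "$sos" ]; then
    ver=$(runtime_version_from_sos "$sos")
    echo "runtime $ver -> LLDB $(lldb_for_runtime "$ver")"
else
    echo "libsosplugin.so not found under /usr"
fi
```

For a version reported as unknown, the Debugging CoreCLR document is the place to check which LLDB the plugin was built against.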

2 Comments

Note that I never get as far as loading SOS; lldb itself hangs before that. I also tried loading SOS in lldb first, but it hangs both when attaching to a live process and when loading a dump.
@Vlad, it seems to be a bug in lldb: github.com/nodejs/llnode/issues/61. Try using lldb 4.0. I was able to load a dump with .NET 2.1.6 + lldb 4.0: gist.github.com/segor/dd98f3de05b23529af561ec4ed1305f7