I have a server with a ton of zombie processes, almost a thousand. If possible, I would like to reap them, because it doesn't seem like the parent (a single parent is responsible for all 1000 zombies) is ever going to call wait. I see that bash has a builtin wait command, but when I use it to try to reap one of the zombies, I get the following error:

# wait 17517
bash: wait: pid 17517 is not a child of this shell

I am root, but that does not seem to make a difference. I have a couple of questions:

  1. Can I reap a zombie process if it is not the child of my shell?
  2. If not, is there anything I can do? I am not certain I should kill the parent.
  3. Should I be worried? It seems the parent has a resource leak and is not garbage collecting or whatever.
  • Any solution I can think of would involve stopping the parent and forcing it to execute some system calls (wait and signal(SIGCHLD,SIG_IGN)). Do you know anything about what the parent is doing when it's not creating children? Is it just listening on a socket? Commented Aug 28, 2018 at 18:16
  • @Mark Plotnick Unfortunately, I can't seem to find much about the parent. It is what you would call a custom application. I would not be able to get more information on my own. Commented Aug 28, 2018 at 18:19
  • I'd be more worried about why the zombie processes exist than about trying to reap them from outside the parent process. Are the zombies themselves causing any harm? Commented Aug 28, 2018 at 18:24
  • @ChristianGibbons The system seems pretty slow to respond, which is probably related. I am just worried that, left alone, the zombies will eventually crash the system. They show no signs of stopping. Commented Aug 28, 2018 at 18:30
  • Zombie processes only occupy space in the process table; they wouldn't explain unresponsiveness. Commented Aug 28, 2018 at 18:36

1 Answer

  1. Can I reap a zombie process if it is not the child of my shell?

No, you can't.

  2. If not, is there anything I can do? I am not certain I should kill the parent

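You can try stopping the parent and then restarting it with exec from a shell that ignores SIGCHLD. On Linux an ignored SIGCHLD is inherited across exec, so unless the program sets its own SIGCHLD disposition, its exited children are discarded immediately instead of becoming zombies.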
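To illustrate the idea, here is a minimal Python sketch, assuming Linux. The names `exec_ignoring_sigchld` and `exit_status_retained` are hypothetical helpers made up for this example. The first is the launcher idea (ignore SIGCHLD, then exec the real program); the second forks a short-lived child and reports whether the kernel kept its exit status around for the parent to collect, which is exactly what a zombie is.

```python
import os
import signal


def exec_ignoring_sigchld(argv):
    """Replace the current process with argv while SIGCHLD is ignored.

    Does not return on success. The ignored disposition survives exec on Linux.
    """
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    os.execvp(argv[0], argv)


def exit_status_retained(ignore_sigchld):
    """Fork a child that exits at once and report whether its exit status
    was kept for the parent (True means it would sit as a zombie until waited for)."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        try:
            # Blocks until the child has exited. With the default disposition
            # this collects (reaps) the child's status.
            reaped, _status = os.waitpid(pid, 0)
            return reaped == pid
        except ChildProcessError:
            # ECHILD: the kernel discarded the child on exit, nothing to reap.
            return False
    finally:
        # Restore whatever disposition was in place before the experiment.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
```

With the default disposition `exit_status_retained(False)` should report that the status was kept, and with SIGCHLD ignored it should not. Note that this only helps if the restarted program does not install its own SIGCHLD handler, since that would replace the inherited setting.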

  3. Should I be worried? It seems the parent has a resource leak and is not garbage collecting or whatever.

If the number of zombies keeps increasing, you will eventually run out of process table entries (or hit the per-user process limit), at which point new processes can't be forked.
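If you want to keep an eye on whether the count is growing, something along these lines could work. It is only a sketch, assuming a Linux-style /proc where each `/proc/<pid>/stat` file holds the process state and parent PID right after the parenthesised command name. `zombies_by_parent` and its `proc_root` parameter are names invented for this example.

```python
import os
from collections import Counter


def zombies_by_parent(proc_root="/proc"):
    """Return a Counter mapping parent PID -> number of zombie children."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # the process went away while we were scanning
        # The command name is wrapped in parentheses and may itself contain
        # spaces or ')', so split on the text after the LAST ')'.
        fields = stat[stat.rfind(")") + 1:].split()
        if len(fields) < 2:
            continue
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            counts[ppid] += 1
    return counts
```

Calling `zombies_by_parent().most_common(5)` every so often would show which parents are accumulating zombies and how fast.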

  • With a creative use of gdb, one could make the parent actually call signal(SIGCHLD, SIG_IGN) and a few waitpid(-1, NULL, WNOHANG) to plug the leak without restarting it. Of course, gdb and production might not go well together. Commented Aug 29, 2018 at 22:04
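For reference, the waitpid(-1, NULL, WNOHANG) calls mentioned in that comment amount to a non-blocking reap loop that has to run inside the parent itself. Below is a sketch of that loop in Python rather than C; `reap_exited_children` is a hypothetical name, and this shows only the loop, not a way to inject it into a running process.

```python
import os


def reap_exited_children():
    """Collect every child of THIS process that has already exited.

    Returns the list of reaped PIDs. Never blocks.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none is ready.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: this process has no children left
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped
```

A well-behaved parent would typically run a loop like this from its SIGCHLD handler or periodically from its main loop, which is what the application in the question appears not to be doing.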
