
Produces a large number of zombie processes #177

@beardnick

Description


Environment

prove

prove --version
TAP::Harness v3.43 and Perl v5.34.0

nginx

nginx -V
nginx version: openresty/1.25.3.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
built with OpenSSL 3.2.0 23 Nov 2023
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2 -DAPISIX_RUNTIME_VER=1.2.0 -DNGX_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_HTTP_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl3/include' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.26 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.14 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl3/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl3/lib' --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../ngx_multi_upstream_module-1.2.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0/src/stream --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../apisix-nginx-module-1.16.0/src/meta --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../wasm-nginx-module-0.7.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../lua-var-nginx-module-v0.5.3 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../grpc-client-nginx-module-v0.5.0 --add-module=/tmp/tmp.8fm9BEJ9Sy/openresty-1.25.3.1/../lua-resty-events-0.2.0 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --with-http_v3_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --without-pcre2 --with-http_ssl_module

apisix

apisix version 3.9.1

test-nginx: master branch

How to reproduce

Run the unit tests with prove. The run prints many errors such as "timeout when waiting for the process 78711 to exit":

prove -v -I ./test-nginx/lib -I./ t/plugin/openid-connect.t
ok 1 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - status code ok
ok 2 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - response_body - response is expected (repeated req 0, req 0)
ok 3 - t/plugin/openid-connect.t TEST 1: Sanity check with minimal valid configuration. - pattern "[error]" does not match a line in error.log (req 0)
t/plugin/openid-connect.t TEST 2: Missing `client_id`. - timeout when waiting for the process 78711 to exit at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 681.
t/plugin/openid-connect.t TEST 2: Missing `client_id`. - WARNING: killing the child process 78711 with force... at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 720.
ok 4 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - status code ok
ok 5 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - response_body - response is expected (repeated req 0, req 0)
ok 6 - t/plugin/openid-connect.t TEST 2: Missing `client_id`. - pattern "[error]" does not match a line in error.log (req 0)
t/plugin/openid-connect.t TEST 3: Wrong type for `client_id`. - timeout when waiting for the process 78899 to exit at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 681.
t/plugin/openid-connect.t TEST 3: Wrong type for `client_id`. - WARNING: killing the child process 78899 with force... at /workspace/test-nginx/lib/Test/Nginx/Util.pm line 720.

Many defunct nginx processes are also left behind:

ps -ef | grep nginx
root 785 1 0 08:40 ? 00:00:00 [nginx] <defunct>
root 2248 1 0 08:41 ? 00:00:00 [nginx] <defunct>
root 2885 1 0 08:42 ? 00:00:00 [nginx] <defunct>
root 4446 1 0 08:43 ? 00:00:00 [nginx] <defunct>
root 5007 1 0 08:44 ? 00:00:00 [nginx] <defunct>
root 19585 1 0 09:00 ? 00:00:00 [nginx] <defunct>
root 19770 1 0 09:00 ? 00:00:00 [nginx] <defunct>
root 21483 1 0 09:02 ? 00:00:00 [nginx] <defunct>
root 25649 1 0 09:07 ? 00:00:00 [nginx] <defunct>
root 27841 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 27842 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 27843 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 27989 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 27990 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 27991 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28104 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28105 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28106 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28243 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28244 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 28245 1 0 09:09 ? 00:00:00 [nginx] <defunct>
root 29939 1 0 09:11 ? 00:00:00 [nginx] <defunct>

Possible cause

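After each test, Test::Nginx sends a QUIT (or, with TEST_NGINX_FAST_SHUTDOWN, a TERM) signal to the nginx process to shut it down. However, nginx may exit before the test process has waited for (reaped) it, so the exited nginx process is left as a zombie. `kill 0, $pid` still succeeds for a zombie because its PID has not been released, so is_running keeps reporting the process as alive. The wait loop below therefore keeps polling until the timeout expires, and the process is then killed with force (the "WARNING: killing the child process ... with force" lines above).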
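The zombie behaviour described above can be reproduced outside of test-nginx. The following standalone Perl sketch (not part of test-nginx) forks a child that exits immediately, then runs the same `kill 0` check that is_running uses, once before and once after the parent reaps the child with `waitpid`. The 0.5 s delay is an arbitrary value chosen so the child has time to exit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(sleep);

# Keep exited children as zombies until they are reaped
# (an inherited SIGCHLD=IGNORE would let the kernel reap them automatically).
$SIG{CHLD} = 'DEFAULT';

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: exit immediately, like an nginx process that shuts down quickly.
    POSIX::_exit(0);
}

# Let the child exit, but do not reap it yet: it is now a zombie.
sleep 0.5;
my $before_reap = kill(0, $pid) ? 1 : 0;    # same check as is_running()

# Reap the child; its PID is released afterwards.
my $reaped     = waitpid($pid, 0);
my $after_reap = kill(0, $pid) ? 1 : 0;

print "kill 0 before waitpid: $before_reap\n";
print "waitpid returned child pid: ", ($reaped == $pid ? 1 : 0), "\n";
print "kill 0 after waitpid: $after_reap\n";
```

On Linux the first check is expected to report 1 (the zombie still "exists" as far as `kill 0` is concerned) and the last one 0, which is why is_running alone cannot tell an exited-but-unreaped nginx from a running one.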

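if (defined $pid) {
    if ($ENV{TEST_NGINX_FAST_SHUTDOWN}) {
        if ($Verbose) {
            warn "sending TERM signal to $pid";
        }
        kill(SIGTERM, $pid);    # TERM: nginx fast shutdown
    } else {
        if ($Verbose) {
            warn "sending QUIT signal to $pid";
        }
        kill(SIGQUIT, $pid);    # QUIT: nginx graceful shutdown
    }
}
if ($Verbose) {
    warn "waitpid timeout: ", timeout();
}
my $timeout_val = timeout();
# Poll until the process is gone or the timeout expires.
# is_running() uses "kill 0", which still succeeds for a zombie.
while ($timeout_val > 0 && is_running($pid)) {
    waitpid($pid, WNOHANG);    # reaps $pid only if it is our own child
    sleep 0.05;                # needs Time::HiRes's sleep for sub-second delays
    $timeout_val -= 0.05;
}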
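The loop above depends on what `waitpid($pid, WNOHANG)` returns. This standalone sketch (not test-nginx code) shows the three cases: 0 while the child is still running, the child's PID once it has been reaped, and -1 when there is nothing the caller can reap, which includes any process that is not the caller's own child. In the ps output above the defunct nginx processes have a parent PID of 1, so a waitpid call made by the test process may not be able to reap them at all.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

$SIG{CHLD} = 'DEFAULT';    # keep default child handling so waitpid sees the exit

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: stay alive briefly, then exit.
    sleep 0.5;
    POSIX::_exit(0);
}

# 0: the child exists but has not exited yet.
my $while_running = waitpid($pid, WNOHANG);

# Blocking wait: returns the child's PID once it has exited and been reaped.
my $on_exit = waitpid($pid, 0);

# -1: the child has already been reaped, nothing is left for this PID.
my $after_reap = waitpid($pid, WNOHANG);

# -1: our parent is not our child, so waitpid can never reap it.
my $not_a_child = waitpid(getppid(), WNOHANG);

printf "running=%d exited=%s reaped_again=%d non_child=%d\n",
    $while_running, ($on_exit == $pid ? 'pid' : $on_exit),
    $after_reap, $not_a_child;
```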

My workaround

I modified the is_running function to treat zombie processes as not running. This lets the unit tests finish faster and removes most of the timeout errors, but a large number of zombie processes are still left behind.

The original function

sub is_running ($) {
    my $pid = shift;
    return kill 0, $pid;
}

The modified functions

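# is_defunct is defined first so its ($) prototype is known when is_running calls it.
sub is_defunct ($) {
    my $pid = shift;
    # "ps -o stat=" prints only the process state; it contains "Z" for a zombie.
    my $output = `ps -o stat= -p $pid` // '';
    chomp($output);
    return $output =~ /Z/;
}

sub is_running ($) {
    my $pid = shift;
    # A zombie still has a PID, so "kill 0" succeeds for it; treat it as not running.
    return kill(0, $pid) && !is_defunct($pid);
}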
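The is_defunct workaround spawns a ps process on every poll, i.e. every 0.05 s in the wait loop. On Linux, a lighter alternative is to read the state field from /proc/<pid>/stat (documented in proc(5)). The sketch below is a hypothetical variant: is_defunct_proc is not part of test-nginx, and it only works where /proc is mounted. It includes a small demonstration using a forked child.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(sleep);

# Hypothetical Linux-only variant of is_defunct. /proc/<pid>/stat looks like
# "1234 (nginx) Z ..."; the command name may itself contain ")", so the
# state is taken from after the LAST ")" (greedy .* below).
sub is_defunct_proc ($) {
    my $pid = shift;
    open(my $fh, '<', "/proc/$pid/stat") or return 0;    # no such process
    my $line = <$fh>;
    close($fh);
    return 0 unless defined $line;
    my ($state) = $line =~ /.*\)\s+(\S)/s;
    return (defined $state && $state eq 'Z') ? 1 : 0;
}

# Demonstration: a child that has exited but not been reaped is a zombie.
$SIG{CHLD} = 'DEFAULT';
my $pid = fork();
die "fork failed: $!" unless defined $pid;
POSIX::_exit(0) if $pid == 0;

sleep 0.5;                              # let the child exit
my $zombie = is_defunct_proc($pid);     # exited, not yet reaped
waitpid($pid, 0);
my $gone   = is_defunct_proc($pid);     # reaped: the /proc entry is gone
my $self   = is_defunct_proc($$);       # this script is running, not a zombie

print "zombie=$zombie gone=$gone self=$self\n";
```

Like the ps-based check, this only makes the wait loop stop early; it does not reap the zombies.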
