
Memory leak due to uncontrolled accumulation of health.log entries in Podman 5.x #25473


@gaurangomar

Issue Description

When using healthchecks in Podman 5.x, we’ve observed that the internal health log grows continuously (into the thousands of entries) and never prunes older records. In our tests, the health.log field in the container’s inspect output eventually contains over 12,000 records and continues to grow over time. This contrasts with Podman 4.x, which typically keeps only ~5 log entries. Furthermore, running top on the host shows unusually high memory usage by the /usr/bin/podman healthcheck process over time. These symptoms suggest a memory leak tied to Podman’s healthcheck mechanism in version 5.x.
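The growth is easy to confirm from the inspect output. A minimal check, assuming jq is installed and a container named agent (the name is illustrative):

    # Count health log entries for one container; on 5.x this number
    # keeps climbing, on 4.x it stays at ~5
    podman inspect agent | jq '.[0].State.Health.Log | length'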


Steps to reproduce the issue


  • Healthcheck Configuration:
    Use a healthcheck configuration identical to the one that worked in Podman 4.x. For example:

    "Healthcheck": {
      "Test": ["CMD", "curl", "-f", "http://agent:8080/health"],
      "Interval": 30000000000,
      "Timeout": 10000000000,
      "Retries": 5
    }
  • Run the Container:
    Start a container with this configuration on Podman 5.x (a reproduction sketch follows this list).

  • Monitor Health Log:
    After the container runs for a while, run podman inspect and check the State.Health.Log field. In Podman 5.x, it continuously accumulates records (e.g., over 12,000 entries) rather than being capped (as observed in Podman 4.x, which only shows about 5 entries).

  • Observe Memory Usage:
    Use monitoring tools (e.g., top) to observe memory usage. There is a significant and continuous increase in memory consumption, particularly in kernel memory (kmalloc-2k and kmalloc-4k slabs); a monitoring sketch follows below.
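A minimal reproduction sketch, expressing the same healthcheck parameters as podman run flags (the image name and health endpoint are illustrative placeholders):

    # Start a container with the healthcheck above expressed as CLI flags
    podman run -d --name agent \
      --health-cmd 'curl -f http://agent:8080/health' \
      --health-interval 30s \
      --health-timeout 10s \
      --health-retries 5 \
      myregistry/agent:latest

    # Watch the health log length grow across healthcheck intervals
    watch -n 30 "podman inspect agent | jq '.[0].State.Health.Log | length'"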

The high memory usage of the podman healthcheck process in top shows up only intermittently; we are running 8 containers on this host.
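One way to track the memory growth over time (a sketch; ps output formats vary by distribution, and reading /proc/slabinfo requires root):

    # Resident memory (RSS, in KiB) of podman healthcheck processes
    ps -C podman -o pid,rss,etime,args | grep healthcheck

    # Kernel slab usage for the slabs called out above (requires root)
    sudo grep -E '^kmalloc-[24]k ' /proc/slabinfo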

Describe the results you received

When using healthchecks in Podman 5.x, we’ve observed that the internal health log continuously grows instead of being capped at a few entries (as seen in Podman 4.x). In our tests, the health.log field in the container’s inspect output eventually contains over 12,000 records compared to the expected ~5 entries in version 4.x. This uncontrolled log growth correlates with a continuous increase in memory usage.

Describe the results you expected

Memory usage should not increase over time, and the health log should be capped at a small, fixed number of entries, as in Podman 4.x.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.5
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: b3f4044f63d830049366c05304a1d5d558571e85'
  cpuUtilization:
    idlePercent: 76.81
    systemPercent: 6.73
    userPercent: 16.46
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: ol
    variant: server
    version: "9.5"
  eventLogger: file
  freeLocks: 2026
  hostname: k-jambunatha-tf64-ecp-edge-multi-int-openstack-perf-1771036--ed
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.0-304.171.4.1.el9uek.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 809750528
  memTotal: 3803951104
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1.el9_5.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/2002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.x86_64
    version: |
      pasta 0^20240806.gee36266-2.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/2002/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2469085184
  swapTotal: 4194299904
  uptime: 312h 40m 36.00s (Approximately 13.00 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - container-registry.oracle.com
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 10
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 40961572864
  graphRootUsed: 2026479616
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 10
  runRoot: /run/user/2002/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1735903242
  BuiltTime: Fri Jan 3 06:20:42 2025
  GitCommit: ""
  GoVersion: go1.22.9 (Red Hat 1.22.9-2.el9_5)
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

podman --version

podman version 5.2.2

Additional information

The high healthcheck memory usage appears only intermittently (see above); otherwise the issue reproduces consistently on this setup.
