
I am experiencing a weird issue lately:

Sometimes (I cannot reproduce it on purpose), my system uses up all of its swap even though there is more than enough free RAM. When this happens, the system becomes unresponsive for a couple of minutes, and then the OOM killer kills either a "random" process, which does not help much, or the X server. If it kills a "random" process, the system stays unresponsive (swap is still full despite plenty of free RAM); if it kills X, the swap is freed and the system becomes responsive again.

Output of free when it happens:

$ free -htl
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G         60M        5.7G        6.1G        257M
Low:           7.6G        7.5G         60M
High:            0B          0B          0B
Swap:          3.9G        3.9G          0B
Total:          11G        5.4G         60M

uname -a:

Linux fedora 4.4.7-300.fc23.x86_64 #1 SMP Wed Apr 13 02:52:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 

Swappiness:

cat /proc/sys/vm/swappiness
5

Relevant section in dmesg: http://pastebin.com/0P0TLfsC

tmpfs:

$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.8G  1.5M  3.8G   1% /dev/shm
tmpfs           3.8G  1.7M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
tmpfs           3.8G  452K  3.8G   1% /tmp
tmpfs           776M   16K  776M   1% /run/user/42
tmpfs           776M   32K  776M   1% /run/user/1000

Meminfo: http://pastebin.com/CRmitCiJ

top -o SHR -n 1
Tasks: 231 total,   1 running, 230 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.5 us,  3.0 sy,  0.3 ni, 86.9 id,  1.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7943020 total,   485368 free,   971096 used,  6486556 buff/cache
KiB Swap:  4095996 total,  1698992 free,  2397004 used.   989768 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2066 mkamlei+  20   0 8342764 163908 145208 S  0.0  2.1   0:59.62 Xorg
 2306 mkamlei+  20   0 1892816 138536  27168 S  0.0  1.7   1:25.47 gnome-shell
 3118 mkamlei+  20   0  596392  21084  13152 S  0.0  0.3   0:04.86 gnome-terminal-
 1646 gdm       20   0 1502632  60324  12976 S  0.0  0.8   0:01.91 gnome-shell
 2269 mkamlei+  20   0 1322592  22440   8124 S  0.0  0.3   0:00.87 gnome-settings-
  486 root      20   0   47048   8352   7656 S  0.0  0.1   0:00.80 systemd-journal
 2277 mkamlei+   9 -11  570512  10080   6644 S  0.0  0.1   0:15.33 pulseaudio
 2581 mkamlei+  20   0  525424  19272   5796 S  0.0  0.2   0:00.37 redshift-gtk
 1036 root      20   0  619016   9204   5408 S  0.0  0.1   0:01.70 NetworkManager
 1599 gdm       20   0 1035672  11820   5120 S  0.0  0.1   0:00.28 gnome-settings-
 2386 mkamlei+  20   0  850856  24948   4944 S  0.0  0.3   0:05.84 goa-daemon
 2597 mkamlei+  20   0 1138200  13104   4596 S  0.0  0.2   0:00.28 evolution-alarm
 2369 mkamlei+  20   0 1133908  16472   4560 S  0.0  0.2   0:00.49 evolution-sourc
 2529 mkamlei+  20   0  780088  54080   4380 S  0.0  0.7   0:01.14 gnome-software
 2821 mkamlei+  20   0 1357820  44320   4308 S  0.0  0.6   0:00.23 evolution-calen
 2588 mkamlei+  20   0 1671848  55744   4300 S  0.0  0.7   0:00.49 evolution-calen
 2525 mkamlei+  20   0  613512   8928   4188 S  0.0  0.1   0:00.19 abrt-applet

ipcs:

[mkamleithner@fedora ~]$ ipcs -m -t

------ Shared Memory Attach/Detach/Change Times --------
shmid      owner      attached             detached             changed
294912     mkamleithn Apr 30 20:29:16      Not set              Apr 30 20:29:16
393217     mkamleithn Apr 30 20:29:19      Apr 30 20:29:19      Apr 30 20:29:17
491522     mkamleithn Apr 30 20:42:21      Apr 30 20:42:21      Apr 30 20:29:18
524291     mkamleithn Apr 30 20:38:10      Apr 30 20:38:10      Apr 30 20:29:18
786436     mkamleithn Apr 30 20:38:12      Not set              Apr 30 20:38:12

[mkamleithner@fedora ~]$ ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 294912     mkamleithn 600        524288     2          dest
0x00000000 393217     mkamleithn 600        2576       2          dest
0x00000000 491522     mkamleithn 600        4194304    2          dest
0x00000000 524291     mkamleithn 600        524288     2          dest
0x00000000 786436     mkamleithn 600        4194304    2          dest

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
[mkamleithner@fedora ~]$ sudo grep 786436 /proc/*/maps
/proc/2084/maps:7ff4a56cc000-7ff4a5acc000 rw-s 00000000 00:05 786436    /SYSV00000000 (deleted)
/proc/3984/maps:7f4574d00000-7f4575100000 rw-s 00000000 00:05 786436    /SYSV00000000 (deleted)

[mkamleithner@fedora ~]$ sudo grep 524291 /proc/*/maps
/proc/2084/maps:7ff4a4593000-7ff4a4613000 rw-s 00000000 00:05 524291    /SYSV00000000 (deleted)
/proc/2321/maps:7fa9b8a67000-7fa9b8ae7000 rw-s 00000000 00:05 524291    /SYSV00000000 (deleted)

[mkamleithner@fedora ~]$ sudo grep 491522 /proc/*/maps
/proc/2084/maps:7ff4a4ad3000-7ff4a4ed3000 rw-s 00000000 00:05 491522    /SYSV00000000 (deleted)
/proc/2816/maps:7f2763ba1000-7f2763fa1000 rw-s 00000000 00:05 491522    /SYSV00000000 (deleted)

[mkamleithner@fedora ~]$ sudo grep 393217 /proc/*/maps
/proc/2084/maps:7ff4b1a60000-7ff4b1a61000 rw-s 00000000 00:05 393217    /SYSV00000000 (deleted)
/proc/2631/maps:7fb89be79000-7fb89be7a000 rw-s 00000000 00:05 393217    /SYSV00000000 (deleted)

[mkamleithner@fedora ~]$ sudo grep 294912 /proc/*/maps
/proc/2084/maps:7ff4a5510000-7ff4a5590000 rw-s 00000000 00:05 294912    /SYSV00000000 (deleted)
/proc/2582/maps:7f7902dd3000-7f7902e53000 rw-s 00000000 00:05 294912    /SYSV00000000 (deleted)

Getting the process names:

[mkamleithner@fedora ~]$ ps aux | grep 2084
mkamlei+  2084  5.1  2.0 8149580 159272 tty2  Sl+  20:29   1:10 /usr/libexec/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -verbose 3
mkamlei+  5261  0.0  0.0  118476   2208 pts/0 S+   20:52   0:00 grep --color=auto 2084
[mkamleithner@fedora ~]$ ps aux | grep 3984
mkamlei+  3984 11.4  3.6 1355100 293240 tty2  Sl+  20:38   1:38 /usr/lib64/firefox/firefox
mkamlei+  5297  0.0  0.0  118472   2232 pts/0 S+   20:52   0:00 grep --color=auto 3984

Should I also post the results for the other shmids? I don't really know how to interpret the output.

How can I fix this?

Edit: Starting the game "Papers, Please" always seems to trigger this problem after some time. It also sometimes happens when the game has not been started, though.

Edit 2: This seems to be an X issue. On Wayland it does not happen. It might be due to custom settings in my xorg.conf.

Final edit: For anyone experiencing the same problem: I was using DRI 2. Switching to DRI 3 also fixes the problem. This is the relevant section of my xorg.conf:

 Section "Device" Identifier "Intel Graphics" Driver "intel" Option "AccelMethod" "sna" # Option "Backlight" "intel_backlight" BusID "PCI:0:2:0" Option "DRI" "3" #here Option "TearFree" "true" EndSection 

The relevant file on my system is in /usr/share/X11/xorg.conf.d/.

  • You obviously have some process(es) consuming a lot of RAM. Since killing X helps, it is probably some X program(s) that also die when X is killed. You could look with top to see which processes are using a lot of virtual memory; it is possible to sort by memory usage in top. You could avoid the programs that consume a lot of memory, or you could add more swap to your machine. Commented Apr 30, 2016 at 18:15
  • One more thing that might be worth checking: if you have some tmpfs filesystem mounted, contents below that directory can also consume your virtual memory. df -h | grep tmpfs Commented Apr 30, 2016 at 18:21
  • Try (as root) echo 1 > /proc/sys/vm/drop_caches and see if that 6 GB of cache gets freed up. Also add the output of cat /proc/meminfo to your question. Commented Apr 30, 2016 at 18:44
  • OK, we got the output of top (no big processes) and the output of df (no big tmpfs). So what about System V IPC? Could we also get the output of ipcs? Maybe you have some big shared memory. Commented Apr 30, 2016 at 20:19
  • It may be an issue with the xorg.conf and my Intel card. I added options like DRI 3 to avoid other problems; this may not be supported correctly. Commented Apr 30, 2016 at 21:57

2 Answers


shared  Memory used (mostly) by tmpfs (Shmem in /proc/meminfo, available on kernels 2.6.32, displayed as zero if not available)

So the manpage definition of shared is not as helpful as it could be :(. If tmpfs use does not account for this high shared value, then the value must come from some process(es) that did mmap() with MAP_SHARED|MAP_ANONYMOUS (or from System V shared memory).

6G of shared memory on an 8G system is still a lot. Seriously, you don't want that, at least not on a desktop.

It's odd that this also seems to be counted in "buff/cache", but I did a quick test with Python and that's just how it works.
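
A quick test along those lines might look roughly like the sketch below (not necessarily the exact test mentioned above; the 1 GiB size and the sleep are arbitrary). While it sleeps, the shared and buff/cache columns of free both grow by about 1 GiB, and Shmem in /proc/meminfo rises accordingly, even though the mapping is anonymous and no tmpfs file exists.

import mmap
import time

SIZE = 1 << 30  # 1 GiB of anonymous *shared* memory

# MAP_SHARED | MAP_ANONYMOUS: not backed by any file, but shareable,
# so the kernel accounts for it as Shmem (and hence under buff/cache).
m = mmap.mmap(-1, SIZE, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)

# Touch every page so the memory is actually allocated.
for off in range(0, SIZE, 4096):
    m[off] = 1

# Compare `free -h` and /proc/meminfo while this sleeps.
time.sleep(60)
m.close()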

To show the processes with the most shared memory, use top -o SHR -n 1.

System V shared memory

Finally, it's possible you have some horrible legacy software that uses System V shared memory segments. If these segments get leaked, they won't show up in top :(.

You can list them with ipcs -m -t. Hopefully the most recently created ones are still in use. Take the shmid number and, for example:

$ ipcs -m -t

------ Shared Memory Attach/Detach/Change Times --------
shmid      owner      attached             detached             changed
3538944    alan       Apr 30 20:35:15      Apr 30 20:35:15      Apr 30 16:07:41
3145729    alan       Apr 30 20:35:15      Apr 30 20:35:15      Apr 30 15:04:09
4587522    alan       Apr 30 20:37:38      Not set              Apr 30 20:37:38

# sudo grep 4587522 /proc/*/maps

The numbers shown in the /proc paths are the PIDs of the processes that use the SHM segment. (So you could, for example, grep the output of ps for those PID numbers.)
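
If you would rather script that lookup, a rough sketch follows (assuming it is run as root, like the grep above; it relies on the fact that the inode column of a /SYSV mapping in /proc/<pid>/maps equals the shmid):

import glob
import sys

shmid = sys.argv[1]  # e.g. "4587522"

for maps_path in glob.glob("/proc/[0-9]*/maps"):
    pid = maps_path.split("/")[2]
    try:
        with open(maps_path) as maps:
            # Field 4 of a maps line is the inode; for SysV segments it equals the shmid.
            if any("SYSV" in line and line.split()[4] == shmid for line in maps):
                with open("/proc/%s/comm" % pid) as comm:
                    print(pid, comm.read().strip())
    except OSError:
        pass  # process exited, or permission denied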

Apparent contradictions

  1. Xorg has 8G mapped, even though you don't have separate video card RAM, yet it only has about 150M resident. The rest cannot simply be swapped out, because you don't have that much swap space.

  2. The SHM segments shown by ipcs are all attached to two processes, so none of them have leaked, and they should all show up in the SHR column of top (double-counted, even). It's fine if the number of pages used is less than the size of a segment; that just means some pages haven't been touched yet. But free says we have 6GB of allocated shared memory to account for, and we can't find it.

  • It seems that X is using a lot of virtual memory - I updated my question with the output of top. If I grep the shmids in /proc/*/maps, I always (for all entries) get something like this: /proc/2066/maps:7f8d446fa000-7f8d44afa000 rw-s 00000000 00:05 950273 /SYSV00000000 (deleted) Does that mean that the shm is already deleted? Commented Apr 30, 2016 at 20:04
  • I think the shm must still exist, or it wouldn't be listed; I see the same on my machine. The (deleted) is a way of saying that the filename doesn't really exist anymore; it would look similar if you mmapped a file and then deleted it (the mmapped memory would still work). Commented Apr 30, 2016 at 20:16
  • I only use an Intel integrated graphics card without dedicated video RAM. I updated my question with the results of your suggestions, but I don't really know how to interpret them; I have no idea what I am looking at... Commented Apr 30, 2016 at 20:59
  • It just happened again; the virtual memory of Xorg grew to 10.1 GB before the OOM killer killed it. Commented Apr 30, 2016 at 21:17
  • So it seems to be Xorg that eats the memory. The output from ipcs only showed a few megabytes of shared memory. Maybe this is a bug in your X.org; if so, probably a memory leak. If you are lucky, it might help to switch to another graphics driver, such as vesa, in the Device section of your xorg.conf. However, switching graphics drivers will probably give worse performance and/or less functionality. Maybe others have seen this problem too? Maybe there are patches/updates available for your distribution that fix it. Commented Apr 30, 2016 at 22:56

shared Memory used (mostly) by tmpfs (Shmem in /proc/meminfo, available on kernels 2.6.32, displayed as zero if not available)

tmpfs is swappable. You have tmpfs filesystem(s) that are being filled beyond safe limits. For comparison, the system I'm typing this on shows 200M shared; 6G is too much on an 8G system running a desktop, with stuff like Dropbox and Steam at the same time.

You can use the normal tools (du, ls, and so on) to find which files are causing the problem, though it is theoretically possible that those files go away when your X session dies.

$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.9G  1.7M  1.9G   1% /dev/shm
tmpfs           1.9G  1.6M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           1.9G   80K  1.9G   1% /tmp
tmpfs           376M   20K  376M   1% /run/user/42
tmpfs           376M   20K  376M   1% /run/user/1000

Limit your tmpfs mounts in order to survive your problems, gain the opportunity to analyze them, and maybe even trigger a helpful error message from the software which fills them.

By default, each tmpfs is limited to 1/2 of your physical RAM.

It is therefore desirable not to proliferate tmpfs mounts that all carry the default limit. Distributions aren't quite as good at this as they should be, as you can see above for my 4GB system: those default mounts add up to roughly 8G of potential tmpfs, about twice the physical RAM.

Apparently it's possible to change the limits at runtime with mount -o remount,size=10% /tmp.

You can put your remount commands somewhere they will run at boot time, e.g. /etc/rc.local (this may require systemctl enable rc-local). Note that the /run/user/* mounts are likely created after your script runs; hopefully they have sensible limits already.

The default tmpfs mounts tend not to be listed in /etc/fstab. Under systemd, you can modify the /tmp mount with e.g. systemctl edit tmp.mount. Otherwise, you could grep through your system scripts to find where they mount /tmp; it may use a configuration file you can edit. Another valid option for /tmp would be to disable the tmpfs mount altogether (systemctl disable tmp.mount), just letting programs write to the root filesystem instead.
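
As an illustration of the systemctl edit tmp.mount route, the drop-in it creates could look roughly like this (a sketch only: the 2G cap is arbitrary, and the drop-in's Options= line replaces the unit's existing options, so check them first with systemctl cat tmp.mount and carry over what you need):

# /etc/systemd/system/tmp.mount.d/override.conf, created by "systemctl edit tmp.mount"
[Mount]
Options=mode=1777,strictatime,size=2G

After saving it, a reboot (or remounting /tmp) applies the new limit.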

  • Thank you for your response, but it appears that my tmpfs are not causing my problems. I updated my question with the output of df. Commented Apr 30, 2016 at 18:36
