Univention Bugzilla – Bug 49614
Memory leak in linux-4.9.168-1 (4.9.0-9 amd64)
Last modified: 2022-02-22 16:06:47 CET
Public report: <https://help.univention.com/t/memoryleak-auf-slave-contoller/11892>
Affected internal system: mysig

I suspected a buffer_head leak, but the value went down again:

# grep ^buffer_head slabinfo.2019-0*
slabinfo.2019-05-28:13:37:17:buffer_head  9235 25584 104 39 1 : tunables 120 60 8 : slabdata 656 656 0
slabinfo.2019-05-28:13:38:41:buffer_head  9509 25584 104 39 1 : tunables 120 60 8 : slabdata 656 656 0
slabinfo.2019-05-29:11:43:54:buffer_head  5790 23127 104 39 1 : tunables 120 60 8 : slabdata 593 593 0
slabinfo.2019-06-01:03:10:27:buffer_head  4078 16458 104 39 1 : tunables 120 60 8 : slabdata 422 422 0
slabinfo.2019-06-07:10:12:47:buffer_head 30225 30225 104 39 1 : tunables 120 60 8 : slabdata 775 775 0
slabinfo.2019-06-07:14:25:02:buffer_head 30787 30810 104 39 1 : tunables 120 60 8 : slabdata 790 790 0
slabinfo.2019-06-07:14:46:21:buffer_head  4593 22074 104 39 1 : tunables 120 60 8 : slabdata 566 566 0

Maybe <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v4.9.180&id=056066d8a769aa87f280f00c6c2d802ecc2c7a6b> fixes it?

No other idea from:
  git log --grep leak v4.9.168..v4.9.174

No Debian report yet. No reproducer yet.

Stopping all processes did not fix it: the memory was still gone.

<https://www.kernel.org/doc/html/latest/dev-tools/kmemleak.html> cannot be used, as the Debian kernels are compiled without CONFIG_DEBUG_KMEMLEAK.
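For reference, a minimal sketch of how such timestamped slabinfo snapshots can be collected for later comparison (the 10-minute interval and the file naming are assumptions modelled on the filenames above, not part of the original report):

# Snapshot /proc/slabinfo periodically; growing caches can later be
# compared across snapshots, e.g. with: grep ^buffer_head slabinfo.*
while sleep 600; do
    cp /proc/slabinfo "slabinfo.$(date +%Y-%m-%d:%H:%M:%S)"
done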
This is currently happening for a customer on both master and backup every couple of days, lasting until the next reboot. Do you need any details from there?
On a system where it is currently (and reproducibly) happening, with a load of 112 (!!!) due to massive swapping:

smem:
[...]
samba            460.0K
vmtoolsd         622.0K
smbstatus        938.0K
sshd             958.0K
smbstatus          1.1M
univention-mana    1.2M
systemd            1.2M
top                1.3M
dockerd            2.0M
slapd              2.1M
apt-get            3.0M
apt-get            3.7M
samba              4.4M
iotop              5.3M
smem               6.1M
----------------------------------
                  39.7M

free:
              total        used        free      shared  buff/cache   available
Mem:        2033684      854784      462676       28324      716224      998104
Swap:       2097148           0     2097148
Total:      4130832      854784     2559824

top:
top - 10:19:09 up 4 days, 20:40,  3 users,  load average: 112,06, 111,67, 117,23
Tasks: 326 total,   1 running, 325 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3,4 us, 18,9 sy,  0,0 ni,  0,0 id, 76,3 wa,  0,0 hi,  1,4 si,  0,0 st
KiB Mem :  2033684 total,    68240 free,  1920544 used,    44900 buff/cache
KiB Swap:  2097148 total,  1351752 free,   745396 used.    15400 avail Mem

  PID USER PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
78113 root 20  0 103224 17256 6584 D  3,4  0,8  0:20.70 apt-get -o DPkg::Options::=--force-confold -o DPkg::Options::=--force-overwrite -o DPkg::Options::=--force-overwrite-dir --trivial-only=no --assume-yes --quiet=1 -us d+
90512 root 20  0  55896  7980 1284 S  3,5  0,4 29:15.57 /usr/bin/python3 /usr/sbin/iotop
67417 root 20  0  89668  2768  516 D  0,9  0,1  1:45.12 apt-get check -qq
 9202 root 20  0  38336  1708 1252 R  0,0  0,1  0:18.21 top icbd 600
    1 root 20  0 204880  1668 1364 S  0,1  0,1  3:02.91 /sbin/init splash
76894 root 20  0  54396  1400 1208 D  0,3  0,1  0:03.13 /lib/systemd/systemd-journald
66954 root 20  0 167228  1352  596 D  0,0  0,1  0:06.85 /usr/bin/python2.7 /usr/lib/univention-pam/ldap-group-to-file.py --check_member

vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st                CEST
 0  0 664588  67684  11020 138304  194   72   312   159  188  593  5  5 86  4  0 2019-06-14 14:19:18
 1  0 655032  76488   7528 105064  181  154   368   218  190  586  5  5 85  5  0 2019-06-14 14:29:18
 0  0 706180  79472   8908 106864  186  253   698   315  212  632  6  5 83  6  0 2019-06-14 14:39:18
 0  0 717104  79228   7648 102576  197  198   494   262  204  626  6  5 85  5  0 2019-06-14 14:49:18
 0  0 731224  80884   3852  94064  201  205   496   270  207  624  5  5 85  5  0 2019-06-14 14:59:18
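The numbers above can be collected continuously with something like the following sketch (the 600 s interval and log file names are my assumptions; vmstat's -t flag adds the timestamp column seen above):

# Record vmstat with timestamps, plus per-process and overall memory usage
vmstat -t 600 >>vmstat.log &
while sleep 600; do
    date              >>mem.log
    smem -k -s rss -r >>mem.log   # per-process memory, sorted by RSS
    free -k           >>mem.log
done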
Raising the priority as the customer needs a fix; the system is not usable this way. Tunnel access can be granted if needed.
Created attachment 10090 [details]
collectd: memory usage statistics

A self-compiled 4.9.174 does not show the increasing memory usage after half a day, but we need to watch it for longer. Please note that swap usage still increases, so the bug might only be partly fixed:
<http://xen1.knut.univention.de/collectd/bin/index.cgi?hostname=mysig.pingst.univention.de&plugin=memory&plugin=swap&timespan=86400&action=show_selection&ok_button=OK>

I filed a Debian bug for this:
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931111>
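The same trend can also be checked without the web front-end by reading the RRD files collectd writes; a sketch, assuming collectd's default RRD layout below /var/lib/collectd/rrd (paths may differ per setup):

# Dump the last 24 h of memory and swap usage from collectd's RRD files
H=$(hostname -f)
rrdtool fetch "/var/lib/collectd/rrd/$H/memory/memory-used.rrd" AVERAGE -s -86400
rrdtool fetch "/var/lib/collectd/rrd/$H/swap/swap-used.rrd"     AVERAGE -s -86400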
> I filed a Debian bug for this:
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=931111>

There already was a bug in Debian which looks similar:
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912411>

Looking deeper I back-ported a patch set to better diagnose the CGroups:
<https://www.spinics.net/lists/cgroups/msg20611.html>

0679dee03c6d7 cgroup: keep track of number of descent cgroups
1a926e0bbab83 cgroup: implement hierarchy limits
4dcabece4c3a9 cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock

> nr_dying_descendants
>     This is the total number of dying descendant cgroups underneath
>     this cgroup. A cgroup enters the dying state after being deleted.
>     It remains in that state for an undefined period (which will
>     depend on system load) while resources are freed before the
>     cgroup is destroyed. Note that the presence of some cgroups in
>     the dying state is normal, and is not indicative of any problem.
>
>     A process can't be made a member of a dying cgroup, and a dying
>     cgroup can't be brought back to life.

While looking for further ideas I stumbled upon the following patch series, which is not yet applied upstream, but describes a problem with the current CG implementation:
<https://lwn.net/Articles/782633/>

This led to <linux/Documentation/cgroup-v1/memory.txt>: as CONFIG_MEMCG_KMEM is enabled, the Linux kernel keeps track of which CG allocated which kernel memory regions. This includes:
* stack pages
* slab pages
* sockets memory pressure
* TCP memory pressure

Especially SLAB/SLUB-allocated objects can survive the lifetime of the processes of a CG. Furthermore this changes the SLAB allocator to allocate pages on a per-CG basis.

When 'libpam_systemd' deletes the per-session CG, the CG is not deleted immediately, but marked as "dying":

# find /sys/fs/cgroup/memory -name cgroup.stat -exec grep '^nr_dying_descendants [^0]' {} +
/sys/fs/cgroup/memory/cgroup.stat:nr_dying_descendants 360
/sys/fs/cgroup/memory/user.slice/cgroup.stat:nr_dying_descendants 320
/sys/fs/cgroup/memory/user.slice/user-0.slice/cgroup.stat:nr_dying_descendants 303
/sys/fs/cgroup/memory/system.slice/cgroup.stat:nr_dying_descendants 40
# grep ^memory /proc/cgroups
memory  10  452  1

So the system has a high number of dying CGs. Jürn observed the same on our system, where we had ~88k dying CGs after one week of uptime.

As a first test we disabled 'pam_systemd' on the affected system:

# grep pam_systemd.so /etc/pam.d/*
/etc/pam.d/common-session:#session optional pam_systemd.so

That test is still running, but it looks promising.

Tracking the kernel memory can be disabled by setting the following Linux kernel command line argument:

  cgroup.memory=nokmem

My test with that is still running, but it does not seem to fix it.

We can also try to patch systemd/src/basic/cgroup-util.c:731 cg_trim() to flush the memory before the 'rmdir()':

# echo 0 >/sys/fs/cgroup/memory/.../memory.force_empty
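To illustrate that last idea, a sketch of what a patched cg_trim() would effectively do, performed manually from the shell (the session path is a made-up example; per cgroup-v1 memory.txt, writing to memory.force_empty makes the kernel reclaim the pages charged to that CG, so the subsequent rmdir does not leave a dying CG holding slab memory behind):

# Hypothetical per-session CG about to be removed
CG=/sys/fs/cgroup/memory/user.slice/user-0.slice/session-42.scope
echo 0 >"$CG/memory.force_empty"   # reclaim charged memory first; may block a while
rmdir "$CG"                        # succeeds only once the CG has no tasks left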
Feedback to <https://www.spinics.net/lists/cgroups/msg22853.html> from Roman Gushchin and Ben Hutchings:
- Fixed in Linux 5.[23]
- Upstream patch-set at <https://lwn.net/ml/linux-kernel/20190611231813.3148843-1-guro@fb.com/>
- Alternative hack at <https://patchwork.kernel.org/cover/10772277/>
Some patches of interest:

6631142229005e1b1c311a09efe9fb3cfdac8559 blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration [v5.3-rc1~164^2~108]
68600f623d69da428c6163275f97ca126e1a8ec5 mm: don't miss the last page because of round-off error [v4.20-rc1~92^2~112]
172b06c32b949759fe6313abec514bc4f15014f4 mm: slowly shrink slabs with a relatively small number of objects [v4.19-rc5~13]
aeed1d325d429ac9699c4bf62d17156d60905519 mm/vmscan.c: generalize shrink_slab() calls in shrink_node() [v4.19-rc1~105^2~29]
9092c71bb724dba2ecba849eae69e5c9d39bd3d2 mm: use sc->priority for slab shrink targets [v4.16-rc1~122^2~80]
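The bracketed release tags are `git describe --contains` output; whether a given fix has already been back-ported to the affected kernel can be checked the same way in a linux-stable checkout (sketch):

# Show the first release containing each commit and whether v4.9.168
# (the affected kernel) already includes it
for c in 663114222900 68600f623d69 172b06c32b94 aeed1d325d42 9092c71bb724; do
    printf '%s: first in %s' "$c" "$(git describe --contains "$c")"
    git merge-base --is-ancestor "$c" v4.9.168 && echo ' (in v4.9.168)' || echo ' (NOT in v4.9.168)'
done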
No priority at the moment
Not reported or observed again. UCS 4.4-8 ships 4.9.272 by now, so this is probably fixed. Most probably it was caused by Bug 49910 and fixed by 4.4-1+e191.