Univention Bugzilla – Bug 49382
Kernel throws tons of "enable_cpucache failed" messages with 32bit architecture - probably pam_systemd related
Last modified: 2021-05-14 16:34:24 CEST
Since the last errata update (from 4.3 errata 452 to 479), the following messages are regularly displayed in the syslog on the last remaining 32-bit systems. The single line "enable_cpucache failed for" messages are repeated millions of times over. Apr 23 19:20:01 ucs1 kernel: [224503.572940] enable_cpucache failed for files_cache(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572944] enable_cpucache failed for pid(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572948] enable_cpucache failed for sighand_cache(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572951] enable_cpucache failed for signal_cache(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572955] enable_cpucache failed for mm_struct(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572960] enable_cpucache failed for kmalloc-32(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572964] enable_cpucache failed for vm_area_struct(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572968] enable_cpucache failed for vm_area_struct(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572973] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572977] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572982] enable_cpucache failed for anon_vma(373877:session-c10530.scope), error 12 Apr 23 19:20:01 ucs1 kernel: [224503.572987] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12 uname -a Linux dc011 4.9.0-8-686-pae #1 SMP Debian 4.9.144-3 (2019-02-02) i686 GNU/Linux Maybe this is related to: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884763 Do you need further information? Is there a workaround to reduce the flood of messages?
These messages came before: Apr 23 19:20:01 ucs1 kernel: [224503.571767] percpu: allocation failed, size=496 align=4 atomic=0, failed to allocate new chunk Apr 23 19:20:01 ucs1 kernel: [224503.571771] CPU: 2 PID: 20925 Comm: kworker/u16:2 Not tainted 4.9.0-8-686-pae #1 Debian 4.9.144-3 Apr 23 19:20:01 ucs1 kernel: [224503.571771] Hardware name: exone System Product Name/P8H67-M, BIOS 0605 03/07/2011 Apr 23 19:20:01 ucs1 kernel: [224503.571775] Workqueue: memcg_kmem_cache_create memcg_kmem_cache_create_func Apr 23 19:20:01 ucs1 kernel: [224503.571777] c86f5ddc c52fded2 00000000 f79fac78 c5191233 c56ce200 000001f0 00000004 Apr 23 19:20:01 ucs1 kernel: [224503.571782] 00000000 c56d265b 00000008 00d3da80 00200246 c86f5dc0 00000004 00000010 Apr 23 19:20:01 ucs1 kernel: [224503.571785] 024000c0 00000000 00000000 c86f5e00 c51c65da f7545d65 c2d3df00 00000078 Apr 23 19:20:01 ucs1 kernel: [224503.571789] Call Trace: Apr 23 19:20:01 ucs1 kernel: [224503.571794] [<c52fded2>] ? dump_stack+0x55/0x73 Apr 23 19:20:01 ucs1 kernel: [224503.571797] [<c5191233>] ? pcpu_alloc+0x583/0x5b0 Apr 23 19:20:01 ucs1 kernel: [224503.571800] [<c51c65da>] ? setup_kmem_cache_node+0x6a/0x1a0 Apr 23 19:20:01 ucs1 kernel: [224503.571801] [<c5191299>] ? __alloc_percpu+0x19/0x20 Apr 23 19:20:01 ucs1 kernel: [224503.571803] [<c51c38a3>] ? alloc_kmem_cache_cpus.isra.50+0x23/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.571805] [<c51c6738>] ? __do_tune_cpucache+0x28/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.571807] [<c51c6859>] ? do_tune_cpucache+0x29/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.571809] [<c51c694d>] ? enable_cpucache+0x8d/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.571811] [<c51c6db5>] ? __kmem_cache_create+0x155/0x1e0 Apr 23 19:20:01 ucs1 kernel: [224503.571813] [<c5192f78>] ? create_cache+0x68/0x110 Apr 23 19:20:01 ucs1 kernel: [224503.571815] [<c5193b37>] ? memcg_create_kmem_cache+0xd7/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.571816] [<c51d3c1a>] ? memcg_kmem_cache_create_func+0x1a/0x60 Apr 23 19:20:01 ucs1 kernel: [224503.571820] [<c5081e86>] ? process_one_work+0x146/0x390 Apr 23 19:20:01 ucs1 kernel: [224503.571822] [<c507fce3>] ? pwq_activate_delayed_work+0x33/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.571824] [<c5082109>] ? worker_thread+0x39/0x480 Apr 23 19:20:01 ucs1 kernel: [224503.571825] [<c5087427>] ? kthread+0xb7/0xd0 Apr 23 19:20:01 ucs1 kernel: [224503.571827] [<c50820d0>] ? process_one_work+0x390/0x390 Apr 23 19:20:01 ucs1 kernel: [224503.571830] [<c55bfce7>] ? __switch_to_asm+0x27/0x4c Apr 23 19:20:01 ucs1 kernel: [224503.571832] [<c5087370>] ? kthread_park+0x50/0x50 Apr 23 19:20:01 ucs1 kernel: [224503.571834] [<c55bfd3c>] ? ret_from_fork+0x30/0x3c Apr 23 19:20:01 ucs1 kernel: [224503.571835] enable_cpucache failed for pid(373913:session-c10533.scope), error 12 [...] Apr 23 19:20:01 ucs1 kernel: [224503.572333] percpu: allocation failed, size=496 align=4 atomic=0, failed to allocate new chunk Apr 23 19:20:01 ucs1 kernel: [224503.572334] CPU: 2 PID: 20925 Comm: kworker/u16:2 Not tainted 4.9.0-8-686-pae #1 Debian 4.9.144-3 Apr 23 19:20:01 ucs1 kernel: [224503.572335] Hardware name: exone System Product Name/P8H67-M, BIOS 0605 03/07/2011 Apr 23 19:20:01 ucs1 kernel: [224503.572336] Workqueue: memcg_kmem_cache_create memcg_kmem_cache_create_func Apr 23 19:20:01 ucs1 kernel: [224503.572337] c86f5ddc c52fded2 00000000 f79fac78 c5191233 c56ce200 000001f0 00000004 Apr 23 19:20:01 ucs1 kernel: [224503.572340] 00000000 c56d265b 00000008 00000000 00200246 fffffffa 00000004 00000010 Apr 23 19:20:01 ucs1 kernel: [224503.572344] 024000c0 00000000 00000000 00200247 0001b458 f7545d65 c2d3d380 00000078 Apr 23 19:20:01 ucs1 kernel: [224503.572348] Call Trace: Apr 23 19:20:01 ucs1 kernel: [224503.572349] [<c52fded2>] ? dump_stack+0x55/0x73 Apr 23 19:20:01 ucs1 kernel: [224503.572351] [<c5191233>] ? pcpu_alloc+0x583/0x5b0 Apr 23 19:20:01 ucs1 kernel: [224503.572353] [<c5191299>] ? __alloc_percpu+0x19/0x20 Apr 23 19:20:01 ucs1 kernel: [224503.572354] [<c51c38a3>] ? alloc_kmem_cache_cpus.isra.50+0x23/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.572356] [<c51c6738>] ? __do_tune_cpucache+0x28/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.572358] [<c51c6859>] ? do_tune_cpucache+0x29/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.572360] [<c51c694d>] ? enable_cpucache+0x8d/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.572361] [<c51c6db5>] ? __kmem_cache_create+0x155/0x1e0 Apr 23 19:20:01 ucs1 kernel: [224503.572363] [<c5192f78>] ? create_cache+0x68/0x110 Apr 23 19:20:01 ucs1 kernel: [224503.572365] [<c5193b37>] ? memcg_create_kmem_cache+0xd7/0x120 Apr 23 19:20:01 ucs1 kernel: [224503.572366] [<c51d3c1a>] ? memcg_kmem_cache_create_func+0x1a/0x60 Apr 23 19:20:01 ucs1 kernel: [224503.572368] [<c5081e86>] ? process_one_work+0x146/0x390 Apr 23 19:20:01 ucs1 kernel: [224503.572370] [<c507fce3>] ? pwq_activate_delayed_work+0x33/0x90 Apr 23 19:20:01 ucs1 kernel: [224503.572372] [<c5082109>] ? worker_thread+0x39/0x480 Apr 23 19:20:01 ucs1 kernel: [224503.572374] [<c5087427>] ? kthread+0xb7/0xd0 Apr 23 19:20:01 ucs1 kernel: [224503.572376] [<c50820d0>] ? process_one_work+0x390/0x390 Apr 23 19:20:01 ucs1 kernel: [224503.572377] [<c55bfce7>] ? __switch_to_asm+0x27/0x4c Apr 23 19:20:01 ucs1 kernel: [224503.572379] [<c5087370>] ? kthread_park+0x50/0x50 Apr 23 19:20:01 ucs1 kernel: [224503.572381] [<c55bfd3c>] ? ret_from_fork+0x30/0x3c Apr 23 19:20:01 ucs1 kernel: [224503.572381] percpu: limit reached, disable warning
The system is out-of-memory when allocating kernel internal data structures. 686-pae is prone to this kind of errors as too much management information must be packed into only 1 GiB of RAM available to the kernel. If too many processes are launched or too many block devices are used this starts to happen. There is an option in the Linux Kernel build system to change the default 3G/1G to 2G/2G split between user-space/kernel-space, but that requires a kernel re-build and has other draw-backs, so this is not done by Debian (and us). There even was a patch to implement a 4G/4G split, but it was never applied upstream and is now largely out-of-date¹. x86_64 is available since 2002 and nobody in the Linux community really cares about that "32 bit hack called pae" any more. Therefore the default answer applies: Migrate to x86_64 as soon as possible and be done with such kind of problems. (you could even install the x86_64 kernel keep the 32-bit user-space to mitigate this issue, but this is not supported by UCS in any way.) You may ask the customer (for "fun") how many (virtual) CPUs and how much RAM the system has. The further this is from 4 CPUs and 4 GiB of RAM the more likely is this problem to appear. ¹<https://flaterco.com/kb/PAE_slowdown.html>
cat /proc/slabinfo ... # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> files_cache 267 976 256 16 1 : tunables 120 60 8 : slabdata 61 61 0 The culprit is probably systemd: With Bug #49045 (Erratum 4.3 464) we enabled 'pam_systemd.so' which leads to <https://github.com/systemd/systemd/issues/8598>: Each user get a lingering "systemd --user" and "(sd-pam)" processes, which consume additional kernel space. Eventually this leads to allocation failures inside the kernel. (The affected systems have 2 and 4 GiB, so not the Highmem issues from comment 2, but probably more related to just too few kernel memory in gernal.) Probable workaround: disable 'pam_systemd' for now and kill the lingering processes: find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec grep --color pam_systemd.so {} + pkill -f '/lib/systemd/systemd --user' 
(In reply to Philipp Hahn from comment #3) > Probable workaround: disable 'pam_systemd' for now and kill the lingering > processes: > find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec > grep --color pam_systemd.so {} + find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec sed -i '/pam_systemd.so/s/^/#/' {} +  
This issue has been filed against UCS 4.3. UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed. If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.