Bug 49382 - Kernel throws tons of "enable_cpucache failed" messages with 32bit architecture - probably pam_systemd related
Kernel throws tons of "enable_cpucache failed" messages with 32bit architectu...
Status: RESOLVED WONTFIX
Product: UCS
Classification: Unclassified
Component: PAM
UCS 4.3
i386 Linux
: P5 normal (vote)
: ---
Assigned To: UCS maintainers
UCS maintainers
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2019-04-29 11:52 CEST by Christina Scheinig
Modified: 2021-05-14 16:34 CEST (History)
2 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 1: Nuisance – not a big deal but noticeable
User Pain: 0.017
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019042921000285
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christina Scheinig univentionstaff 2019-04-29 11:52:04 CEST
Since the last errata update (from 4.3 errata 452 to 479), the following messages are regularly displayed in the syslog on the last remaining 32-bit systems. The single line "enable_cpucache failed for" messages
are repeated millions of times over.

Apr 23 19:20:01 ucs1 kernel: [224503.572940] enable_cpucache failed for files_cache(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572944] enable_cpucache failed for pid(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572948] enable_cpucache failed for sighand_cache(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572951] enable_cpucache failed for signal_cache(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572955] enable_cpucache failed for mm_struct(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572960] enable_cpucache failed for kmalloc-32(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572964] enable_cpucache failed for vm_area_struct(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572968] enable_cpucache failed for vm_area_struct(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572973] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572977] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572982] enable_cpucache failed for anon_vma(373877:session-c10530.scope), error 12
Apr 23 19:20:01 ucs1 kernel: [224503.572987] enable_cpucache failed for anon_vma_chain(373877:session-c10530.scope), error 12

uname -a
Linux dc011 4.9.0-8-686-pae #1 SMP Debian 4.9.144-3 (2019-02-02) i686 GNU/Linux

Maybe this is related to:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884763

Do you need further information?
Is there a workaround to reduce the flood of messages?
Comment 1 Christina Scheinig univentionstaff 2019-04-29 13:11:46 CEST
These messages came before:

Apr 23 19:20:01 ucs1 kernel: [224503.571767] percpu: allocation failed, size=496 align=4 atomic=0, failed to allocate new chunk
Apr 23 19:20:01 ucs1 kernel: [224503.571771] CPU: 2 PID: 20925 Comm: kworker/u16:2 Not tainted 4.9.0-8-686-pae #1 Debian 4.9.144-3
Apr 23 19:20:01 ucs1 kernel: [224503.571771] Hardware name: exone System Product Name/P8H67-M, BIOS 0605 03/07/2011
Apr 23 19:20:01 ucs1 kernel: [224503.571775] Workqueue: memcg_kmem_cache_create memcg_kmem_cache_create_func
Apr 23 19:20:01 ucs1 kernel: [224503.571777]  c86f5ddc c52fded2 00000000 f79fac78 c5191233 c56ce200 000001f0 00000004
Apr 23 19:20:01 ucs1 kernel: [224503.571782]  00000000 c56d265b 00000008 00d3da80 00200246 c86f5dc0 00000004 00000010
Apr 23 19:20:01 ucs1 kernel: [224503.571785]  024000c0 00000000 00000000 c86f5e00 c51c65da f7545d65 c2d3df00 00000078
Apr 23 19:20:01 ucs1 kernel: [224503.571789] Call Trace:
Apr 23 19:20:01 ucs1 kernel: [224503.571794]  [<c52fded2>] ? dump_stack+0x55/0x73
Apr 23 19:20:01 ucs1 kernel: [224503.571797]  [<c5191233>] ? pcpu_alloc+0x583/0x5b0
Apr 23 19:20:01 ucs1 kernel: [224503.571800]  [<c51c65da>] ? setup_kmem_cache_node+0x6a/0x1a0
Apr 23 19:20:01 ucs1 kernel: [224503.571801]  [<c5191299>] ? __alloc_percpu+0x19/0x20
Apr 23 19:20:01 ucs1 kernel: [224503.571803]  [<c51c38a3>] ? alloc_kmem_cache_cpus.isra.50+0x23/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.571805]  [<c51c6738>] ? __do_tune_cpucache+0x28/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.571807]  [<c51c6859>] ? do_tune_cpucache+0x29/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.571809]  [<c51c694d>] ? enable_cpucache+0x8d/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.571811]  [<c51c6db5>] ? __kmem_cache_create+0x155/0x1e0
Apr 23 19:20:01 ucs1 kernel: [224503.571813]  [<c5192f78>] ? create_cache+0x68/0x110
Apr 23 19:20:01 ucs1 kernel: [224503.571815]  [<c5193b37>] ? memcg_create_kmem_cache+0xd7/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.571816]  [<c51d3c1a>] ? memcg_kmem_cache_create_func+0x1a/0x60
Apr 23 19:20:01 ucs1 kernel: [224503.571820]  [<c5081e86>] ? process_one_work+0x146/0x390
Apr 23 19:20:01 ucs1 kernel: [224503.571822]  [<c507fce3>] ? pwq_activate_delayed_work+0x33/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.571824]  [<c5082109>] ? worker_thread+0x39/0x480
Apr 23 19:20:01 ucs1 kernel: [224503.571825]  [<c5087427>] ? kthread+0xb7/0xd0
Apr 23 19:20:01 ucs1 kernel: [224503.571827]  [<c50820d0>] ? process_one_work+0x390/0x390
Apr 23 19:20:01 ucs1 kernel: [224503.571830]  [<c55bfce7>] ? __switch_to_asm+0x27/0x4c
Apr 23 19:20:01 ucs1 kernel: [224503.571832]  [<c5087370>] ? kthread_park+0x50/0x50
Apr 23 19:20:01 ucs1 kernel: [224503.571834]  [<c55bfd3c>] ? ret_from_fork+0x30/0x3c
Apr 23 19:20:01 ucs1 kernel: [224503.571835] enable_cpucache failed for pid(373913:session-c10533.scope), error 12
[...]
Apr 23 19:20:01 ucs1 kernel: [224503.572333] percpu: allocation failed, size=496 align=4 atomic=0, failed to allocate new chunk
Apr 23 19:20:01 ucs1 kernel: [224503.572334] CPU: 2 PID: 20925 Comm: kworker/u16:2 Not tainted 4.9.0-8-686-pae #1 Debian 4.9.144-3
Apr 23 19:20:01 ucs1 kernel: [224503.572335] Hardware name: exone System Product Name/P8H67-M, BIOS 0605 03/07/2011
Apr 23 19:20:01 ucs1 kernel: [224503.572336] Workqueue: memcg_kmem_cache_create memcg_kmem_cache_create_func
Apr 23 19:20:01 ucs1 kernel: [224503.572337]  c86f5ddc c52fded2 00000000 f79fac78 c5191233 c56ce200 000001f0 00000004
Apr 23 19:20:01 ucs1 kernel: [224503.572340]  00000000 c56d265b 00000008 00000000 00200246 fffffffa 00000004 00000010
Apr 23 19:20:01 ucs1 kernel: [224503.572344]  024000c0 00000000 00000000 00200247 0001b458 f7545d65 c2d3d380 00000078
Apr 23 19:20:01 ucs1 kernel: [224503.572348] Call Trace:
Apr 23 19:20:01 ucs1 kernel: [224503.572349]  [<c52fded2>] ? dump_stack+0x55/0x73
Apr 23 19:20:01 ucs1 kernel: [224503.572351]  [<c5191233>] ? pcpu_alloc+0x583/0x5b0
Apr 23 19:20:01 ucs1 kernel: [224503.572353]  [<c5191299>] ? __alloc_percpu+0x19/0x20
Apr 23 19:20:01 ucs1 kernel: [224503.572354]  [<c51c38a3>] ? alloc_kmem_cache_cpus.isra.50+0x23/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.572356]  [<c51c6738>] ? __do_tune_cpucache+0x28/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.572358]  [<c51c6859>] ? do_tune_cpucache+0x29/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.572360]  [<c51c694d>] ? enable_cpucache+0x8d/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.572361]  [<c51c6db5>] ? __kmem_cache_create+0x155/0x1e0
Apr 23 19:20:01 ucs1 kernel: [224503.572363]  [<c5192f78>] ? create_cache+0x68/0x110
Apr 23 19:20:01 ucs1 kernel: [224503.572365]  [<c5193b37>] ? memcg_create_kmem_cache+0xd7/0x120
Apr 23 19:20:01 ucs1 kernel: [224503.572366]  [<c51d3c1a>] ? memcg_kmem_cache_create_func+0x1a/0x60
Apr 23 19:20:01 ucs1 kernel: [224503.572368]  [<c5081e86>] ? process_one_work+0x146/0x390
Apr 23 19:20:01 ucs1 kernel: [224503.572370]  [<c507fce3>] ? pwq_activate_delayed_work+0x33/0x90
Apr 23 19:20:01 ucs1 kernel: [224503.572372]  [<c5082109>] ? worker_thread+0x39/0x480
Apr 23 19:20:01 ucs1 kernel: [224503.572374]  [<c5087427>] ? kthread+0xb7/0xd0
Apr 23 19:20:01 ucs1 kernel: [224503.572376]  [<c50820d0>] ? process_one_work+0x390/0x390
Apr 23 19:20:01 ucs1 kernel: [224503.572377]  [<c55bfce7>] ? __switch_to_asm+0x27/0x4c
Apr 23 19:20:01 ucs1 kernel: [224503.572379]  [<c5087370>] ? kthread_park+0x50/0x50
Apr 23 19:20:01 ucs1 kernel: [224503.572381]  [<c55bfd3c>] ? ret_from_fork+0x30/0x3c
Apr 23 19:20:01 ucs1 kernel: [224503.572381] percpu: limit reached, disable warning
Comment 2 Philipp Hahn univentionstaff 2019-04-29 15:53:30 CEST
The system is out-of-memory when allocating kernel internal data structures.

686-pae is prone to this kind of errors as too much management information must be packed into only 1 GiB of RAM available to the kernel. If too many processes are launched or too many block devices are used this starts to happen.

There is an option in the Linux Kernel build system to change the default 3G/1G to 2G/2G split between user-space/kernel-space, but that requires a kernel re-build and has other draw-backs, so this is not done by Debian (and us). There even was a patch to implement a 4G/4G split, but it was never applied upstream and is now largely out-of-date¹.

x86_64 is available since 2002 and nobody in the Linux community really cares about that "32 bit hack called pae" any more. Therefore the default answer applies: Migrate to x86_64 as soon as possible and be done with such kind of problems.
(you could even install the x86_64 kernel keep the 32-bit user-space to mitigate this issue, but this is not supported by UCS in any way.)

You may ask the customer (for "fun") how many (virtual) CPUs and how much RAM the system has. The further this is from 4 CPUs and 4 GiB of RAM the more likely is this problem to appear.

¹<https://flaterco.com/kb/PAE_slowdown.html>
Comment 3 Philipp Hahn univentionstaff 2019-04-30 10:09:24 CEST
cat /proc/slabinfo
...
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
files_cache          267    976    256   16    1 : tunables  120   60    8 : slabdata     61     61      0


The culprit is probably systemd: With Bug #49045 (Erratum 4.3 464) we enabled 'pam_systemd.so' which leads to <https://github.com/systemd/systemd/issues/8598>: Each user get a lingering "systemd --user" and "(sd-pam)" processes, which consume additional kernel space. Eventually this leads to allocation failures inside the kernel.

(The affected systems have 2 and 4 GiB, so not the Highmem issues from comment 2, but probably more related to just too few kernel memory in gernal.)

Probable workaround: disable 'pam_systemd' for now and kill the lingering processes:
 find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec grep --color pam_systemd.so {} +
 pkill -f '/lib/systemd/systemd --user'

Comment 4 Philipp Hahn univentionstaff 2019-04-30 10:50:20 CEST
(In reply to Philipp Hahn from comment #3)
> Probable workaround: disable 'pam_systemd' for now and kill the lingering
> processes:
>  find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec
> grep --color pam_systemd.so {} +

find /etc/pam.d /etc/univention/templates/files/etc/pam.d -type f -exec sed -i '/pam_systemd.so/s/^/#/' {} +


Comment 5 Ingo Steuwer univentionstaff 2021-05-14 15:42:12 CEST
This issue has been filed against UCS 4.3.

UCS 4.3 is out of maintenance and many UCS components have changed in later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or reopen it and update the UCS version. In this case please provide detailed information on how this issue is affecting you.