Bug 41051 - kernel:[6201680.032002] BUG: soft lockup - CPU#1 stuck for 22s! [smbd:11900]
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Kernel
Version: UCS 3.3
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.3
Assigned To: Philipp Hahn
QA Contact: Janek Walkenhorst
Depends on: 41048
Reported: 2016-04-14 06:24 CEST by Stefan Gohmann
Modified: 2016-06-07 21:35 CEST

Description Stefan Gohmann univentionstaff 2016-04-14 06:24:11 CEST
The new kernel needs to be merged to UCS 3.3.

+++ This bug was initially created as a clone of Bug #41048 +++

Ticket #2015121821000574

In a customer environment, the kernel logs the following soft lockup trace from time to time:

Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] Modules linked in: cpuid ppdev lp ip6t_REJECT ipt_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc quota_v2 quota_tree psmouse processor parport_pc i2c_piix4 parport pcspkr thermal_sys joydev serio_raw virtio_balloon evdev ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_net virtio_blk uhci_hcd ehci_hcd usbcore floppy usb_common ata_piix ttm drm_kms_helper drm libata i2c_core virtio_pci virtio_ring virtio scsi_mod button
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] CPU: 0 PID: 14256 Comm: smbd Not tainted 3.16.0-ucs165-amd64 #1 Debian 3.16.7-ckt20-1+deb8u3~bpo70+1.165.201601221131
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] task: ffff8801639a8110 ti: ffff88014fb2c000 task.ti: ffff88014fb2c000
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] RIP: 0010:[<ffffffff815509fb>]  [<ffffffff815509fb>] _raw_spin_lock+0x1b/0x30
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] RSP: 0018:ffff88014fb2fe50  EFLAGS: 00000212
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] RAX: 0000000000000db6 RBX: ffffffff8119fffe RCX: 0000000100070007
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] RDX: 0000000000000d9d RSI: ffff880102405440 RDI: ffff880102405750
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] RBP: 0000000000000028 R08: 00000000570de732 R09: 0000000000000005
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] R10: ffffffffffffffff R11: 0000000000000000 R12: 000000000000006e
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] R13: 0000000400000001 R14: ffff88011b7140c8 R15: ffff880216604858
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] FS:  00007f40fd760720(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] CR2: 00007f40ff423df0 CR3: 00000001a37c4000 CR4: 00000000000006f0
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] Stack:
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  ffffffff814ee608 ffff8801feeaa800 0000000000000028 ffff8801feeaa800
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  ffffffff814f0543 ffffffff81673320 ffff88014fb2fe94 ffff880216f7b000
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  00000028bfb170b0 ffff8800730ef018 ffff8801bfb17080 000000000000006e
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] Call Trace:
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  [<ffffffff814ee608>] ? unix_state_double_lock+0x28/0x70
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  [<ffffffff814f0543>] ? unix_dgram_connect+0x93/0x250
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  [<ffffffff8143b128>] ? SYSC_connect+0xe8/0x100
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006]  [<ffffffff81550f8d>] ? system_call_fast_compare_end+0x10/0x15
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] Code: b8 01 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 03 c3 f3 90 <0f> b7 07 66 39 d0 75 f6 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00
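
For triage, the frames can be pulled out of such a trace mechanically. The following is only a minimal sketch, assuming the syslog line format shown above; `FRAME` and `extract_frames` are hypothetical helper names and not part of any UCS tool. On the trace above it lists `_raw_spin_lock` (from the RIP line) first, followed by the callers from the call trace, which suggests the CPU is spinning on a lock taken via `unix_state_double_lock` during `unix_dgram_connect`.

```python
import re

# Matches "[<address>] symbol+0xoffset/0xsize", with the optional "?" marker
# that the kernel prints for frames it could not verify.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def extract_frames(log_lines):
    """Return (symbol, offset, size) tuples in the order they appear in the log."""
    frames = []
    for line in log_lines:
        match = FRAME.search(line)
        if match:
            symbol, offset, size = match.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


# Lines copied from the trace above (syslog prefix shortened for readability).
sample = [
    "kernel: [6190608.036006] RIP: 0010:[<ffffffff815509fb>]  [<ffffffff815509fb>] _raw_spin_lock+0x1b/0x30",
    "kernel: [6190608.036006] Stack:",
    "kernel: [6190608.036006]  ffffffff814ee608 ffff8801feeaa800 0000000000000028 ffff8801feeaa800",
    "kernel: [6190608.036006] Call Trace:",
    "kernel: [6190608.036006]  [<ffffffff814ee608>] ? unix_state_double_lock+0x28/0x70",
    "kernel: [6190608.036006]  [<ffffffff814f0543>] ? unix_dgram_connect+0x93/0x250",
    "kernel: [6190608.036006]  [<ffffffff8143b128>] ? SYSC_connect+0xe8/0x100",
    "kernel: [6190608.036006]  [<ffffffff81550f8d>] ? system_call_fast_compare_end+0x10/0x15",
]

frames = extract_frames(sample)
for symbol, offset, size in frames:
    print(f"{symbol} +{offset:#x} of {size:#x}")
```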

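The same trace for smbd PID 14256 is logged again and again, about every 30 seconds in the excerpt below, from 08:29 to 10:12: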
root@ucs-server01:~# grep stuck /var/log/syslog 
Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
Apr 13 08:30:22 ucs-server01 kernel: [6190636.036008] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
Apr 13 08:30:54 ucs-server01 kernel: [6190668.036006] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]
Apr 13 08:31:22 ucs-server01 kernel: [6190696.036006] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]
Apr 13 08:31:58 ucs-server01 kernel: [6190732.036007] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]
[...]
Apr 13 10:10:06 ucs-server01 kernel: [6196620.036006] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
Apr 13 10:10:42 ucs-server01 kernel: [6196656.036007] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]
Apr 13 10:11:10 ucs-server01 kernel: [6196684.036010] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]
Apr 13 10:11:46 ucs-server01 kernel: [6196720.036006] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
Apr 13 10:12:15 ucs-server01 kernel: [6196748.036007] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]
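
To quantify how often the lockup is reported, the grep output above can be summarized with a short script. This is a sketch under the assumption that the message format is exactly as shown; `LOCKUP` and `summarize_lockups` are hypothetical names chosen for illustration. It counts reports per (process, PID, CPU) and computes the gaps between consecutive reports from the kernel timestamps in square brackets.

```python
import re
from collections import Counter

# Kernel timestamp in brackets, then the soft lockup message with CPU,
# reported stall time, process name and PID.
LOCKUP = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(log_lines):
    """Return (Counter keyed by (comm, pid, cpu), list of gaps in whole seconds)."""
    events = []
    for line in log_lines:
        match = LOCKUP.search(line)
        if match:
            events.append(
                (
                    float(match.group("uptime")),
                    match.group("comm"),
                    int(match.group("pid")),
                    int(match.group("cpu")),
                )
            )
    per_task = Counter((comm, pid, cpu) for _, comm, pid, cpu in events)
    uptimes = [uptime for uptime, _, _, _ in events]
    gaps = [round(later - earlier) for earlier, later in zip(uptimes, uptimes[1:])]
    return per_task, gaps


# First five lines of the grep output above.
sample = [
    "Apr 13 08:29:54 ucs-server01 kernel: [6190608.036006] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]",
    "Apr 13 08:30:22 ucs-server01 kernel: [6190636.036008] BUG: soft lockup - CPU#0 stuck for 23s! [smbd:14256]",
    "Apr 13 08:30:54 ucs-server01 kernel: [6190668.036006] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]",
    "Apr 13 08:31:22 ucs-server01 kernel: [6190696.036006] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]",
    "Apr 13 08:31:58 ucs-server01 kernel: [6190732.036007] BUG: soft lockup - CPU#0 stuck for 22s! [smbd:14256]",
]

per_task, gaps = summarize_lockups(sample)
for (comm, pid, cpu), count in per_task.items():
    print(f"{comm}[{pid}] on CPU#{cpu}: {count} reports")
print("seconds between reports:", gaps)
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize_lockups(open("/var/log/syslog"))`.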

Eventually the server hangs.

Workaround: Downgrade to Kernel 3.10.
Comment 1 Philipp Hahn univentionstaff 2016-04-15 16:22:57 CEST
r16415 | Bug #41051: linux-3.16.7-ckt25-2~bpo70+1 UCS-3.3-0

$ repo_admin.py --cherrypick -r 4.0 -s errata4.0-5 --releasedest 3.3 -p linux --ignore-patches
Comment 2 Philipp Hahn univentionstaff 2016-04-15 18:53:15 CEST
Package: linux
Version: 3.16.7-ckt25-2~bpo70+1~ucs3.3.192.201604151610
Branch: ucs_3.3-0
Comment 3 Philipp Hahn univentionstaff 2016-04-19 08:58:42 CEST
Fix file-system corruption with KVM (Ticket #2016041221000419, <https://bugzilla.kernel.org/show_bug.cgi?id=102731>, <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502>)

r16416 | Bug #40838,Bug #41051: linux file-system corruption

Package: linux
Version: 3.16.7-ckt25-2~bpo70+1~ucs3.3.193.201604181018
Branch: ucs_3.3-0

OK: amd64 @ kvm
OK: i386 @ kvm
OK: zless /usr/share/doc/linux-image-`uname -r`/changelog.Debian.gz

r68749 | Bug #41051: Update to 3.16.7-ckt25-2~bpo70+1
r68750 | Bug #41051: Update to 3.16.7-ckt25-2~bpo70+1

Package: univention-kernel-image
Version: 7.100.0-6.99.201604190855
Branch: ucs_3.3-0
Comment 4 Janek Walkenhorst univentionstaff 2016-05-25 18:23:29 CEST
Tests (kvm & hardware): OK
Comment 5 Stefan Gohmann univentionstaff 2016-06-07 21:35:47 CEST
UCS 3.3 has been released:
 https://docs.software-univention.de/release-notes-3.3-0-en.html
 https://docs.software-univention.de/release-notes-3.3-0-de.html

If this error occurs again, please use "Clone This Bug".