Univention Bugzilla – Full Text Bug Listing
Summary: | NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [smbd:4615] | | |
---|---|---|---|
Product: | UCS | Reporter: | Stefan Gohmann <gohmann> |
Component: | Kernel | Assignee: | Philipp Hahn <hahn> |
Status: | CLOSED FIXED | QA Contact: | Felix Botner <botner> |
Severity: | critical | | |
Priority: | P5 | CC: | andree.hingst, damrose, grandjean, hahn, salm, scheinig, stephan.hendl, stoeckigt, walkenhorst |
Version: | UCS 4.1 | | |
Target Milestone: | UCS 4.1-0-errata | | |
Hardware: | Other | | |
OS: | Linux | | |
See Also: | https://forge.univention.org/bugzilla/show_bug.cgi?id=41048 | | |
Bug Blocks: | 42614, 42927 |
Description
Stefan Gohmann
2016-02-02 07:45:56 CET
Next ticket: Ticket #2016020221000432
See the ticket for more log files: https://otrs.knut.univention.de/otrs/index.pl?Action=AgentTicketZoom;TicketID=909606;ArticleID=1832270#

Probably another case via Presales (Issue #4231)

(In reply to Michael Grandjean from comment #3)
> Probably another case via Presales (Issue #4231)

"You are not authorized to access this page" (German original: "Sie sind nicht berechtigt, auf diese Seite zuzugreifen")

The bug happens on both architectures: _raw_spin_lock is either 0x70 (i386) or 0x50 (amd64) bytes long.
`smbtorture base.fdpass.fdpass` has not shown anything so far. Simple Python unix_dgram_{server,client} scripts are also running fine.
The patches in 4.4 and 4.1 look the same, but <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1586a5877db9eee313379738d6581bc7c6ffb5e3> is missing in 4.1.16?
RFH @ LKML: <https://lkml.org/lkml/2016/2/2/474>

Now building a test kernel in 4.1.6-1-glibc with net/unix/ reverted:
> $ git log --oneline v4.1.12..v4.1.17 -- net/unix
> dc6b0ec unix: properly account for FDs passed over unix sockets
> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
The plan is to look for testers who can check whether the bug also occurs with that kernel.

Request for testers

We've built a patched kernel, but we don't know yet whether it fixes the problem. Willing testers can include the repository containing the patched kernel and should test whether the bug still occurs. Feedback is very much appreciated.

    ucr set repository/online/component/glibc{=yes,/parts=unmaintained}
    case "$(uname -m)" in
    x86_64) univention-install linux-image-4.1.0-ucs170-amd64 ;;
    i?86) univention-install linux-image-4.1.0-ucs170-686-pae ;;
    esac
    reboot

The kernel is not signed for UEFI Secure Boot!

r15870 | Also build for errata4.1-0

Package: linux
Version: 4.1.6-1.171.201602081336
Branch: ucs_4.1-0
Scope: errata4.1-0

Happened again here: Ticket#2016021021000328
All related logs and traces are in the ticket.
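
For readers unfamiliar with the mechanism under suspicion: the commit "unix: properly account for FDs passed over unix sockets" and the `smbtorture base.fdpass.fdpass` test both concern passing file descriptors over Unix domain sockets. The following is a minimal user-space sketch of that mechanism in Python. It is not the test program used in this report, the function names (`send_fd`, `recv_fds`, `roundtrip`) are hypothetical, and it is not expected to trigger the lockup.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    sock.sendmsg(
        [b"x"],  # at least one byte of ordinary payload accompanies the descriptor
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))],
    )


def recv_fds(sock, max_fds=1):
    """Receive up to max_fds descriptors passed via SCM_RIGHTS (hypothetical helper)."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before converting.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return list(fds)


def roundtrip(payload=b"hello"):
    """Pass the write end of a pipe across a datagram socket pair and use the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, write_end)
        (passed_fd,) = recv_fds(right)
        try:
            os.write(passed_fd, payload)  # write through the received descriptor
            return os.read(read_end, len(payload))
        finally:
            os.close(passed_fd)
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()
```

Calling `roundtrip()` writes through the received descriptor and reads the data back from the original pipe, which shows that the kernel duplicated the descriptor into the receiving socket's queue.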

<https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980> for 3.13, which also has <https://lkml.org/lkml/2016/2/5/587> a back-port of:
willy tarreau (1): unix: properly account for FDs passed over unix sockets

Not yet able to trigger bug with proposed test "samba3.raw.composite".

Prepare for UCS-4.1-1 release with net/unix/ reverted. We can't trigger the bug and have no proof that the bug is resolved.

r67311 | Bug #40558 kernel: Update to ucs171 (4.1.16-unix)

Package: univention-kernel-image-signed
Version: 2.0.0-5.14.201602101441
Branch: ucs_4.1-0
Scope: errata4.1-0

r67312 | Bug #40558 kernel: Update to ucs171 (4.1.16-unix)

Package: univention-kernel-image
Version: 9.0.0-7.88.201602101444
Branch: ucs_4.1-0
Scope: errata4.1-0

r67315 | Bug #40558 kernel: Update to ucs171 (4.1.16-unix) YAML
linux.yaml
univention-kernel-image-signed.yaml
univention-kernel-image.yaml

I can reproduce this on hardware (xen15 10.201.16.16) and KVM

steps:
* UCS 4.1-0 4.1.0-ucs167-amd64
* ucr set repository/online/unmaintained='yes'
* apt-get install bison comerr-dev debhelper docbook-xml docbook-xsl faketime flex libacl1-dev libaio-dev libattr1-dev libblkid-dev libbsd-dev libcap-dev libcups2-dev libgnutls-dev xfslibs-dev libldap2-dev libldb-dev libncurses5-dev libntdb-dev libpam0g-dev libparse-yapp-perl libpopt-dev libreadline-dev libsubunit-dev libtalloc-dev libtdb-dev libtevent-dev perl perl-modules pkg-config po-debconf python-all-dev python-dnspython python-ldb python-ldb-dev python-ntdb python-talloc-dev python-tdb python-testtools python3 subunit xsltproc zlib1g-dev
* git clone git://git.samba.org/samba.git samba
* cd samba
* ./configure.developer
* TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"

Samba test works with 4.1.0-ucs171-amd64 so far (started overnight test loop ...)
But there are some strange kernel messages

ACPI Warning: SystemIO range 0x0000000000000828-0x000000000000082F conflicts with OpRegion 0x0000000000000800-0x000000000000084F (\PMRG) (20150410/utaddress-254)
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround

and in particular

Request for unknown module key 'Build time autogenerated kernel key: 006416f63733d99e57be1fd3a06d66c85b9e2c23' err -11

-> dmesg | grep "006416f63733d99e57be1fd3a06d66c85b9e2c23" | wc -l
194

(In reply to Felix Botner from comment #10)
> I can reproduce this on hardware (xen15 10.201.16.16) and KVM
>
> steps:
> * UCS 4.1-0 4.1.0-ucs167-amd64
> * ucr set repository/online/unmaintained='yes'
> * apt-get install bison comerr-dev debhelper docbook-xml docbook-xsl
> faketime flex libacl1-dev libaio-dev libattr1-dev libblkid-dev libbsd-dev
> libcap-dev libcups2-dev libgnutls-dev xfslibs-dev libldap2-dev libldb-dev
> libncurses5-dev libntdb-dev libpam0g-dev libparse-yapp-perl libpopt-dev
> libreadline-dev libsubunit-dev libtalloc-dev libtdb-dev libtevent-dev perl
> perl-modules pkg-config po-debconf python-all-dev python-dnspython
> python-ldb python-ldb-dev python-ntdb python-talloc-dev python-tdb
> python-testtools python3 subunit xsltproc zlib1g-dev

If you set repository/online/sources=yes, you can then run "apt-get build-dep samba"

> * git clone git://git.samba.org/samba.git samba
> * cd samba
> * ./configure.developer
> * TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1
> TESTS="samba3.raw.composite"
>
> Samba test works with 4.1.0-ucs171-amd64 so far (started overnight test loop
> ...)

Good.

> But there are some strange kernel messages

Comparing with 3.16.0-ucs135-amd64...

> ACPI Warning: SystemIO range 0x0000000000000828-0x000000000000082F conflicts
> with OpRegion 0x0000000000000800-0x000000000000084F (\PMRG)
> (20150410/utaddress-254)

Known firmware bug - can be ignored

> iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS

Disabled in BIOS, so the message is expected - can be ignored

> kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using
> workaround

Known hardware bug - can be ignored

> and in particular
>
> Request for unknown module key 'Build time autogenerated kernel key:
> 006416f63733d99e57be1fd3a06d66c85b9e2c23' err -11
>
> -> dmesg | grep "006416f63733d99e57be1fd3a06d66c85b9e2c23" | wc -l
> 194

New bug in this kernel, but the problem is known to us as part of Bug #38214:
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1494562>
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1494943>
<https://launchpadlibrarian.net/217914743/wily+master-next-0001-x509-only-prefix-strip-raw-serial-numbers.patch>

Basically, the problem is that Univention does not use a fixed key to sign its kernels, so the build process is forced to generate a random key each time a kernel is built. This time (1 out of 16) the build generated a key starting with '0', which exposes this bug.

Package: linux
Version: 4.1.6-1.174.201602110938
Branch: ucs_4.1-0
Scope: errata4.1-0

I tracked it down by bisecting the Linux kernel to the commit "unix: avoid use-after-free in ep_remove_wait_queue"; see <https://lkml.org/lkml/2016/2/2/474>
As that patch was reverted for ucs174, that kernel can be used for now.
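
The "1 out of 16" figure for a generated key starting with '0' can be checked with a short calculation. The sketch below assumes that key identifiers are uniformly distributed and that "starting with '0'" refers to the first hexadecimal digit; both are assumptions, and the function name `leading_zero_fraction` is hypothetical.

```python
from fractions import Fraction

# Key identifier copied from the kernel message quoted in this report.
KEY_ID = "006416f63733d99e57be1fd3a06d66c85b9e2c23"


def leading_zero_fraction(num_bytes=1):
    """Exact share of num_bytes-long values whose hex form starts with '0'.

    Assumes every value is equally likely (an assumption, not a property
    confirmed for the kernel build's key generation).
    """
    width = 2 * num_bytes  # two hex digits per byte
    total = 256 ** num_bytes
    hits = sum(1 for value in range(total) if format(value, "x").zfill(width).startswith("0"))
    return Fraction(hits, total)


if __name__ == "__main__":
    print(leading_zero_fraction())   # share of builds expected to hit the condition
    print(KEY_ID.startswith("0"))    # whether the logged key falls into that case
```

Only the first hex digit matters for this count, so the result does not depend on how many bytes are enumerated.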

r67358 | Bug #40558 linux: Update to ucs174 (4.1.16-unix)

Package: univention-kernel-image-signed
Version: 2.0.0-6.15.201602111457
Branch: ucs_4.1-0
Scope: errata4.1-0

r67359 | Bug #40558 kernel: Update to ucs174 (4.1.16-unix)

Package: univention-kernel-image
Version: 9.0.0-8.90.201602111504
Branch: ucs_4.1-0
Scope: errata4.1-0

r67360 | Bug #40558 kernel: Update to ucs174 (4.1.16-unix) YAML
branches/ucs-4.1/ucs-4.1-0/doc/errata/staging/linux.yaml | 2 +-
branches/ucs-4.1/ucs-4.1-0/doc/errata/staging/univention-kernel-image-signed.yaml | 2 +-
branches/ucs-4.1/ucs-4.1-0/doc/errata/staging/univention-kernel-image.yaml | 2 +-

kernel 174:
* OK - on hardware
* OK - amazon ec2
* OK - kvm

on hardware:
* OK - Samba test suite
* OK - uvmm/kvm windows 10 installation
* OK - uvmm/kvm ucs 4.1 installation
* OK - uvmm/kvm ucs 4.1 installation with update to kernel 174

on all:
* OK - TDB_NO_FSYNC=1 make -j test SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"
* OK - kernel make kselftest

OK - dkms (sources, module build, modprobe)

OK - linux.yaml
OK - univention-kernel-image-signed.yaml
OK - univention-kernel-image.yaml
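
Because the lockup did not show up on every run, the Samba test was repeated (an overnight test loop is mentioned earlier in this report). A loop of that kind might look like the following minimal sketch. The helper name `run_until_failure` is hypothetical and this is not the script that was actually used; note also that a soft lockup is reported in the kernel log and does not necessarily produce a non-zero exit status, so `dmesg` still has to be checked separately.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return run
    return None


if __name__ == "__main__":
    # Example invocation with a trivial command; replace it with the test
    # command from the reproduction steps when trying this on a test system.
    first_failure = run_until_failure([sys.executable, "-c", "raise SystemExit(0)"], 3)
    print("first failing run:", first_failure)
```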