Bug 52802 - Linux kernel 4.19.0-14-amd64 crash UCS 5 - maybe Qemu-q35 related
Status: RESOLVED WORKSFORME
Product: UCS
Classification: Unclassified
Component: Kernel
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS maintainers
UCS maintainers
Depends on:
Blocks:
Reported: 2021-02-16 15:36 CET by Philipp Hahn
Modified: 2022-02-18 08:42 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.200
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2021-02-16 15:36:09 CET
During UCS-5.0 development my VM crashed several times. The latest OOPS:

[Feb16 06:50] general protection fault: 0000 [#1] SMP PTI
[  +0,000065] CPU: 1 PID: 17450 Comm: nfsmounts Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[  +0,000052] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
[  +0,000067] RIP: 0010:kernfs_sop_show_options+0x2d/0x40
[  +0,000035] Code: 00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 50 02 00 00 48 8b 50 08 48 85 d2 48 0f 44 d0 48 8b 72 50 48 8b 46 30 48 85 c0 74 0e <48> 8b 40 08 48 85 c0 74 05 e9 05 57 72 00 31 c0 c3 66 90 0f 1f 44
[  +0,000106] RSP: 0018:ffff9e5f01fffe20 EFLAGS: 00010202
[  +0,000032] RAX: 0105028102951075 RBX: ffff8b0c7af66680 RCX: 00000000656f6e2c
[  +0,000041] RDX: ffff8b0c351de440 RSI: ffff8b0c34a74840 RDI: ffff8b0c7af66680
[  +0,000040] RBP: ffff8b0c34286c20 R08: 6d6974616c65722c R09: 656d6974616c6572
[  +0,000041] R10: ffff9e5f01fffddc R11: ffff8b0c75e123bc R12: ffff8b0c7cfec000
[  +0,000040] R13: ffff8b0c77e50e80 R14: ffff8b0c7abb4900 R15: ffff8b0c7af66680
[  +0,000042] FS:  00007ff4bb68e740(0000) GS:ffff8b0c7db00000(0000) knlGS:0000000000000000
[  +0,000045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0,000033] CR2: 00007ff4bb634e10 CR3: 000000007ac0c000 CR4: 00000000000006e0
[  +0,000069] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0,000041] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0,000041] Call Trace:
[  +0,000053]  show_vfsmnt+0x124/0x170
[  +0,000038]  seq_read+0x2e9/0x410
[  +0,000026]  vfs_read+0x91/0x140
[  +0,000023]  ksys_read+0x57/0xd0
[  +0,000058]  do_syscall_64+0x53/0x110
[  +0,000051]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0,000053] RIP: 0033:0x7ff4bb77d461
[  +0,000058] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
[  +0,000135] RSP: 002b:00007ffc1c076e68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  +0,000067] RAX: ffffffffffffffda RBX: 000055f6fceab250 RCX: 00007ff4bb77d461
[  +0,000070] RDX: 0000000000002000 RSI: 000055f6fd162e10 RDI: 0000000000000004
[  +0,000096] RBP: 00007ff4bb84b2a0 R08: 0000000000000b40 R09: 000055f6fceab250
[  +0,000089] R10: 000055f6fcda0010 R11: 0000000000000246 R12: 0000000000002000
[  +0,000093] R13: 000055f6fd162e10 R14: 0000000000000d68 R15: 00007ff4bb84a760
[  +0,000100] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_state xt_conntrack nft_compat nft_counter nft_chain_route_ipv6 nft_chain_rou
[  +0,009114]  raid1 raid0 multipath linear md_mod dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod hid_generic usbhid hid sr_mod cdrom sd_mod virtio_gpu ttm drm_kms_helper ahci libahci drm virtio_net net_failover failover virtio_scsi xhci_pci libata xhci_hcd 
[  +0,009397] ---[ end trace b63534c6f4770b73 ]---
[  +0,001854] RIP: 0010:kernfs_sop_show_options+0x2d/0x40
[  +0,001273] Code: 00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 50 02 00 00 48 8b 50 08 48 85 d2 48 0f 44 d0 48 8b 72 50 48 8b 46 30 48 85 c0 74 0e <48> 8b 40 08 48 85 c0 74 05 e9 05 57 72 00 31 c0 c3 66 90 0f 1f 44
[  +0,002644] RSP: 0018:ffff9e5f01fffe20 EFLAGS: 00010202
[  +0,001314] RAX: 0105028102951075 RBX: ffff8b0c7af66680 RCX: 00000000656f6e2c
[  +0,001330] RDX: ffff8b0c351de440 RSI: ffff8b0c34a74840 RDI: ffff8b0c7af66680
[  +0,001337] RBP: ffff8b0c34286c20 R08: 6d6974616c65722c R09: 656d6974616c6572
[  +0,001300] R10: ffff9e5f01fffddc R11: ffff8b0c75e123bc R12: ffff8b0c7cfec000
[  +0,001317] R13: ffff8b0c77e50e80 R14: ffff8b0c7abb4900 R15: ffff8b0c7af66680
[  +0,001256] FS:  00007ff4bb68e740(0000) GS:ffff8b0c7db00000(0000) knlGS:0000000000000000
[  +0,001280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0,001240] CR2: 00007ff4bb634e10 CR3: 000000007ac0c000 CR4: 00000000000006e0
[  +0,001293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0,001283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
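
Several registers in the dump above (RCX, R08, R09) hold values that look like printable ASCII. The following is a minimal sketch, not part of the original report, that decodes them as little-endian bytes; the helper name `reg_to_ascii` is hypothetical. The decoded text resembles mount options such as `relatime`, which would fit a task that is formatting `/proc/self/mounts` output, but that reading is an observation and not a diagnosis.

```python
def reg_to_ascii(value: int, width: int = 8) -> str:
    """Render a register value as little-endian bytes; non-printable bytes become '.'."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Register values copied from the kernel-mode register dump of the oops.
regs = {
    "RCX": 0x00000000656F6E2C,
    "R08": 0x6D6974616C65722C,
    "R09": 0x656D6974616C6572,
}

for name, value in regs.items():
    print(f"{name}: {reg_to_ascii(value)!r}")
```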
Comment 1 Philipp Hahn univentionstaff 2021-06-15 11:50:24 CEST
crashed lagan
Comment 3 Philipp Hahn univentionstaff 2021-07-13 12:48:21 CEST
The bug seems to be triggered by `/usr/lib/univention-directory-policy/nfsmounts` reading `/proc/self/mounts` AKA `/etc/mtab`.
`root->syscall_ops` seems to be invalid; it is only ever initialized by fs/kernfs/dir.c:kernfs_create_root().
That function seems to be called for every cgroup root by kernel/cgroup/cgroup.c:cgroup_setup_root().

Candidates:
- 630faf81b3e6 cgroup: don't put ERR_PTR() into fc->root
- 35ac1184244f cgroup: saner refcounting for cgroup_root
- 399504e21a10 fix cgroup_do_mount() handling of failure exits
- 2fd60da46da7 kernfs: fix potential null pointer dereference

# scripts/decodecode
>   0:   INCOMPLETE
>   2:   48 8b 46 30             mov    0x30(%rsi),%rax
>   6:   48 85 c0                test   %rax,%rax
>   9:   74 07                   je     0x12
>   b:   48 8b 80 50 02 00 00    mov    0x250(%rax),%rax
>  12:   48 8b 50 08             mov    0x8(%rax),%rdx
>  16:   48 85 d2                test   %rdx,%rdx
>  19:   48 0f 44 d0             cmove  %rax,%rdx
>  1d:   48 8b 72 50             mov    0x50(%rdx),%rsi
>  21:   48 8b 46 30             mov    0x30(%rsi),%rax
>  25:   48 85 c0                test   %rax,%rax
>  28:   74 0e                   je     0x38
>  2a:*  48 8b 40 08             mov    0x8(%rax),%rax           <-- trapping instruction
>  2e:   48 85 c0                test   %rax,%rax
>  31:   74 05                   je     0x38
>  33:   e9 05 57 72 00          jmpq   0x72573d
>  38:   31 c0                   xor    %eax,%eax
>  3a:   c3                      retq   
>  3b:   STRIPPED
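
The offsets in the decodecode output are relative to the start of the `Code:` dump, not to the start of the function. The following sketch (helper name `trap_offset` is hypothetical) shows how the two relate: it locates the `<..>` marker in the `Code:` line of the oops and subtracts that position from the `+0x2d` offset reported in `RIP: kernfs_sop_show_options+0x2d/0x40`.

```python
# "Code:" line copied from the oops; <48> marks the first byte of the trapping instruction.
CODE = (
    "00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 50 02 00 00 48 8b 50 08 "
    "48 85 d2 48 0f 44 d0 48 8b 72 50 48 8b 46 30 48 85 c0 74 0e <48> "
    "8b 40 08 48 85 c0 74 05 e9 05 57 72 00 31 c0 c3 66 90 0f 1f 44"
)

# Offset of the faulting instruction inside the function, from the RIP line.
RIP_OFFSET = 0x2D


def trap_offset(code: str) -> int:
    """Return the index of the byte marked with <..> within the Code: dump."""
    for index, token in enumerate(code.split()):
        if token.startswith("<"):
            return index
    raise ValueError("no <..> marker found in Code: line")


dump_offset = trap_offset(CODE)        # position inside the dump
dump_start = RIP_OFFSET - dump_offset  # function offset of the first dumped byte

print(f"trap at dump offset {dump_offset:#x}, dump starts at function offset {dump_start:#x}")
```

With these inputs the dump starts three bytes into the function, so the leading `00 00` are the last two bytes of the 5-byte `callq` at function offset 0, and every decodecode offset is the function offset minus 3.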

# objdump -S fs/kernfs/mount.o --start-address=0x30 --stop-address=0x60
> 0000000000000030 <kernfs_sop_show_options>:
> static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
> {
>   00:   e8 00 00 00 00          callq  35 <kernfs_sop_show_options+0x5>
> };
> #define kernfs_info(SB) ((struct kernfs_super_info *)(SB->s_fs_info))
> static inline struct kernfs_node *kernfs_dentry_node(struct dentry *dentry)
> {
>         if (d_really_is_negative(dentry))
>   05:   48 8b 46 30             mov    0x30(%rsi),%rax
>   09:   48 85 c0                test   %rax,%rax
>   0c:   74 07                   je     45 <kernfs_sop_show_options+0x15>
>                 return NULL;
>         return d_inode(dentry)->i_private;
>   0e:   48 8b 80 50 02 00 00    mov    0x250(%rax),%rax
>         if (kn->parent)
>   15:   48 8b 50 08             mov    0x8(%rax),%rdx
=== RDX: ffff8b0c351de440 -> kn->parent
>   19:   48 85 d2                test   %rdx,%rdx
>   1c:   48 0f 44 d0             cmove  %rax,%rdx
>         return kn->dir.root;
>   20:   48 8b 72 50             mov    0x50(%rdx),%rsi
=== RSI: ffff8b0c34a74840 -> root
>         struct kernfs_root *root = kernfs_root(kernfs_dentry_node(dentry));
>         struct kernfs_syscall_ops *scops = root->syscall_ops;
>   24:   48 8b 46 30             mov    0x30(%rsi),%rax
=== RAX: 0105028102951075 -> root->syscall_ops, not a canonical address (ffffffffffffffda = -ENOSYS is the user-space RAX from the second register dump)
>         if (scops && scops->show_options)
>   28:   48 85 c0                test   %rax,%rax
>   2b:   74 0e                   je     6b <kernfs_sop_show_options+0x3b>
>   2d:   48 8b 40 08             mov    0x8(%rax),%rax
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   31:   48 85 c0                test   %rax,%rax
>   34:   74 05                   je     6b <kernfs_sop_show_options+0x3b>
>                 return scops->show_options(sf, root);
>   36:   e9 00 00 00 00          jmpq   6b <kernfs_sop_show_options+0x3b>
>         return 0;
> }
>   3b:   31 c0                   xor    %eax,%eax
>   3d:   c3                      retq   
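
The trapping instruction loads from `0x8(%rax)`, and the kernel-mode register dump shows RAX = 0105028102951075. The following is a minimal sketch, assuming 4-level paging with 48-bit virtual addresses (CR4 = 6e0 in the dump has the LA57 bit clear), that checks whether an address is canonical. Accessing a non-canonical address raises a general protection fault instead of a page fault, which would match the oops header. The helper name `is_canonical` is hypothetical.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all equal, i.e. the address is sign-extended."""
    top = addr >> (va_bits - 1)                # bits 63..47 for 48-bit addressing
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits for 48-bit addressing
    return top == 0 or top == all_ones


# Values copied from the kernel-mode register dump of the oops.
RAX = 0x0105028102951075   # value loaded from root->syscall_ops
RSI = 0xFFFF8B0C34A74840   # root pointer

print("RSI canonical:", is_canonical(RSI))
print("RAX canonical:", is_canonical(RAX))
print("RAX + 8 canonical:", is_canonical(RAX + 8))
```

RSI looks like an ordinary kernel pointer, while the value read from offset 0x30 of that object does not, which is consistent with the `syscall_ops` field holding garbage.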


2021-07-13: Asked for help on LKML

(In reply to Philipp Hahn from comment #1)
> crashed lagan

This might be completely unrelated as I don't have the Kernel trace dump.
Comment 5 Ingo Steuwer univentionstaff 2022-02-17 17:51:39 CET
Current kernel version is 4.19.0-18

Did this ever happen again?
Comment 6 Philipp Hahn univentionstaff 2022-02-18 08:42:14 CET
(In reply to Ingo Steuwer from comment #5)
> Current kernel version is 4.19.0-18
> 
> Did this ever happen again?

I have never seen this again, but I also did not actively look for it.

It would help to collect crash information automatically, as described in Bug #37314.