Bug 39775 - libnss-extrausers not thread safe? - prevents libvirt from starting
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.1-2-errata
Assigned To: Philipp Hahn
QA Contact: Stefan Gohmann
URL: https://bugs.debian.org/cgi-bin/bugre...
Reported: 2015-11-04 12:48 CET by Philipp Hahn
Modified: 2016-08-03 15:56 CEST

Attachments
Test program (1.81 KB, text/plain)
2016-07-19 18:29 CEST, Philipp Hahn

Description Philipp Hahn univentionstaff 2015-11-04 12:48:44 CET
libnss-extrausers 0.6-3.12.201409252135

# cat libvirtd.gdb 
file /usr/sbin/libvirtd
set args -f /root/libvirtd.conf
set environment MALLOC_CHECK_ 2
run
# gdb -x libvirtd.gdb
...
Program received signal SIGABRT, Aborted.
Program received signal SIGSEGV, Segmentation fault.
$ bt
#0  0x00007ffff40c6165 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff40c93e0 in *__GI_abort () at abort.c:92
#2  0x00007ffff4109c10 in malloc_printerr (action=2, str=0x7ffff41e0b92 "free(): invalid pointer", ptr=0x6547)
    at malloc.c:6317
#3  0x00007ffff40fab3d in _IO_new_fclose (fp=0x5555559e55b0) at iofclose.c:88
#4  0x00007fffeb38d7d5 in _nss_extrausers_endgrent () from /usr/lib/libnss_extrausers.so.2
#5  0x00007fffeb38e67b in _nss_extrausers_initgroups_dyn () from /usr/lib/libnss_extrausers.so.2
#6  0x00007ffff413c362 in internal_getgrouplist (user=<optimized out>, group=110, size=0x7fffe8383868,
    groupsp=0x7fffe8383860, limit=<optimized out>) at initgroups.c:101
#7  0x00007ffff413c644 in getgrouplist (user=user@entry=0x5555558e96b0 "libvirt-qemu", group=group@entry=110,
    groups=groups@entry=0x5555559d69b0, ngroups=ngroups@entry=0x7fffe83838bc) at initgroups.c:153
#8  0x00007ffff76b732d in mgetgroups (username=0x5555558e96b0 "libvirt-qemu", gid=110, groups=0x7fffe83839b0)
    at ../../../../gnulib/lib/mgetgroups.c:90
#9  0x00007ffff75553e4 in virGetGroupList (uid=uid@entry=109, gid=25970, gid@entry=1009,
    list=list@entry=0x7fffe83839b0) at ../../../src/util/virutil.c:1057
#10 0x00007ffff7511759 in virFileAccessibleAs (
...

The exact signal differs: in most of my test runs it is SIGABRT, sometimes SIGSEGV, sometimes others.

Removing "extrausers" from the "group" line in /etc/nsswitch.conf and adding the group "Tech" to "/etc/group" made the problem go away.

libnss-extrausers seems not to be thread-safe!


Upstream: <http://anonscm.debian.org/cgit/users/brlink/libnss-extrausers.git/>
Backport of patch: <https://forge.univention.org/bugzilla/show_bug.cgi?id=29915>
NSS debugging: <https://ldpreload.com/blog/testing-glibc-nsswitch>
GDB: <https://sourceware.org/gdb/onlinedocs/gdb/Thread-Stops.html>
glibc: <http://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html#Name-Service-Switch>
Comment 1 Philipp Hahn univentionstaff 2015-11-04 12:53:59 CET
Ticket #2015110421000306
Comment 2 Philipp Hahn univentionstaff 2016-06-14 15:12:42 CEST
Happened again.
A temporary fix for libvirt is to disable the NSS module extrausers for group lookups:

/etc/init.d/libvirtd stop
pkill -9 libvirtd
sed -e '/^group/s/extrausers//' -i /etc/nsswitch.conf
/etc/init.d/libvirtd start
ucr commit /etc/nsswitch.conf
Comment 3 Philipp Hahn univentionstaff 2016-06-14 15:13:48 CEST
Ticket #2016061421000386
Comment 4 Nico Stöckigt univentionstaff 2016-07-19 14:26:50 CEST
Ticket#2016071921000231
Comment 5 Jens Thorp-Hansen univentionstaff 2016-07-19 14:27:06 CEST
This also happens in Univention's production environment.
Comment 6 Philipp Hahn univentionstaff 2016-07-19 18:29:51 CEST
Created attachment 7816
Test program

"Good" news: I can trigger the bug with a test program other than libvirtd.
It crashes every time with 100 threads in parallel doing getgrouplist().
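
A reproducer along these lines can be sketched as follows. This is not the attached test program: the names lookup_job, lookup_worker and run_parallel_lookups are made up for illustration, and the sketch only assumes the standard getgrouplist() and pthread APIs.

```c
/* Sketch only: NOT the attached test program. All names are illustrative. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <grp.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

struct lookup_job {
    const char *user;  /* account whose group list is resolved */
    gid_t gid;         /* primary group handed to getgrouplist() */
    int ok;            /* set to 1 once the lookup succeeded */
};

/* Thread body: resolve the group list once, growing the buffer if needed. */
static void *lookup_worker(void *arg)
{
    struct lookup_job *job = arg;
    int ngroups = 64;
    gid_t *groups = malloc((size_t)ngroups * sizeof(*groups));

    if (groups == NULL)
        return NULL;
    if (getgrouplist(job->user, job->gid, groups, &ngroups) == -1) {
        /* Buffer too small: ngroups now holds the required count. */
        gid_t *bigger = realloc(groups, (size_t)ngroups * sizeof(*groups));
        if (bigger == NULL) {
            free(groups);
            return NULL;
        }
        groups = bigger;
        if (getgrouplist(job->user, job->gid, groups, &ngroups) != -1)
            job->ok = 1;
    } else {
        job->ok = 1;
    }
    free(groups);
    return NULL;
}

/* Start nthreads parallel lookups; return how many of them succeeded. */
int run_parallel_lookups(const char *user, gid_t gid, int nthreads)
{
    assert(nthreads > 0);
    pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
    struct lookup_job *jobs = calloc((size_t)nthreads, sizeof(*jobs));
    int started = 0, succeeded = 0;

    if (tids != NULL && jobs != NULL) {
        for (; started < nthreads; started++) {
            jobs[started].user = user;
            jobs[started].gid = gid;
            if (pthread_create(&tids[started], NULL, lookup_worker,
                               &jobs[started]) != 0)
                break;
        }
        /* Join only the threads that were actually created. */
        for (int i = 0; i < started; i++) {
            pthread_join(tids[i], NULL);
            succeeded += jobs[i].ok;
        }
    }
    free(tids);
    free(jobs);
    return succeeded;
}
```

Calling run_parallel_lookups("libvirt-qemu", 110, 100) from a small main() and building with -pthread would mirror the libvirtd call from the backtrace. With a thread-safe NSS module the call should return 100. With the affected module it would be expected to abort or segfault inside the library instead of returning.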

getgrouplist() internally uses the "initgroups_dyn" implementation, which is implemented by "_nss_extrausers_initgroups_dyn()". This function calls "_nss_extrausers_setgrent()", which is not thread-safe, because it overwrites the "static FILE *groupsfile".
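
The problem pattern can be illustrated with a simplified sketch. This is not the libnss-extrausers source. shared_file only plays the role of "groupsfile", and both function names are hypothetical. The second function shows one common way to avoid this class of race, a per-call handle; whether the actual fix works this way is not stated in this report.

```c
/* Sketch only: simplified stand-ins, not the libnss-extrausers source. */
#include <assert.h>
#include <stdio.h>

/* Plays the role of the module's "static FILE *groupsfile". */
static FILE *shared_file;

/* Racy under threads: every caller opens, reads and closes through the
 * same static handle, so one thread can replace or close the stream that
 * another thread is still using. Returns the line count, or -1 on error. */
int count_entries_shared(const char *path)
{
    int c, lines = 0;

    shared_file = fopen(path, "r");   /* overwrites a handle stored by another thread */
    if (shared_file == NULL)
        return -1;
    while ((c = fgetc(shared_file)) != EOF)
        if (c == '\n')
            lines++;
    fclose(shared_file);              /* may hit a stream another thread already closed */
    shared_file = NULL;
    return lines;
}

/* Thread-safe variant: the handle is local to the call, so concurrent
 * callers never touch each other's stream. Same return convention. */
int count_entries_local(const char *path)
{
    FILE *fp;
    int c, lines = 0;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Single-threaded, both functions return the same result. Only with concurrent callers does the shared handle lead to a double fclose() or a read from a closed stream, which matches the "free(): invalid pointer" abort in _IO_new_fclose in the backtrace.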
Comment 7 Philipp Hahn univentionstaff 2016-07-20 18:53:34 CEST
# repo_admin.py --cherrypick ...
Cherry picked libnss-extrausers[58795] from 4.0-0-0[75]/None[0] to 4.1[76]/errata4.1-2[446]

r16615 | Bug #39775 nss-extrausers: Fix threading

Package: libnss-extrausers
Version: 0.6-3.13.201607201843
Branch: ucs_4.1-0
Scope: errata4.1-2

r71120 | Bug #39775 nss-extrausers: Fix threading YAML
 libnss-extrausers.yaml

QA: The test program must be compiled from C, which currently prevents it from being added to ucs-test.
Comment 8 Stefan Gohmann univentionstaff 2016-08-02 16:13:55 CEST
YAML: OK

Code review: OK

Tests: OK 

I was able to reproduce the issue with the test program and the old package. With the new package it works. The output of 'getent group' in my environment is identical between the old and the new version.
Comment 9 Janek Walkenhorst univentionstaff 2016-08-03 15:56:32 CEST
<http://errata.software-univention.de/ucs/4.1/221.html>