Bug 34545 - dreplsrv + rpc_server deadlock
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Samba4
Version: UCS 3.2
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.2-1-errata
Assigned To: Stefan Gohmann
QA Contact: Arvid Requate
Reported: 2014-04-15 07:06 CEST by Stefan Gohmann
Modified: 2014-05-07 15:24 CEST
Description Stefan Gohmann univentionstaff 2014-04-15 07:06:31 CEST
In one customer environment we see that DRS replication does not work as expected. At some point it gets worse and Samba stops answering requests altogether; for example, 'samba-tool drs showrepl' hangs.

On a problematic DC, I've checked the processes and saw this:

root@ucs3:~# samba-tool processes -d0
Service: PID
-----------------------------
dnsupdate 24496
rpc_server 24486
rpc_server 24486
rpc_server 24486
rpc_server 24486
rpc_server 24486
cldap_server 24490
winbind_server 24493
kdc_server 24491
samba 0
dreplsrv 24492
kccsrv 24495
ldap_server 24489
root@ucs3-dc3:~# strace -p 24486 -f
Process 24486 attached - interrupt to quit
fcntl64(11, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=8, len=1}^C <unfinished ...>
Process 24486 detached
root@ucs3-dc3:~# egrep "(24486|24492)" /proc/locks
3: POSIX ADVISORY READ 24492 fe:00:1087413 168 EOF
4: POSIX ADVISORY WRITE 24492 fe:00:1087413 8 8
5: POSIX ADVISORY READ 24492 fe:00:1087412 168 EOF
6: POSIX ADVISORY WRITE 24492 fe:00:1087412 8 8
6: -> POSIX ADVISORY READ 24486 fe:00:1087412 8 8
7: POSIX ADVISORY READ 24492 fe:00:1087411 168 EOF
8: POSIX ADVISORY WRITE 24492 fe:00:1087411 8 8
9: POSIX ADVISORY READ 24492 fe:00:1087410 168 EOF
10: POSIX ADVISORY WRITE 24492 fe:00:1087410 8 8
11: POSIX ADVISORY READ 24492 fe:00:1063288 168 EOF
12: POSIX ADVISORY WRITE 24492 fe:00:1063288 8 8
root@ucs3-dc3:~# ls -la /proc/24486/fd/11
lrwx------ 1 root root 64 Apr 12 05:25 /proc/24486/fd/11 -> /var/lib/samba/private/sam.ldb.d/CN=CONFIGURATION,DC=XXX
root@ucs3:~#

As far as I understand it, dreplsrv (PID 24492) holds a write lock on the LDB file and
rpc_server (PID 24486) is blocked waiting for a read lock on the same byte range
(the "->" entry in /proc/locks).
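
The manual check above (matching the "->" line in /proc/locks to the lock entry with the same number) can be scripted. The following is only a minimal sketch, not part of the fix: the names Lock, parse_locks and find_waiters are made up for illustration, and it assumes the /proc/locks line layout seen in the output above (ordinal, optional "->", lock type, mode, access, PID, device:inode, start, end). The sample text is copied from the egrep output in this report.

```python
from typing import List, NamedTuple, Tuple


class Lock(NamedTuple):
    ordinal: int    # number before the colon; a waiter shares it with the lock it waits on
    blocked: bool   # True for "->" lines (a process waiting for the lock)
    kind: str       # e.g. POSIX
    mode: str       # e.g. ADVISORY
    access: str     # READ or WRITE
    pid: int
    file_id: str    # major:minor:inode as printed by the kernel
    start: int
    end: str        # byte offset or "EOF"


def parse_locks(text: str) -> List[Lock]:
    """Parse /proc/locks-style text; lines that do not fit the layout are skipped."""
    locks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        ordinal = int(fields[0][:-1])
        rest = fields[1:]
        blocked = bool(rest) and rest[0] == "->"
        if blocked:
            rest = rest[1:]
        if len(rest) != 7:
            continue
        kind, mode, access, pid, file_id, start, end = rest
        locks.append(Lock(ordinal, blocked, kind, mode, access,
                          int(pid), file_id, int(start), end))
    return locks


def find_waiters(locks: List[Lock]) -> List[Tuple[Lock, Lock]]:
    """Return (waiter, holder) pairs, matched by the shared ordinal."""
    holders = {l.ordinal: l for l in locks if not l.blocked}
    return [(l, holders[l.ordinal])
            for l in locks if l.blocked and l.ordinal in holders]


# Sample copied from the egrep output in this report.
SAMPLE = """\
3: POSIX ADVISORY READ 24492 fe:00:1087413 168 EOF
4: POSIX ADVISORY WRITE 24492 fe:00:1087413 8 8
5: POSIX ADVISORY READ 24492 fe:00:1087412 168 EOF
6: POSIX ADVISORY WRITE 24492 fe:00:1087412 8 8
6: -> POSIX ADVISORY READ 24486 fe:00:1087412 8 8
7: POSIX ADVISORY READ 24492 fe:00:1087411 168 EOF
8: POSIX ADVISORY WRITE 24492 fe:00:1087411 8 8
"""

if __name__ == "__main__":
    for waiter, holder in find_waiters(parse_locks(SAMPLE)):
        print(f"PID {waiter.pid} waits for {waiter.access} on {waiter.file_id} "
              f"[{waiter.start}-{waiter.end}], held as {holder.access} by PID {holder.pid}")
```

On a live system the same functions could be fed the contents of /proc/locks instead of SAMPLE; the PIDs can then be mapped to service names with 'samba-tool processes' as shown above.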
Comment 1 Stefan Gohmann univentionstaff 2014-05-06 06:43:12 CEST
I've added two upstream patches.

Patches: r13022, r13070, r13071
YAML: r49814
Comment 2 Arvid Requate univentionstaff 2014-05-06 18:23:30 CEST
* Samba in errata3.2-1 is built with the following patches
  97_0001-dsdb-Do-checks-for-invalid-renames-in-samldb-before-.patch
  97_0001-dsdb-Rename-private_data-to-rootdse_private_data-in-.patch
  97_0002-dsdb-Do-not-permit-nested-event-loops-when-in-a-tran.patch

* They correspond to the upstream commits with the same Change-Id
* The patches applied cleanly without fuzz
* ucs-test -s samba4 works
* The s4-connector jenkins-tests look good as well.

* Advisory Ok
Comment 3 Moritz Muehlenhoff univentionstaff 2014-05-07 15:24:29 CEST
http://errata.univention.de/ucs/3.2/106.html