Bug 33050 - Various samba tests fail on S3 slave
Status: CLOSED FIXED
Product: UCS Test
Classification: Unclassified
Component: Samba
Version: unspecified
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 3.2-0-errata
Assigned To: Stefan Gohmann
QA Contact: Arvid Requate
Reported: 2013-11-01 07:30 CET by Stefan Gohmann
Modified: 2014-03-17 14:51 CET

Attachments
listener.log.debug4.bug_33050.bz2 (1.24 MB, application/x-bzip2)
2014-01-03 22:58 CET, Stefan Gohmann
strace-listener_bug_33050.bz2 (129.72 KB, application/x-bzip2)
2014-01-03 22:58 CET, Stefan Gohmann

Description Stefan Gohmann univentionstaff 2013-11-01 07:30:02 CET
http://jenkins.knut.univention.de:8080/view/Autotest/job/UCS%203.2%20Autotest%20MultiEnv/186/SambaVersion=s3,Systemrolle=slave/testReport/

It seems to be a replication problem, for example:

-----------------------------------------------------------------------------
Failed

53_samba-common/46share_access_permissions_sambaValidUsers.Access a share as a user whitelisted by sambaValidUsers (from samba-common.Access a share as a user whitelisted by sambaValidUsers)
Failing for the last 5 builds (since Unstable #182)
Duration: 1 minute 19 seconds.
Error message

Test failed

Standard output (STDOUT)

## create user
Object created: uid=vunvelve,cn=users,dc=autotest094,dc=local
Object created: cn=s0v8ECF7,cn=slave094.autotest094.local,cn=shares,dc=autotest094,dc=local
Waiting for replication:
OK: replication complete (nid=2925 lid=2925)
Done: replication complete.
Waiting for postrun
## wait for samba share export
Object removed: uid=vunvelve,cn=users,dc=autotest094,dc=local
Object removed: cn=s0v8ECF7,cn=slave094.autotest094.local,cn=shares,dc=autotest094,dc=local
Waiting for replication:
OK: replication complete (nid=2928 lid=2928)
Done: replication complete.
Waiting for postrun

Standard error (STDERR)

info 2013-10-31 20:23:20	 create user vunvelve
error 2013-10-31 20:24:20	 TIMEOUT: Access to share still failed after 30 seconds: 
error 2013-10-31 20:24:20	 **************** Test failed above this line (1) ****************
info 2013-10-31 20:24:20	 remove user vunvelve
debug 2013-10-31 20:24:20	 user vunvelve removed
info 2013-10-31 20:24:20	 checking whether the user vunvelve is really removed
debug 2013-10-31 20:24:21	 user vunvelve does not exist
-----------------------------------------------------------------------------
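
The "replication complete" lines in the output above report two counters. A minimal sketch of how such a line could be checked, assuming that nid is the notifier transaction ID and lid the last ID processed by the listener (that reading, and the helper name replication_complete, are assumptions and not taken from the ucs-test sources):

```python
import re

# Hypothetical helper, not part of ucs-test: matches "nid=<n> lid=<m>".
_STATUS_RE = re.compile(r"nid=(\d+)\s+lid=(\d+)")


def replication_complete(line):
    """Return True if the two IDs found in `line` are equal."""
    match = _STATUS_RE.search(line)
    if match is None:
        raise ValueError("no nid/lid pair found in: %r" % line)
    nid, lid = (int(value) for value in match.groups())
    # Assumption: replication has caught up once both counters agree.
    return nid == lid


print(replication_complete("OK: replication complete (nid=2925 lid=2925)"))
```

Both status lines in the log above carry matching IDs, so a check of this kind passes even though the share access still timed out.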
Comment 1 Stefan Gohmann univentionstaff 2014-01-03 16:25:42 CET
This is a little bit strange. During the failedldif test (tests/10_ldap/60failedldif) the samba / winbind processes lost the connection to the LDAP server. During that time smb.conf was not written completely:

-----------------------------------------------------------------------
# Warning: This file is auto-generated and might be overwritten by
#          univention-config-registry.
#          Please edit the following file(s) instead:
# Warnung: Diese Datei wurde automatisch generiert und kann durch
#          univention-config-registry überschrieben werden.
#          Bitte bearbeiten Sie an Stelle dessen die folgende(n) Datei(en):
#
#       /etc/univention/templates/files/etc/samba/smb.conf.d/01univention-samba_main
#       /etc/univention/templates/files/etc/samba/smb.conf.d/02univention-samba_netbios
#       /etc/univention/templates/files/etc/samba/smb.conf.d/11univention-samba_ldap
#       /etc/univention/templates/files/etc/samba/smb.conf.d/21univention-samba_winbind
#       /etc/univention/templates/files/etc/samba/smb.conf.d/31univention-samba_password
#       /etc/univention/templates/files/etc/samba/smb.conf.d/41univention-samba_printing
#       /etc/univention/templates/files/etc/samba/smb.conf.d/51univention-samba_domain
#       /etc/univention/templates/files/etc/samba/smb.conf.d/52univention-samba_domainname
#       /etc/univention/templates/files/etc/samba/smb.conf.d/61univention-samba_misc
#       /etc/univention/templates/files/etc/samba/smb.conf.d/71univention-samba_users
#       /etc/univention/templates/files/etc/samba/smb.conf.d/81univention-quota_scripts
#       /etc/univention/templates/files/etc/samba/smb.conf.d/81univention-samba_scripts
#       /etc/univention/templates/files/etc/samba/smb.conf.d/90univention-samba_user_shares
#       /etc/univention/templates/files/etc/samba/smb.conf.d/91univention-samba_shares
#       /etc/univention/templates/files/etc/samba/smb.conf.d/92univention-samba_shares
#       /etc/univention/templates/files/etc/samba/smb.conf.d/95univention-samba_local_config
#       /etc/univention/templates/files/etc/samba/smb.conf.d/99univention-samba_local_shares
#

[global]
        debug level     = 0
        syslog          = 0
        max log size    = 0

        max open files = 32808
        server string = %h univention corporate server
        machine password timeout        = 0

        netbios name = slave094



        ; ldap

        passdb backend = ldapsam:"ldap://slave094.autotest094.local:7389"
        auth methods = guest sam winbind
        ldap suffix = dc=autotest094,dc=local
        ldap admin dn = "cn=slave094,cn=dc,cn=computers,dc=autotest094,dc=local"
        ldap ssl = start tls
        passdb expand explicit = no



        ; idmap/winbind
        ldap idmap suffix = cn=idmap,cn=univention
        idmap config * : backend        = ldap
        idmap config * : range          = 55000-64000
        idmap config * : ldap_url       = ldap://slave094.autotest094.local:7389
        idmap config * : ldap_user_dn   = cn=slave094,cn=dc,cn=computers,dc=autotest094,dc=local
        idmap config AUTOTEST094 : backend = nss
        idmap config AUTOTEST094 : range = 1000-54999

        winbind max clients = 500
        winbind nested groups = no

        winbind enum users = yes
        winbind enum groups = yes
        winbind separator = +
        ; winbind use default domain = yes
        ; winbind enable local accounts = yes
        template shell = /bin/bash
        template homedir = /home/%D-%U

        ; password sync
        pam password change = no
        unix password sync = no



        ; ldap passwd sync = yes
        passwd chat = *New*password* %n\n *Re-enter*new*password* %n\n *password*changed*
        passwd chat timeout = 60

        client use spnego = yes


        obey pam restrictions = yes


        encrypt passwords = yes


        ; printing
        load printers = yes
        printing = cups
        printcap name = cups


        ; domain
        security = user
        domain logons = yes
        domain master = no
        preferred master = yes
        local master = yes


        os level = 65
        wins support = no
        wins server = master094.autotest094.local


        workgroup = AUTOTEST094
        oplocks = yes
        kernel oplocks = yes
        large readwrite = yes
        deadtime = 15
        read raw = yes
        write raw = yes
        max xmit = 65535
        getwd cache = yes
        wide links = no
        store dos attributes = yes
        max protocol = SMB2
        logon home = \\slave094\%U
        logon drive = I:
        logon path = \\slave094\%U\windows-profiles\%a
        preserve case = yes
        short preserve case = yes
        time server = yes
        host msdfs = no
        msdfs root = no

        guest account = nobody
        map to guest = Bad User

        admin users = administrator join-backup
        set quota command = /usr/sbin/univention-setquota


 check password script = /usr/share/univention-samba/password
-----------------------------------------------------------------------

The file ends here. At the same time I see this in the kernel log:
Jan  3 09:52:41 slave094 kernel: [566624.196769] univention-dire[8417]: segfault at 0 ip           (null) sp 00007fff1b8473d8 error 14 in univention-directory-listener[400000+14000]

From the listener.log:
03.01.14 09:52:39.812  LISTENER    ( ERROR   ) : Can't contact LDAP server: going into LDIF mode
kadmin: ext host/slave094.autotest094.local@AUTOTEST094.LOCAL: Wrong database version
03.01.14 09:52:39.901  LISTENER    ( ERROR   ) : Could not write to transaction file /var/lib/univention-ldap/listener/listener. Check for /var/lib/univention-directory-replication/failed.ldif

03.01.14 09:52:39.901  LISTENER    ( ERROR   ) : failed to write to transaction file
Abort: Can't contact LDAP server.
Try to sync changes stored in /var/lib/univention-directory-replication/failed.ldif into local LDAP
03.01.14 09:52:41.495  LISTENER    ( WARN    ) : received signal 15
waiting for listener-shutdown close failed in file object destructor:
Error in sys.excepthook:

Original exception was:
 . . . . . shutdown done
replay stored changes ...

Restored modifies sucessfuly, the ldif-file will be moved to /tmp/replayed.ldif_2014-01-03-09:52:46
Starting univention-directory-listener daemon.
done.
03.01.14 09:52:47.222  DEBUG_INIT
Comment 2 Stefan Gohmann univentionstaff 2014-01-03 22:58:16 CET
Created attachment 5709 [details]
listener.log.debug4.bug_33050.bz2
Comment 3 Stefan Gohmann univentionstaff 2014-01-03 22:58:43 CET
Created attachment 5710 [details]
strace-listener_bug_33050.bz2
Comment 4 Stefan Gohmann univentionstaff 2014-01-06 08:36:15 CET
I've added a simple workaround to ucs-test/tests/10_ldap/60failedldif. At the end of this test case /etc/samba/smb.conf is re-written and the samba processes are restarted.

The real problem is that UCR does not write the files as an atomic operation. I've opened a new bug for it: Bug #33842.
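
For illustration, a common way to make a file write atomic is to write to a temporary file in the same directory and then rename it over the target. This is only a sketch of that general technique, not the UCR implementation or the fix for Bug #33842; the function name write_atomically is hypothetical:

```python
import os
import tempfile


def write_atomically(path, content):
    """Write `content` to `path` so readers see the old or the new file,
    never a partially written one (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # push the data to disk before renaming
        # Note: mkstemp creates the file with mode 0600; a real implementation
        # would also have to carry over the owner and mode of the target.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this pattern a process that reads the file while it is being regenerated sees the previous complete version instead of a truncated one.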
Comment 5 Stefan Gohmann univentionstaff 2014-01-08 07:27:15 CET
Now it happened on an S4 DC Backup. It seems that the DRS replication does not work:

http://jenkins.knut.univention.de:8080/view/Autotest/job/UCS%203.2%20Autotest%20MultiEnv/SambaVersion=s4,Systemrolle=backup/227/testReport/
Comment 6 Stefan Gohmann univentionstaff 2014-01-13 07:15:56 CET
I've added some more debug output. During the 61getent_crash test case one samba process failed with a segmentation fault:

[2014/01/11 06:50:25.743787,  0, pid=1305] ../lib/util/fault.c:73(fault_report)
  INTERNAL ERROR: Signal 11 in pid 1305 (4.1.0-Debian)
  Please read the Trouble-Shooting section of the Samba HOWTO

All other samba processes still run but the DRS replication fails:

[2014/01/11 06:50:30.584953,  0, pid=1296] ../source4/rpc_server/common/forward.c:55(dcesrv_irpc_forward_callback)
  IRPC callback failed for DsReplicaSync - NT_STATUS_CONNECTION_REFUSED
[2014/01/11 06:50:35.589097,  0, pid=1296] ../source4/rpc_server/common/forward.c:55(dcesrv_irpc_forward_callback)
  IRPC callback failed for DsReplicaSync - NT_STATUS_CONNECTION_REFUSED


root@backup093:~# samba-tool processes
 Service:                PID 
-----------------------------
dnsupdate               1310
rpc_server              1296
rpc_server              1296
rpc_server              1296
rpc_server              1296
rpc_server              1296
cldap_server            1303
winbind_server          1307
kdc_server              1304
samba                      0
dreplsrv                1305
kccsrv                  1309
ldap_server             1301
root@backup093:~# 

I've created Bug #33904 for this issue and disabled the test case 10_ldap/61getent_crash.
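
The listing above still shows dreplsrv with PID 1305, the process that reported signal 11. A rough sketch of how such a listing could be cross-checked against the running processes, assuming the two-column layout shown above and assuming a crashed process has actually exited (all function names are hypothetical, and PID reuse can hide a dead service):

```python
import os


def parse_process_table(text):
    """Map service name -> set of PIDs, using the two-column layout above."""
    services = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly two fields: a service name and a numeric PID.
        # Rows with PID 0 (as shown for "samba" above) cannot be checked
        # and are skipped.
        if len(parts) == 2 and parts[1].isdigit() and int(parts[1]) > 0:
            services.setdefault(parts[0], set()).add(int(parts[1]))
    return services


def pid_alive(pid):
    """Probe a PID with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def dead_services(text):
    """Return the listed services for which no PID is running any more."""
    table = parse_process_table(text)
    return sorted(name for name, pids in table.items()
                  if not any(pid_alive(pid) for pid in pids))
```

The text passed to dead_services would be the captured output of samba-tool processes on the affected host.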
Comment 7 Arvid Requate univentionstaff 2014-01-15 15:43:38 CET
Indeed, I just tried to reproduce the problem and discovered the test was disabled :-)
Comment 8 Moritz Muehlenhoff univentionstaff 2014-03-17 14:51:32 CET
Released as an errata update for unmaintained.