Bug 54715 - Joining a Backup Node into UCS@school "singleserver" setup runs in several timeouts and finally fails
Status: NEW
Product: UCS@school
Classification: Unclassified
Component: General
Version: UCS@school 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: UCS@school maintainers
Depends on:
Blocks:
Reported: 2022-05-05 15:20 CEST by Arvid Requate
Modified: 2023-03-31 09:53 CEST
CC List: 6 users

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 4: Minor Usability: Impairs usability in secondary scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 5: Blocking further progress on the daily work
User Pain: 0.114
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2022071821000427, 2022082921000173
Bug group (optional):
Max CVSS v3 score:


Attachments
join.log (245.28 KB, text/x-log)
2022-05-05 15:29 CEST, Arvid Requate
actualise.log (1007.30 KB, text/x-log)
2022-05-05 15:29 CEST, Arvid Requate

Description Arvid Requate univentionstaff 2022-05-05 15:20:08 CEST
Running univention-join on an unjoined Backup Node to join it into a UCS@school environment (in my case a single school) takes **very** long and leaves the administrator puzzled as to what is going on during the pre-joinscripts phase:

```
root@ucsBackup:~# univention-join 
univention-join: joins a computer to an ucs domain
copyright (c) 2001-2022 Univention GmbH, Germany

Enter Primary Directory Node Account : Administrator
Enter Primary Directory Node Password: 

Search Primary Directory Node:                             done
Check Primary Directory Node:                              done
Stop LDAP Server:                                          done
Search ldap/base                                           done
Start LDAP Server:                                         done
Search LDAP binddn                                         done
Sync time:                                                 done
Running pre-join hook(s):                                  done
Join Computer Account:                                     done
Stopping univention-directory-notifier daemon:             done
Stopping univention-directory-listener daemon:             done
Sync ldap.secret:                                          done
Sync ldap-backup.secret:                                   done
Sync SSL directory:                                        done
Check TLS connection:                                      done
Download host certificate:                                 done
Sync SSL settings:                                         done
Purging translog database:                                 done
Restart LDAP Server:                                       done
Sync Kerberos settings:                                    done
Create kerberos/adminserver
File: /etc/krb5.conf
Running pre-joinscripts hook(s): 
```

Looking into the join.log shows that it runs
```
2022-05-05 14:34:11,071 ucsschool-join-hook: [INFO] Calling ('univention-install', '--force-yes', '--yes', 'ucs-school-singleserver') 
```

Looking at actualise.log shows that

1. Joinscripts are run (e.g. univention-samba4) and run into local ldapsearch timeouts, because LDAP replication has not even been configured at this stage of the join.

2. Dynamic registrations of LDAP schema and ACL extensions repeatedly take a long time to fail for the same reason.

We could adjust the `call_joinscript` library function (which is used, e.g., in univention-samba4.postinst) so that it does not run if the machine is not joined yet. But what does "joined" even mean at this point? univention-check-join-status certainly fails, because there is a joinscript that has not been run. But machine.secret is already there (and works) to fetch the join hooks, and the file /var/univention-join/joined also already exists, albeit still empty at that stage. As a hack, we could remove /var/univention-join/joined in the pre-join hook and adjust call_joinscript to abort if that file is not present.
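A minimal sketch of the guard proposed above, assuming the hack is adopted: the pre-join hook deletes /var/univention-join/joined, and a wrapper refuses to run joinscripts until the file exists again. The names `machine_is_joined`, `call_joinscript_guarded` and the `JOINED_FILE` override are hypothetical and not part of the UCS shell libraries; the real `call_joinscript` is not reproduced here.

```shell
#!/bin/sh
# Hypothetical guard, NOT the real call_joinscript from the UCS shell libraries.
# Marker file path; the JOINED_FILE override is an assumption for testing.
JOINED_FILE="${JOINED_FILE:-/var/univention-join/joined}"

# Succeeds only if the join marker file exists. Under the proposed hack the
# pre-join hook removes it, so it is absent while the join is still running.
machine_is_joined() {
    [ -e "$JOINED_FILE" ]
}

# Hypothetical wrapper: run the given joinscript command only once the
# machine counts as joined. It returns 0 when skipping, so that a package
# postinst (e.g. univention-samba4) does not abort because of the skip.
call_joinscript_guarded() {
    if ! machine_is_joined; then
        echo "Skipping $1: $JOINED_FILE missing, join not finished yet" >&2
        return 0
    fi
    "$@"
}
```

Returning success on a skip is a design choice: the joinscript simply stays pending and is picked up later by univention-join itself instead of timing out against an unconfigured local LDAP.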

Make up your mind and file a bug against UCS to implement the expected behavior for call_joinscript.
Comment 1 Arvid Requate univentionstaff 2022-05-05 15:28:06 CEST
Just look at the time gap in join.log here; that's over 40 minutes:

```
2022-05-05 14:34:11,071 ucsschool-join-hook: [INFO] Calling ('univention-install', '--force-yes', '--yes', 'ucs-school-singleserver') ...
05.05.22 14:43:52.901  DEBUG_INIT
05.05.22 14:52:29.735  DEBUG_INIT
2022-05-05 15:15:13,893 ucsschool-join-hook: [INFO] Not installing 'UCS@school Veyon Proxy' app on this system role.
2022-05-05 15:15:13,897 ucsschool-join-hook: [INFO] ucsschool-join-hook.py is done
Configure 00ucs-school-app-version-check.inst Thu May  5 15:15:13 CEST 2022
2022-05-05 15:15:13.991516791+02:00 (in joinscript_init)
Version of app "ucsschool" on this host: "5.0 v1"
Version of app "ucsschool" on Primary Directory Node: "5.0 v1"
OK: local version of app "ucsschool" lower than or equal to version on Primary Directory Node.
Version check passed.
2022-05-05 15:15:15.039867519+02:00 (in joinscript_save_current_version)
Configure 01univention-ldap-server-init.inst Thu May  5 15:15:15 CEST 2022
2022-05-05 15:15:15.071292350+02:00 (in joinscript_init)
File: /var/lib/univention-ldap/translog/DB_CONFIG
6273cde3 /etc/ldap/slapd.conf: line 199: unknown attr "@univentionApp" in to clause
6273cde3 <access clause> ::= access to <what> [ by <who> [ <access> ] [ <control> ] ]+ 
<what> ::= * | dn[.<dnstyle>=<DN>] [filter=<filter>] [attrs=<attrspec>]
<attrspec> ::= <attrname> [val[/<matchingRule>][.<attrstyle>]=<value>] | <attrlist>
<attrlist> ::= <attr> [ , <attrlist> ]
<attr> ::= <attrname> | @<objectClass> | !<objectClass> | entry | children
<who> ::= [ * | anonymous | users | self | dn[.<dnstyle>]=<DN> ]
        [ realanonymous | realusers | realself | realdn[.<dnstyle>]=<DN> ]
        [dnattr=<attrname>]
        [realdnattr=<attrname>]
        [group[/<objectclass>[/<attrname>]][.<style>]=<group>]
        [peername[.<peernamestyle>]=<peer>] [sockname[.<style>]=<name>]
        [domain[.<domainstyle>]=<domain>] [sockurl[.<style>]=<url>]
        [dynacl/<name>[/<options>][.<dynstyle>][=<pattern>]]
        [ssf=<n>] [transport_ssf=<n>] [tls_ssf=<n>] [sasl_ssf=<n>]
```
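To quantify the gap, the two join.log timestamps bracketing the `univention-install` call can be subtracted. This sketch assumes GNU `date` (for `-d`), ignores the milliseconds, and uses a hypothetical helper name.

```shell
#!/bin/sh
# Print the number of seconds between two timestamps (requires GNU date -d).
# TZ=UTC keeps local DST rules out of the calculation.
elapsed_seconds() {
    start=$(TZ=UTC date -d "$1" +%s) || return 1
    end=$(TZ=UTC date -d "$2" +%s) || return 1
    echo $((end - start))
}

# Start of univention-install vs. "ucsschool-join-hook.py is done"
secs=$(elapsed_seconds "2022-05-05 14:34:11" "2022-05-05 15:15:13")
printf '%d min %d s\n' $((secs / 60)) $((secs % 60))
```

For the two lines above this prints `41 min 2 s`.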

And then the error messages continue:

* 01univention-ldap-server-init.inst apparently leaves slapd in a defunct state (yet reports success for the joinscript)

* Next, 03univention-directory-listener.inst fails in similarly spectacular ways, because there is no local slapd

So the current state is:
```
Running pre-joinscripts hook(s):                           done
Configure 00ucs-school-app-version-check.inst              done
Configure 01univention-ldap-server-init.inst               done
Configure 02univention-directory-notifier.inst             done
Configure 03univention-directory-listener.inst  ## hangs
```
Comment 2 Arvid Requate univentionstaff 2022-05-05 15:29:22 CEST
Created attachment 10944
join.log
Comment 3 Arvid Requate univentionstaff 2022-05-05 15:29:41 CEST
Created attachment 10945
actualise.log
Comment 4 Arvid Requate univentionstaff 2022-05-05 15:56:32 CEST
The end of the story:
```
Running pre-joinscripts hook(s):                           done
Configure 00ucs-school-app-version-check.inst              done
Configure 01univention-ldap-server-init.inst               done
Configure 02univention-directory-notifier.inst             done
Configure 03univention-directory-listener.inst             done


**************************************************************************
* Join failed!                                                           *
* Contact your system administrator                                      *
**************************************************************************
* Message:  Please visit https://help.univention.com/t/8842 for common problems during the join and how to fix them -- FAILED: failed.ldif exists.
**************************************************************************
```
Comment 5 Arvid Requate univentionstaff 2022-05-05 17:14:17 CEST
I tried running univention-join again, and after some initial haggling with systemd to finally get slapd started normally, the join works "much better", as the pre-joinscript stuff has already been done. Yet it finally fails again with

```
Configure 62ucs-school-singleserver.inst                   failed


**************************************************************************
* Join failed!                                                           *
* Contact your system administrator                                      *
**************************************************************************
* Message:  Please visit https://help.univention.com/t/8842 for common problems during the join and how to fix them -- FAILED: 62ucs-school-singleserver.inst
**************************************************************************
```

and join.log shows
```
Object modified: cn=ucsBackup,cn=dc,cn=computers,dc=jtorres,dc=org
The object type of this object differs from the specified object type: The object cn=ucsBackup,cn=dc,cn=computers,dc=jtorres,dc=org is not a computers/domaincontroller_master.
62ucs-school-singleserver.inst: 


**************************************************************************
* Join failed!                                                           *
* Contact your system administrator                                      *
**************************************************************************
* Message:  Please visit https://help.univention.com/t/8842 for common problems during the join and how to fix them -- FAILED: 62ucs-school-singleserver.inst
**************************************************************************
```

This is true, the object type is computers/domaincontroller_backup, but that's the whole point of this exercise.
Comment 6 Christina Scheinig univentionstaff 2022-07-20 10:55:14 CEST
The customer has a singleserver Primary and a Backup server. Since UCS 5, the Backup server is unable to join.
Comment 7 Erik Damrose univentionstaff 2022-07-20 15:45:46 CEST
A similar bug was fixed in 62ucs-school-multiserver.inst:

```
if [[ "$server_role" = domaincontroller_master ]]; then
    ucsschoolRole=dc_master
else
    ucsschoolRole=dc_backup
fi
univention-directory-manager "computers/$server_role" modify "$@" \
...
```

But in a singleserver environment, we should first determine what the 62ucs-school-singleserver.inst script should actually execute on a DC Backup.
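The role mapping from the multiserver fix can be factored out so that a joinscript never hard-codes `computers/domaincontroller_master`. This is a sketch; the function name `ucsschool_role_for` is hypothetical, and the `dc_master`/`dc_backup` values come from the 62ucs-school-multiserver.inst snippet.

```shell
#!/bin/sh
# Hypothetical helper: map a UCS server role to the ucsschoolRole prefix,
# mirroring the 62ucs-school-multiserver.inst fix. Like that snippet, every
# role other than the Primary maps to dc_backup, so it assumes the caller
# only runs on Primary or Backup Directory Nodes.
ucsschool_role_for() {
    if [ "$1" = domaincontroller_master ]; then
        echo dc_master
    else
        echo dc_backup
    fi
}

# Derive the UDM module from the role instead of hard-coding it; this is
# what avoids "is not a computers/domaincontroller_master" on a Backup.
server_role="${server_role:-domaincontroller_backup}"
echo "module: computers/$server_role, role: $(ucsschool_role_for "$server_role")"
```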
Comment 8 Arvid Requate univentionstaff 2022-07-20 17:12:08 CEST
Yes, we basically adjusted the 62ucs-school-singleserver.inst script to exit early
on the joining DC Backup, before it starts to do things specific to the domaincontroller_master.

But basically it just shows that there is an unresolved clash of concepts between
the "ucs-school-singleserver" and the concept of joining a Backup Directory Node.
Comment 9 Ingo Steuwer univentionstaff 2022-07-21 14:06:39 CEST
I changed the summary to make more explicit that this happens only in singleserver environments.

Seems like a workaround was possible in the support ticket. Can we document the steps here in case they are helpful to fix the problem?
Comment 11 Christina Scheinig univentionstaff 2022-07-21 14:08:40 CEST
(In reply to Ingo Steuwer from comment #9)
> I changed the summary to make more explicit that this happens only in
> singleserver environments.
> 
> Seems like a workaround was possible in the support ticket. Can we document
> the steps here in case they are helpful to fix the problem?

The fix was to edit the joinscript directly. We edited these lines to prevent things from happening on the Backup:

```
if [ "$server_role" != domaincontroller_master ]; then
        joinscript_save_current_version
        exit 0
fi
```
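The same early exit can be written as a testable function. This is a sketch under assumptions: in a real joinscript `server_role` comes from UCR and `joinscript_save_current_version` from the joinscript helper library; here the role is passed in, the helper is stubbed, and the name `skip_singleserver_setup` is hypothetical.

```shell
#!/bin/sh
# Stub for illustration only; the real function comes from the joinscript
# helper library and records the script's version in the join status.
joinscript_save_current_version() {
    echo "version recorded"
}

# Returns 0 (after recording the version) when the singleserver setup must
# be skipped, i.e. on any role other than the Primary Directory Node.
skip_singleserver_setup() {
    role="$1"
    if [ "$role" != domaincontroller_master ]; then
        # Record the version before bailing out, so the joinscript counts
        # as executed and is not retried as pending on the next join run.
        joinscript_save_current_version
        return 0
    fi
    return 1
}

# Possible use inside the joinscript (role lookup via ucr assumed):
#   skip_singleserver_setup "$(ucr get server/role)" && exit 0
```

Saving the version before exiting is the important part: a bare `exit 0` would leave the joinscript marked as not yet run.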