Bug 50616 - /etc/init.d/slapd fails to start slapd if another slapd is running in a Docker Container
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: LDAP
Version: UCS 4.4
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.4-3-errata
Assigned To: Dirk Wiesenthal
QA Contact: Arvid Requate
Depends on:
Blocks:
Reported: 2019-12-10 11:09 CET by Dirk Wiesenthal
Modified: 2020-03-18 12:27 CET (History)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 1: Cosmetic issue or missing function but workaround exists
Who will be affected by this bug?: 4: Will affect most installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.069
Enterprise Customer affected?:
School Customer affected?:
ISV affected?: Yes
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Dirk Wiesenthal univentionstaff 2019-12-10 11:09:22 CET
If an App uses its own OpenLDAP server in a separate container, our outer slapd will not start:

We patched /etc/init.d/slapd to use "pgrep /usr/sbin/slapd", which happens to find the Docker descendant process. The script then refuses to start slapd because it assumes it is already running.

The original Debian Stretch script does not use pgrep, but a pidfile. Maybe we can update our UCR template?

Or we can tell pgrep to not find those processes below /usr/bin/containerd.
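
To illustrate the second idea, here is a minimal shell sketch that walks a process's parent chain via /proc and reports whether any ancestor's name starts with a given prefix. The helper name has_ancestor_named and the prefix "containerd" are assumptions for illustration only; the actual name of the runtime process above a container depends on the Docker/containerd version and is not stated in this report. It relies on the Linux /proc/PID/status format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any ancestor of PID has a process name
# (the "Name:" field in /proc/PID/status) starting with PREFIX.
has_ancestor_named() {
    pid=$1
    prefix=$2
    while [ "$pid" -gt 1 ] 2>/dev/null && [ -r "/proc/$pid/status" ]; do
        # Step up to the parent process.
        pid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$pid/status")
        [ -n "$pid" ] && [ "$pid" -gt 0 ] || return 1
        comm=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$pid/status" 2>/dev/null)
        case $comm in
            "$prefix"*) return 0 ;;
        esac
    done
    return 1
}

# Report every slapd and whether it appears to live below a container runtime.
# "containerd" as the prefix is an assumption, see above.
for candidate in $(pgrep -f /usr/sbin/slapd 2>/dev/null); do
    if has_ancestor_named "$candidate" containerd; then
        echo "$candidate: below container runtime, ignore"
    else
        echo "$candidate: host process"
    fi
done
```

This is more code than a single pgrep option, which is probably why a simpler filter is preferable inside an init script.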
Comment 3 Dirk Wiesenthal univentionstaff 2019-12-10 11:20:28 CET
pgrep -f /usr/sbin/slapd -P 1

_seems_ to work.
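
For reference, a minimal sketch of how such a check could be wrapped in a shell function. The function name running_under_parent is hypothetical and not taken from the actual init script; only the pgrep options -f (match the full command line) and -P (restrict to a given parent PID) come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process whose command line matches
# PATTERN exists and has the parent PID PPID.
running_under_parent() {
    parent=$1
    pattern=$2
    # -f matches against the full command line, -P restricts to children of $parent.
    pgrep -f -P "$parent" "$pattern" >/dev/null 2>&1
}

if running_under_parent 1 /usr/sbin/slapd; then
    echo "a slapd with PPID 1 appears to be running"
else
    echo "no slapd with PPID 1 found"
fi
```

A slapd inside a container has the container's own parent process rather than the host's PID 1 as its parent, so it is not matched by this check.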
Comment 4 Dirk Wiesenthal univentionstaff 2019-12-12 10:09:21 CET
Fixed in
  univention-ldap 15.0.0-33A~4.4.0.201912121001

/etc/init.d/slapd now uses --ppid 1 everywhere. I also tried to update the whole script or at least transition to "pidfile" instead of "pgrep", but had problems with start-stop-daemon which is actually starting /usr/sbin/slapd.

This change is not very invasive and worked in my tests.

Even if slapd is not started via "service slapd start" but directly as "/usr/sbin/slapd" on a TTY, the process somehow manages to get PPID 1. This still holds for invocations like "env X=Y /usr/sbin/slapd", so it should be okay.
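
A likely explanation for the "somehow": slapd daemonizes by default, that is, it forks and the original process exits, so the kernel re-parents the surviving process to init (or to a subreaper, if one is configured). The following sketch reproduces that effect with a plain sleep instead of slapd; the variable names are made up for illustration and the /proc lookup is Linux-specific.

```shell
#!/bin/sh
# Start a short-lived background process from a helper shell that exits
# immediately, so the background process is orphaned like a daemonizing slapd.
orphan_pid=$(sh -c 'sleep 5 >/dev/null 2>&1 & echo $!')

# Read the orphan's current parent PID from /proc.
orphan_ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan_pid/status")
echo "orphan $orphan_pid now has PPID $orphan_ppid"
```

On a typical host this prints PPID 1; under a subreaper (for example a per-user service manager) it can be a different PID.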

Note that starting slapd this way confuses systemd and should not be done. That is a general rule, though, and not specific to this bug.
Comment 5 Dirk Wiesenthal univentionstaff 2019-12-15 12:27:19 CET
I also adjusted ucs-test (9.0.3-128). Two tests were checking for /usr/sbin/slapd without taking PPID into consideration.

On a side note, 10_ldap/26reconnect_univention_ldapsearch stopped our OpenLDAP, but did not start it again if a container had its own slapd process. As a result, nearly all tests after that one failed. On the other hand, it is marked as "dangerous"...

(I mentioned this bug in another commit regarding appliance test configs; that was an error.)
Comment 7 Philipp Hahn univentionstaff 2019-12-16 09:15:00 CET
(In reply to Dirk Wiesenthal from comment #3)
> pgrep -f /usr/sbin/slapd -P 1
> 
> _seems_ to work.

FYI: Please do it correctly and use `pgrep -ns "$$" -f /usr/sbin/slapd`. Your solution might break as soon as we fix Bug #43691 and convert OpenLDAP to `systemd`, in which case `slapd` might not daemonize and thus would not get re-parented to init (PID 1). (Actually, systemd still double-forks when starting a service, so your solution even works with `systemd` today, as the PPID will then be 1.)

Your solution will get into trouble when `slapd` is started manually or is running from a debugger in the foreground: then PPID!=1, but TCP:389/636/7389/7636 will already be taken and the LMDBs will be opened for exclusive write, so a 2nd `slapd` should not/cannot be started. `/etc/init.d/slapd` MUST fail then!

Actually the test for '/usr/sbin/slapd' can also break during an update of the Debian slapd package: `slapd` is not stopped during the update phase, so its binary is locked in the file system by the Linux kernel. Because of this `dpkg` renames the binary and writes the new binary to the old location. Luckily `pgrep` still works as it checks the original command line from "/proc/$pid/cmdline" instead of following "/proc/$pid/exe", which changes due to the renaming; `start-stop-daemon --exec` uses that for example: ~/REPOS/DEBIAN/dpkg/utils/start-stop-daemon.c:1687 pid_is_exec()
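
To make the difference between the two /proc entries concrete, here is a small sketch that prints both for a given PID. The function name show_process_identity is hypothetical; the paths are the standard Linux /proc entries mentioned above. It is run against the current shell only so that it works without a running slapd.

```shell
#!/bin/sh
# Print the two views of a process that can diverge after a package upgrade.
show_process_identity() {
    target=$1
    # Command line as recorded at exec time (NUL-separated in /proc).
    printf 'cmdline: '
    tr '\000' ' ' < "/proc/$target/cmdline"
    printf '\n'
    # Current path of the executable file. It follows a rename, and the
    # kernel appends " (deleted)" if the file has been unlinked.
    printf 'exe: %s\n' "$(readlink "/proc/$target/exe")"
}

show_process_identity "$$"
```

Reading /proc/PID/exe of a process owned by another user requires root, which an init script normally has.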
Comment 8 Arvid Requate univentionstaff 2019-12-16 18:33:58 CET
> I also tried to update the whole script or at least transition to "pidfile" instead of "pgrep", but had problems with start-stop-daemon which is actually starting /usr/sbin/slapd.

Why not use "--pidfile /var/run/slapd/slapd.pid" similar to but even simpler than what the Debian stretch package does?
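
For comparison, a minimal sketch of what a pidfile-based liveness check looks like in plain shell. This is not the Debian script and does not use start-stop-daemon; the helper name pidfile_alive is hypothetical, and only the pidfile path is taken from the suggestion above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PIDFILE names a process that still exists.
pidfile_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    # Take the first line and keep digits only, so stray whitespace is ignored.
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    [ -n "$pid" ] || return 1
    # Signal 0 only performs the existence/permission check, nothing is sent.
    # As non-root this also fails for processes owned by other users.
    kill -0 "$pid" 2>/dev/null
}

if pidfile_alive /var/run/slapd/slapd.pid; then
    echo "slapd is running according to its pidfile"
else
    echo "no running slapd according to its pidfile"
fi
```

A check like this cannot be confused by a slapd in a container, but it trusts the pidfile: a stale file whose PID has been reused by another process would give a false positive unless the command line is verified as well.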

But I guess Philipp's proposal may be even better.
Comment 9 Dirk Wiesenthal univentionstaff 2019-12-17 11:41:15 CET
I would like to keep it that way, very simple.

I now mention this bug in univention-ldap 15.0.0-34A~4.4.0.201912171121. New versions of the init script should be aware of this workaround.

A more modern approach with pidfiles should be incorporated as soon as we lift the whole script to "buster level" or so.

I also tested this bug fix with our own Docker Images. Interestingly, it works with "--ppid 1" inside the container...
Comment 10 Arvid Requate univentionstaff 2020-03-16 16:57:47 CET
Looks ok.
Comment 11 Erik Damrose univentionstaff 2020-03-18 12:27:44 CET
<http://errata.software-univention.de/ucs/4.4/497.html>