Bug 51367 - socket leak - portal logon
socket leak - portal logon
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 4.4-4-errata
Assigned To: Florian Best
Max Pohle
:
Depends on:
Blocks: 51047
  Show dependency treegraph
 
Reported: 2020-05-28 11:51 CEST by Felix Botner
Modified: 2020-06-19 17:32 CEST (History)
2 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.114
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
repr2.py (1.75 KB, text/x-python)
2020-05-28 11:51 CEST, Felix Botner
Details
portal.patch (5.09 KB, patch)
2020-05-28 11:58 CEST, Felix Botner
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Felix Botner univentionstaff 2020-05-28 11:51:04 CEST
Logon at the portal seems to produce ldap connections that umc never closes:


UMC Portal Logon/Logout
-> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l
3
UMC Portal Logon/Logout
-> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l
4
UMC Portal Logon/Logout
UMC Portal Logon/Logout
-> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l
6


univentio 11858 root    9u  IPv4              48436       0t0     TCP master.four.four:52136->master.four.four:7389 (ESTABLISHED)
univentio 11858 root   11u  IPv4              48445       0t0     TCP master.four.four:52138->master.four.four:7389 (ESTABLISHED)
univentio 11858 root   12u  IPv4              48482       0t0     TCP master.four.four:52152->master.four.four:7389 (ESTABLISHED)
univentio 11858 root   22u  IPv4              49157       0t0     TCP master.four.four:52236->master.four.four:7389 (ESTABLISHED)
univentio 11858 root   28u  IPv4              50002       0t0     TCP master.four.four:52324->master.four.four:7389 (ESTABLISHED)
univentio 11858 root   28u  IPv4              50002       0t0     TCP master.four.four:52324->master.four.four:7389 (ESTABLISHED)

after a couple of minutes

univentio 11858 root    9u  IPv4              48436       0t0     TCP master.four.four:52136->master.four.four:7389 (CLOSE_WAIT)
univentio 11858 root   11u  IPv4              48445       0t0     TCP master.four.four:52138->master.four.four:7389 (CLOSE_WAIT)
univentio 11858 root   12u  IPv4              48482       0t0     TCP master.four.four:52152->master.four.four:7389 (CLOSE_WAIT)
univentio 11858 root   20u  IPv4              50779       0t0     TCP master.four.four:52406->master.four.four:7389 (CLOSE_WAIT)
univentio 11858 root   22u  IPv4              49157       0t0     TCP master.four.four:52236->master.four.four:7389 (CLOSE_WAIT)
univentio 11858 root   28u  IPv4              50002       0t0     TCP master.four.four:52324->master.four.four:7389 (CLOSE_WAI

and after a couple of hours

univentio 1587 root   14u  sock                0,8       0t0 5170056 protocol: TCP
univentio 1587 root   15u  sock                0,8       0t0 5174400 protocol: TCP
univentio 1587 root   19u  sock                0,8       0t0 5171087 protocol: TCP
univentio 1587 root   20u  sock                0,8       0t0 5172389 protocol: TCP
univentio 1587 root   23u  sock                0,8       0t0 5171616 protocol: TCP
univentio 1587 root   26u  sock                0,8       0t0 5175952 protocol: TCP


and the ldap log during UMC Logon/Logout

May 18 10:20:16 master slapd[1224]: conn=1066 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128
May 18 10:20:16 master slapd[1224]: conn=1066 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:16 master slapd[1224]: conn=1067 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128
May 18 10:20:16 master slapd[1224]: conn=1067 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:18 master slapd[1224]: conn=1068 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" method=128
May 18 10:20:18 master slapd[1224]: conn=1068 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:23 master slapd[1224]: conn=1069 op=0 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128
May 18 10:20:23 master slapd[1224]: conn=1069 op=0 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:23 master slapd[1224]: conn=1069 op=2 UNBIND
May 18 10:20:24 master slapd[1224]: conn=1070 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128
May 18 10:20:24 master slapd[1224]: conn=1070 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:24 master slapd[1224]: conn=1070 op=3 UNBIND
May 18 10:20:24 master slapd[1224]: conn=1071 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128
May 18 10:20:24 master slapd[1224]: conn=1071 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:24 master slapd[1224]: conn=1071 op=3 UNBIND
May 18 10:20:24 master slapd[1224]: conn=1072 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" method=128
May 18 10:20:24 master slapd[1224]: conn=1072 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" mech=SIMPLE ssf=0
May 18 10:20:24 master slapd[1224]: conn=1073 op=1 BIND dn="cn=admin,dc=four,dc=four" method=128
May 18 10:20:24 master slapd[1224]: conn=1073 op=1 BIND dn="cn=admin,dc=four,dc=four" mech=SIMPLE ssf=0

here my guess is a missing unbind for the user connection.
Comment 1 Felix Botner univentionstaff 2020-05-28 11:51:38 CEST
Created attachment 10374 [details]
repr2.py

simple reproducer
Comment 2 Felix Botner univentionstaff 2020-05-28 11:58:34 CEST
Created attachment 10375 [details]
portal.patch

Patch from Florian, this patch helps, but we had strange Test problems. In two cases (S4 Slave, Non-S4 slave) the umc-web crashed and the selenium tests failed. We found no further information in the log files.
Comment 3 Florian Best univentionstaff 2020-05-28 12:36:51 CEST
Hm, I think the crash is more due to the patch from Bug #51366 ?!
Especially the selenium Tests (which probably use SAML) are affected by the removal of self.pam because it is used more than once.
Comment 4 Florian Best univentionstaff 2020-06-09 15:42:02 CEST
The LDAP connections are now reset when the session is removed. And the LDAP connection which fetches the favorites for a user (on UMC login) is immediately closed.

univention-management-console.yaml
28c855334c55 | YAML Bug #51367

univention-management-console (11.0.4-85)
23129ad89c1f | Bug #51367: don't reset glogal connection cache if no connections exists
ddc7e1b6a0e5 | Bug #51367: add no_cache to hash of connection
1efc0549723d | Bug #51367: fix leak of ldap connection in session shutdown and favorites retrieval
Comment 5 Max Pohle univentionstaff 2020-06-11 16:03:22 CEST
I did the QA and as expected and mentioned, I can also confirm, that the patch works for password resets and logins with a wrong password. That I tried manually and additionally by executing repr1.sh and repr2.py. The number of open file descriptors does not increase any more and I was using
```
watch -n.2 'lsof -p $(pidof -x univention-management-console-server) | grep 7389 | wc -l'
```
to check that.

However I made two further observations:

1) After a restart of the univention-management-console-server there are initially no file descriptors open and reloading the portal page has no effect on that. But if you try to log in, there is one file descriptor opened, which then never closes again. That would however not lead to ErrorCode 20 since it does not get more than one. Still I have the gut feeling, that there is also some cleanup code missing.

2) I can still spawn more open file descriptors by logging in with correct credentials and by reloading the portal page. Each refresh brings one new file descriptor. These are closed some time later (approx. after 30 seconds) due to a timeout, but it is still possible within that time frame to open more than 1000 file descriptors. I think that shows, that while the error cases now clean up as expected, the success case does not entirely: The portal page is shown in my browser long before the timeout occurs. That makes me think that after such a page is send, there needs to be the same clean up code as in the error handler. Just for reference. Here is the output from `/var/log/univention/management-console-server.log` when such a timeout occurs:


```
11.06.20 15:03:02.610  LOCALE      ( WARN    ) : Could not find translation file: 'umc-core.mo'
11.06.20 15:03:02.732  MODULE      ( PROCESS ) : Setting auth type to None
11.06.20 15:03:02.823  MAIN        ( PROCESS ) : Updating user password in 0 running module processes (auth-type: None).
11.06.20 15:03:02.824  LOCALE      ( WARN    ) : Could not find translation file: 'umc-core.mo'
11.06.20 15:03:03.169  MAIN        ( PROCESS ) : LDAP bind for user 'uid=Administrator,cn=users,dc=univention,dc=intranet'.
11.06.20 15:03:32.646  MAIN        ( PROCESS ) : Connection timed out.
11.06.20 15:03:32.646  MAIN        ( PROCESS ) : Processor: dying
11.06.20 15:03:32.646  MAIN        ( PROCESS ) : Processor: dying
11.06.20 15:03:32.647  MAIN        ( PROCESS ) : Processor: dying
```

I can only assume, that this has not been the root cause for the ErrorCode 20 the customer reported, but it still reduces the reliability in situations with high loads and should be fixed for that reason. But I am not sure, if that is still part of this ticket or if we should treat it as an improvement, because it only became visible after fixing this bug. What do you think? Is it a new bug or is it related?
Comment 6 Florian Best univentionstaff 2020-06-11 16:06:52 CEST
(In reply to Max Pohle from comment #5)
This is already tracked in Bug #51402.
Comment 7 Erik Damrose univentionstaff 2020-06-17 15:39:29 CEST
<http://errata.software-univention.de/ucs/4.4/626.html>