Univention Bugzilla – Bug 51367
socket leak - portal logon
Last modified: 2020-06-19 17:32:41 CEST
Logon at the portal seems to produce ldap connections that umc never closes: UMC Portal Logon/Logout -> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l 3 UMC Portal Logon/Logout -> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l 4 UMC Portal Logon/Logout UMC Portal Logon/Logout -> lsof -p $(pidof -x univention-management-console-server)| grep 7389| wc -l 6 univentio 11858 root 9u IPv4 48436 0t0 TCP master.four.four:52136->master.four.four:7389 (ESTABLISHED) univentio 11858 root 11u IPv4 48445 0t0 TCP master.four.four:52138->master.four.four:7389 (ESTABLISHED) univentio 11858 root 12u IPv4 48482 0t0 TCP master.four.four:52152->master.four.four:7389 (ESTABLISHED) univentio 11858 root 22u IPv4 49157 0t0 TCP master.four.four:52236->master.four.four:7389 (ESTABLISHED) univentio 11858 root 28u IPv4 50002 0t0 TCP master.four.four:52324->master.four.four:7389 (ESTABLISHED) univentio 11858 root 28u IPv4 50002 0t0 TCP master.four.four:52324->master.four.four:7389 (ESTABLISHED) after a couple of minutes univentio 11858 root 9u IPv4 48436 0t0 TCP master.four.four:52136->master.four.four:7389 (CLOSE_WAIT) univentio 11858 root 11u IPv4 48445 0t0 TCP master.four.four:52138->master.four.four:7389 (CLOSE_WAIT) univentio 11858 root 12u IPv4 48482 0t0 TCP master.four.four:52152->master.four.four:7389 (CLOSE_WAIT) univentio 11858 root 20u IPv4 50779 0t0 TCP master.four.four:52406->master.four.four:7389 (CLOSE_WAIT) univentio 11858 root 22u IPv4 49157 0t0 TCP master.four.four:52236->master.four.four:7389 (CLOSE_WAIT) univentio 11858 root 28u IPv4 50002 0t0 TCP master.four.four:52324->master.four.four:7389 (CLOSE_WAI and after a couple of hours univentio 1587 root 14u sock 0,8 0t0 5170056 protocol: TCP univentio 1587 root 15u sock 0,8 0t0 5174400 protocol: TCP univentio 1587 root 19u sock 0,8 0t0 5171087 protocol: TCP univentio 1587 root 20u sock 0,8 0t0 5172389 protocol: TCP univentio 1587 root 23u sock 0,8 0t0 5171616 protocol: TCP univentio 1587 root 26u sock 0,8 0t0 5175952 protocol: TCP and the ldap log during UMC Logon/Logout May 18 10:20:16 master slapd[1224]: conn=1066 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128 May 18 10:20:16 master slapd[1224]: conn=1066 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:16 master slapd[1224]: conn=1067 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128 May 18 10:20:16 master slapd[1224]: conn=1067 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:18 master slapd[1224]: conn=1068 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" method=128 May 18 10:20:18 master slapd[1224]: conn=1068 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:23 master slapd[1224]: conn=1069 op=0 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128 May 18 10:20:23 master slapd[1224]: conn=1069 op=0 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:23 master slapd[1224]: conn=1069 op=2 UNBIND May 18 10:20:24 master slapd[1224]: conn=1070 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128 May 18 10:20:24 master slapd[1224]: conn=1070 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:24 master slapd[1224]: conn=1070 op=3 UNBIND May 18 10:20:24 master slapd[1224]: conn=1071 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" method=128 May 18 10:20:24 master slapd[1224]: conn=1071 op=1 BIND dn="cn=master,cn=dc,cn=computers,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:24 master slapd[1224]: conn=1071 op=3 UNBIND May 18 10:20:24 master slapd[1224]: conn=1072 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" method=128 May 18 10:20:24 master slapd[1224]: conn=1072 op=1 BIND dn="uid=Administrator,cn=users,dc=four,dc=four" mech=SIMPLE ssf=0 May 18 10:20:24 master slapd[1224]: conn=1073 op=1 BIND dn="cn=admin,dc=four,dc=four" method=128 May 18 10:20:24 master slapd[1224]: conn=1073 op=1 BIND dn="cn=admin,dc=four,dc=four" mech=SIMPLE ssf=0 here my guess is a missing unbind for the user connection.
Created attachment 10374 [details] repr2.py simple reproducer
Created attachment 10375 [details] portal.patch Patch from Florian, this patch helps, but we had strange Test problems. In two cases (S4 Slave, Non-S4 slave) the umc-web crashed and the selenium tests failed. We found no further information in the log files.
Hm, I think the crash is more due to the patch from Bug #51366 ?! Especially the selenium Tests (which probably use SAML) are affected by the removal of self.pam because it is used more than once.
The LDAP connections are now reset when the session is removed. And the LDAP connection which fetches the favorites for a user (on UMC login) is immediately closed. univention-management-console.yaml 28c855334c55 | YAML Bug #51367 univention-management-console (11.0.4-85) 23129ad89c1f | Bug #51367: don't reset glogal connection cache if no connections exists ddc7e1b6a0e5 | Bug #51367: add no_cache to hash of connection 1efc0549723d | Bug #51367: fix leak of ldap connection in session shutdown and favorites retrieval
I did the QA and as expected and mentioned, I can also confirm, that the patch works for password resets and logins with a wrong password. That I tried manually and additionally by executing repr1.sh and repr2.py. The number of open file descriptors does not increase any more and I was using ``` watch -n.2 'lsof -p $(pidof -x univention-management-console-server) | grep 7389 | wc -l' ``` to check that. However I made two further observations: 1) After a restart of the univention-management-console-server there are initially no file descriptors open and reloading the portal page has no effect on that. But if you try to log in, there is one file descriptor opened, which then never closes again. That would however not lead to ErrorCode 20 since it does not get more than one. Still I have the gut feeling, that there is also some cleanup code missing. 2) I can still spawn more open file descriptors by logging in with correct credentials and by reloading the portal page. Each refresh brings one new file descriptor. These are closed some time later (approx. after 30 seconds) due to a timeout, but it is still possible within that time frame to open more than 1000 file descriptors. I think that shows, that while the error cases now clean up as expected, the success case does not entirely: The portal page is shown in my browser long before the timeout occurs. That makes me think that after such a page is send, there needs to be the same clean up code as in the error handler. Just for reference. Here is the output from `/var/log/univention/management-console-server.log` when such a timeout occurs: ``` 11.06.20 15:03:02.610 LOCALE ( WARN ) : Could not find translation file: 'umc-core.mo' 11.06.20 15:03:02.732 MODULE ( PROCESS ) : Setting auth type to None 11.06.20 15:03:02.823 MAIN ( PROCESS ) : Updating user password in 0 running module processes (auth-type: None). 11.06.20 15:03:02.824 LOCALE ( WARN ) : Could not find translation file: 'umc-core.mo' 11.06.20 15:03:03.169 MAIN ( PROCESS ) : LDAP bind for user 'uid=Administrator,cn=users,dc=univention,dc=intranet'. 11.06.20 15:03:32.646 MAIN ( PROCESS ) : Connection timed out. 11.06.20 15:03:32.646 MAIN ( PROCESS ) : Processor: dying 11.06.20 15:03:32.646 MAIN ( PROCESS ) : Processor: dying 11.06.20 15:03:32.647 MAIN ( PROCESS ) : Processor: dying ``` I can only assume, that this has not been the root cause for the ErrorCode 20 the customer reported, but it still reduces the reliability in situations with high loads and should be fixed for that reason. But I am not sure, if that is still part of this ticket or if we should treat it as an improvement, because it only became visible after fixing this bug. What do you think? Is it a new bug or is it related?
(In reply to Max Pohle from comment #5) This is already tracked in Bug #51402.
<http://errata.software-univention.de/ucs/4.4/626.html>