Univention Bugzilla – Bug 52316
[Errno 11] Resource temporarily unavailable / Die Ressource ist zur Zeit nicht verfügbar
Last modified: 2021-06-15 08:29:13 CEST
After adding the new multi process feature to the UMC server, the log file is filled with "Errno 11" messages and less than 1% of all UMC requests fails with error 503: 03.11.20 10:44:02.685 MODULE ( PROCESS ) : 12860: Setting auth type to u'SAML' 03.11.20 10:44:02.694 MODULE ( PROCESS ) : 12862: Setting auth type to u'SAML' 03.11.20 10:44:02.818 MAIN ( ERROR ) : 12860: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar 03.11.20 10:44:02.818 MAIN ( ERROR ) : 12861: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar 03.11.20 10:44:02.878 MAIN ( ERROR ) : 12862: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar 03.11.20 10:44:02.890 MAIN ( PROCESS ) : 12861: LDAP bind for user 'uid=Administrator,cn=users,dc=performance445,dc=ucs'. 03.11.20 10:44:02.896 MAIN ( WARN ) : 12861: SAML binddn does not match: 'uid=administrator,cn=users,dc=performance445,dc=ucs' != 'uid=Administrator,cn=users,dc=performance445,dc=ucs' 03.11.20 10:44:02.905 MAIN ( ERROR ) : 12862: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar
384 def _connection(self, socket): 385 '''Signal callback: Invoked on incoming connections.''' 386 try: 387 socket, addr = socket.accept() 388 except EnvironmentError as exc: 389 CORE.error('Cannot accept new connection: %s' % (exc,)) 390 soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE) 391 resource.setrlimit(resource.RLIMIT_NOFILE, (soft + 2, hard + 2)) 392 try: 393 socket, addr = socket.accept() 394 socket.close() 395 except EnvironmentError: 396 pass 397 finally: 398 resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard)) 399 return True 400 fd = socket.fileno() 401 flags = fcntl.fcntl(fd, fcntl.F_GETFD) 402 fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC) The code shown here should actually deal with the case that not enough free file descriptors are available (→ errno 24: Too many open files). Since the listening socket has a backlog (for 100 connections → SERVER_MAX_CONNECTIONS), incoming TCP connections are accepted and established by the kernel, but not yet passed from the kernel to the UMC server process. If there are no more free file descriptors available, the socket.accept() call always throws an errno 24 error and the already established TCP connections starve to death in the backlog, unless the UMC server gets free file descriptors again. To prevent starvation, the code temporarily raises the limit for open filehandles by 2, executes the accept() and immediately closes the connection again. This causes the UMC web server not to get a valid connection to the UMC server and returns an error 503 to the browser, so that the browser is informed and the request does not wait infinitely for an accept(). In the new multiprocess UMC server, all UMC servers have their listening socket bound to the same port. As long as there is nothing to do, all UMC server processes wait in epoll(). When a new connection comes in, all processes are woken up and they call the _connection() method, where a socket.accept() is called. In the following we assume that UMCServer A has successfully accepted the new socket. All other processes (B,C,D,...) get an EnvironmentError exception with errno 11, because there is nothing to do at the moment. (Errno 11 stands for "Resource temporarily unavailable"/"The resource is currently not available" or EAGAIN). The above code does not distinguish between errno 11 and errno 24 and from time to time a special corner case occurs e.g. with process B: between line 387 (first accept()) and line 393 (second accept()) a new TCP connection from the UMC web server comes in. The code for "no free filehandles" accepts the new TCP connection in line 393 and closes it again immediately in line 394. As described above, the UMC web server then returns an error message 503, but this time there was no error at all and the connection was immediately closed again in an unjustified way. The fix for this problem now catches errno 11 before the log output in line 389 and immediately makes a "return True" to avoid unnecessary log output. Furthermore, lines 390 to 398 are now only executed on errno 24. Package: univention-management-console Version: 11.0.5-10-dp3A~4.4.0.202011051840 Branch: ucs_4.4-0 Scope: portal-performance
It's was introduced in Bug #39909 git:fff8d0fd01cf99b00ff3d8257408f21374b7bef8
LGTM, please merge to 4.4-6-errata.
I cherrypicked the commits to branch "4.4-6" and rebuilt univention-management-console: [4.4-6] faf9d33a96 Bug #52316: update advisory [4.4-6] d2ffa8ca43 Bug #52316: update advisory [4.4-6] 1722b69268 Bug #52316: add changelog entry [4.4-6] ce22099bb8 Bug #52316: fix faulty connections and prevent lots of log messages Package: univention-management-console Version: 11.0.5-16A~4.4.0.202011171723 Branch: ucs_4.4-0 Scope: errata4.4-6
OK: catching of errno 11 OK: traceback logging OK: YAML (fixed in 65a25d0a4a99)
<https://errata.software-univention.de/#/?erratum=4.4x822>