Bug 52316 - [Errno 11] Resource temporarily unavailable / Die Ressource ist zur Zeit nicht verfügbar
[Errno 11] Resource temporarily unavailable / Die Ressource ist zur Zeit nich...
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
UCS 4.4
Other Linux
: P5 normal (vote)
: UCS 4.4-6-errata
Assigned To: Sönke Schwardt-Krummrich
Florian Best
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2020-11-05 18:11 CET by Sönke Schwardt-Krummrich
Modified: 2021-06-15 08:29 CEST (History)
1 user (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 1: Will affect a very few installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.114
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional): Regression
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sönke Schwardt-Krummrich univentionstaff 2020-11-05 18:11:38 CET
After adding the new multi process feature to the UMC server, the log file is filled with "Errno 11" messages and less than 1% of all UMC requests fails with error 503:

03.11.20 10:44:02.685  MODULE      ( PROCESS ) : 12860: Setting auth type to u'SAML'
03.11.20 10:44:02.694  MODULE      ( PROCESS ) : 12862: Setting auth type to u'SAML'
03.11.20 10:44:02.818  MAIN        ( ERROR   ) : 12860: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar
03.11.20 10:44:02.818  MAIN        ( ERROR   ) : 12861: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar
03.11.20 10:44:02.878  MAIN        ( ERROR   ) : 12862: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar
03.11.20 10:44:02.890  MAIN        ( PROCESS ) : 12861: LDAP bind for user 'uid=Administrator,cn=users,dc=performance445,dc=ucs'.
03.11.20 10:44:02.896  MAIN        ( WARN    ) : 12861: SAML binddn does not match: 'uid=administrator,cn=users,dc=performance445,dc=ucs' != 'uid=Administrator,cn=users,dc=performance445,dc=ucs'
03.11.20 10:44:02.905  MAIN        ( ERROR   ) : 12862: Cannot accept new connection: [Errno 11] Die Ressource ist zur Zeit nicht verfügbar
Comment 1 Sönke Schwardt-Krummrich univentionstaff 2020-11-05 18:40:14 CET
384 def _connection(self, socket):
385     '''Signal callback: Invoked on incoming connections.'''
386     try:
387         socket, addr = socket.accept()
388     except EnvironmentError as exc:
389         CORE.error('Cannot accept new connection: %s' % (exc,))
390         soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
391         resource.setrlimit(resource.RLIMIT_NOFILE, (soft + 2, hard + 2))
392         try:
393             socket, addr = socket.accept()
394             socket.close()
395         except EnvironmentError:
396             pass
397         finally:
398             resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
399         return True
400     fd = socket.fileno()
401     flags = fcntl.fcntl(fd, fcntl.F_GETFD)
402     fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

The code shown here should actually deal with the case that not enough free 
file descriptors are available (→ errno 24: Too many open files).
Since the listening socket has a backlog (for 100 connections → 
SERVER_MAX_CONNECTIONS), incoming TCP connections are accepted and established 
by the kernel, but not yet passed from the kernel to the UMC server process.

If there are no more free file descriptors available, the socket.accept() call 
always throws an errno 24 error and the already established TCP connections 
starve to death in the backlog, unless the UMC server gets free file descriptors 
again. To prevent starvation, the code temporarily raises the limit for open 
filehandles by 2, executes the accept() and immediately closes the connection 
again. This causes the UMC web server not to get a valid connection to the UMC 
server and returns an error 503 to the browser, so that the browser is informed 
and the request does not wait infinitely for an accept().


In the new multiprocess UMC server, all UMC servers have their listening socket 
bound to the same port. As long as there is nothing to do, all UMC server 
processes wait in epoll().
When a new connection comes in, all processes are woken up and they call the 
_connection() method, where a socket.accept() is called.
In the following we assume that UMCServer A has successfully accepted the new 
socket. All other processes (B,C,D,...) get an EnvironmentError exception with 
errno 11, because there is nothing to do at the moment.
(Errno 11 stands for "Resource temporarily unavailable"/"The resource is 
currently not available" or EAGAIN).

The above code does not distinguish between errno 11 and errno 24 and from time 
to time a special corner case occurs e.g. with process B:
between line 387 (first accept()) and line 393 (second accept()) a new TCP 
connection from the UMC web server comes in. The code for "no free filehandles" 
accepts the new TCP connection in line 393 and closes it again immediately in 
line 394. As described above, the UMC web server then returns an error message 
503, but this time there was no error at all and the connection was immediately 
closed again in an unjustified way.

The fix for this problem now catches errno 11 before the log output in line 389 
and immediately makes a "return True" to avoid unnecessary log output.
Furthermore, lines 390 to 398 are now only executed on errno 24.


Package: univention-management-console
Version: 11.0.5-10-dp3A~4.4.0.202011051840
Branch: ucs_4.4-0
Scope: portal-performance
Comment 2 Florian Best univentionstaff 2020-11-05 19:07:05 CET
It's was introduced in Bug #39909 git:fff8d0fd01cf99b00ff3d8257408f21374b7bef8
Comment 3 Florian Best univentionstaff 2020-11-09 13:15:38 CET
LGTM, please merge to 4.4-6-errata.
Comment 4 Sönke Schwardt-Krummrich univentionstaff 2020-11-17 17:27:07 CET
I cherrypicked the commits to branch "4.4-6" and rebuilt 
univention-management-console:

[4.4-6] faf9d33a96 Bug #52316: update advisory
[4.4-6] d2ffa8ca43 Bug #52316: update advisory
[4.4-6] 1722b69268 Bug #52316: add changelog entry
[4.4-6] ce22099bb8 Bug #52316: fix faulty connections and prevent lots of log messages

Package: univention-management-console
Version: 11.0.5-16A~4.4.0.202011171723
Branch: ucs_4.4-0
Scope: errata4.4-6
Comment 5 Florian Best univentionstaff 2020-11-18 12:13:02 CET
OK: catching of errno 11
OK: traceback logging
OK: YAML (fixed in 65a25d0a4a99)