Univention Bugzilla – Bug 32935
AppCenter starts N threads
Last modified: 2015-06-26 10:52:31 CEST
The app_center.py-module currently starts one thread for each file to be downloaded. This leads to (currently) 209 threads with 209 TCP-connections (trying to) starting 209 Apache processes on the server. (Each with a TCP and HTTPS handshake) Every added App increases this problem. As these files are static a single connection with properly pipelined requests is the fastest and most efficient (for both client and server). Emulating this process with wget allows fetching all 209 files in just 4.9 seconds using a single connection, whereas using 209 connections requires 6.5 seconds (on a quad-core machine) but using about 100 times more CPU time. (I think wget does not do pipelining so the runtime gains should be even greater) Not using threads would also allow the code to do error handling (currently missing files are only logged, not handled) This may impact the release negatively because of the release many new installations at the same time.
Created attachment 5903 [details] Limit download threads During the Release-tests for UCS-3.2-2 several (2+) VMs got stuck during the /usr/sbin/univention-register-apps; one has 120 open TCP+SSL connections to <https://download2.software-univention.de/>. The VM is stuck since >20h. <https://mail.python.org/pipermail/python-list/2007-April/429576.html>: > Pipelining is an experimental feature, designed to improve page-load performance, that is unfortunately not well supported by some web servers and proxies. [...] Instead of pipelining, it uses multiple connections (2 is the maximum RFC 2616 says SHOULD be used by a"single-user client").
(In reply to Philipp Hahn from comment #1) > <https://mail.python.org/pipermail/python-list/2007-April/429576.html>: > > Pipelining is an experimental feature, designed to improve page-load performance, that is unfortunately not well supported by some web servers and proxies. [...] Instead of pipelining, it uses multiple connections (2 is the maximum RFC 2616 says SHOULD be used by a"single-user client"). <https://en.wikipedia.org/w/index.php?oldid=602498966#Implementation_in_web_servers> „Implementing pipelining in web servers is a relatively simple matter […]. For that reason, most modern web servers handle pipelining without any problem.“ Additionally the appcenter server (our webserver) supports pipelining.
To achieve single connection, httplib with headers = {"Connection:" : "Keep-alive"} seems to be the simplest way: <https://docs.python.org/2/library/httplib.html#examples> Pipelining is not as easy, but also not required for acceptable performance.
(In reply to Janek Walkenhorst from comment #3) > To achieve single connection, httplib with > headers = {"Connection:" : "Keep-alive"} > seems to be the simplest way: > <https://docs.python.org/2/library/httplib.html#examples> > Pipelining is not as easy, but also not required for acceptable performance. … or see the example at http://stackoverflow.com/a/508840/3116268 : conn = httplib.HTTPConnection("www.somewhere.com") conn.request("GET", '/foo') conn.request("GET", '/bar') conn.request("GET", '/baz') Currently we are using urllib2.urlopen(request) for each file separatelx
Created attachment 5908 [details] Use httplib instead of urllib2 urllib2 seems to open a new TCP+HTTP connection for each request, while httplib does it only once (Keep-Alive). <http://docs.python-requests.org/en/latest/user/advanced/#request-and-response-objects> seams to support stream=True, which could perhaps work with pipelining, but its not mentioned explicitly and is not part of UCS-3.2. Python-PyCurl support pipelining: <http://curl.haxx.se/dev/readme-pipelining.html> and works with Python. Time: 33.297111 Using urllib2 Time: 10.072266 Using httplib Time: 10.301843 Using pycurl single Time: 8.733144 Using pycurl multi 2 threads Time: 4.861863 Using pycurl multi 5 threads
(In reply to Philipp Hahn from comment #5) > Created attachment 5908 [details] > Use httplib instead of urllib2 That code seems wrong for HTTPS (should use HTTPSConnection). Additionally the port argument is optional. Using httplib is sufficient: HTTP {'Connection:': 'keep-alive'} 4.20874 s HTTP {'Connection:': 'close'} 10.32169 s HTTPS {'Connection:': 'keep-alive'} 4.65623 s HTTPS {'Connection:': 'close'} 20.28731 s It is already being implemented by Dirk.
Fixed in univention-management-console-module-appcenter 3.0.51-10.244.201405130041 Uses HTTPSConnection. Should be faster while reducing network load. QA should have a look at a proxy environment. httplib does not use the globally installed proxy handler used elsewhere in the App Center. I am not sure whether my fix covers all "proxy situations". To test: rm /var/cache/univention-management-console/appcenter/*
I'd suggest to specify HTTPSConnection(..., timeout=60) # seconds to prevent infinite downloads.
(In reply to Philipp Hahn from comment #8) > I'd suggest to specify HTTPSConnection(..., timeout=60) # seconds > to prevent infinite downloads. Two of my 3.2-2 test installations are stuck at 95% in "35univention-management-console-module-appcenter.inst" with one open connection to download2.software-univention.de:https; last message in /var/log/univention/join.log is > Downloading "https://appcnter.software.univention.de/meta-inf/3.2/index.json.gz" "netstat -t" shows the connection with Recv-Q=Send-Q=0. "ifconfig eth0" shows 173/485 dropped RX packets in the VM. "ifconfig vnet8" shows 158 TX packets overruns on the host "ping $GW" still works. "gdb bt": __read_nocancel @ ../sysdeps/unix/syscall-template.S.82 ?? BIO_read ssl3_rean_n ssl3_read_bytes ssl3_read @ libssl.so.0.9.8 PySSL_SSLread @ ../Modules/_ssl.c:1300 /usr/lib/pymodules/python2.6/ssl.py:136 /usr/lib/pymodules/python2.6/ssl.py:215 /usr/lib/pymodules/python2.6/socket.py:2427 /usr/lib/pymodules/python2.6/httplib.py:349 /usr/lib/pymodules/python2.6/httplib.py:391 /usr/lib/pymodules/python2.6/httplib.py:990 /usr/lib/pymodules/python2.6/httplib.py:990 /usr/lib/pymodules/univention/management/console/modules/appcenter/app_center.py:612 /usr/lib/pymodules/univention/management/console/modules/appcenter/app_center.py:674 /usr/lib/pymodules/univention/management/console/modules/appcenter/app_center.py:653 /usr/sbin/univention-register-apps:53 /usr/sbin/univention-register-apps:64
(In reply to Philipp Hahn from comment #8) > I'd suggest to specify HTTPSConnection(..., timeout=60) # seconds > to prevent infinite downloads. Done in univention-management-console-module-appcenter 3.0.51-12.247.201405131404 HTTPSConnection(..., timeout=60) and urllib2.urlopen(..., timeout=60)
Code. OK Changelog: OK Installation tests: OK App Center tests: OK Proxy tests: Failed: When I use username and password for the squid authentification (proxy/http: http://sg:univention@10.201.19.1:3128), I get the following traceback: root@backup192:~# univention-add-app -l owncloud5 14.05.14 12:26:02.450 MODULE ( PROCESS ) : Downloading "https://appcenter.software-univention.de/meta-inf/3.2/index.json.gz"... Traceback (most recent call last): File "/usr/sbin/univention-add-app", line 83, in <module> requested_app = Application.find(component_id) File "/usr/lib/pymodules/python2.6/univention/management/console/modules/appcenter/app_center.py", line 519, in find for application in cls.all(): File "/usr/lib/pymodules/python2.6/univention/management/console/modules/appcenter/app_center.py", line 677, in all cls.sync_with_server() File "/usr/lib/pymodules/python2.6/univention/management/console/modules/appcenter/app_center.py", line 601, in sync_with_server connection.connect() File "/usr/lib/python2.6/httplib.py", line 1112, in connect sock = socket.create_connection((self.host, self.port), self.timeout) File "/usr/lib/python2.6/socket.py", line 547, in create_connection for res in getaddrinfo(host, port, 0, SOCK_STREAM): socket.gaierror: [Errno -2] Name or service not known root@backup192:~#
I've reset the changes and will add a simple fix for Comment #1.
Fixed in univention-management-console-module-appcenter 3.0.51-16.250.201406031613 No threads used anymore, all files are downloaded one after another (each with new connection and rock solid urlopen()). If this would mean more than 5 files to download, the App Center instead downloads the compiled archive all.tar.gz (to be found in the meta-inf directory) and extracts the files instead.
FYI, a related problem has been reported in the forum: http://forum.univention.de/viewtopic.php?f=48&t=3366
Behaviour seems fine. It would be nice to have a log line that says "Downloading all.tar.gz": > 26.06.14 14:06:23.645 MODULE ( PROCESS ) : Downloading "https://master50.errata.qa/meta-inf/3.2/index.json.gz"... > 26.06.14 14:06:23.659 MODULE ( PROCESS ) : 6 file(s) need to be downloaded > 26.06.14 14:06:23.673 MODULE ( INFO ) : Extracting univention-demoapp1.README_POST_INSTALL > 26.06.14 14:06:23.676 MODULE ( INFO ) : Extracting univention-demoapp1.README_POST_UNINSTALL > 26.06.14 14:06:23.676 MODULE ( INFO ) : Extracting univention-demoapp1.README_UNINSTALL > 26.06.14 14:06:23.676 MODULE ( INFO ) : Extracting univention-demoapp1.README_UPDATE > 26.06.14 14:06:23.677 MODULE ( INFO ) : Extracting univention-demoapp1.README_POST_UPDATE > 26.06.14 14:06:23.677 MODULE ( INFO ) : Extracting univention-demoapp1.README_INSTALL
We need a graceful degradation in case the all.tar.gz file is not present (e.g., when the repository server has been mirrored).
repo-ng is yet to be fixed (3.0 patch -> 3.2 patch) But you should be able to test the appcenter module again.
(In reply to Alexander Kläser from comment #16) > We need a graceful degradation in case the all.tar.gz file is not present > (e.g., when the repository server has been mirrored). Fixed in univention-management-console-module-appcenter 3.0.51-22.256.201407011022 Return of the Threads... I limited them to at most 10 threads at one downloading the files. Note that with the new all.tar.gz it should not be necessary even to use these 10 threads. Either everything is fetched from the archive, or (if 5 or less files are new) those are downloaded sequentially in one single thread.
Created attachment 5982 [details] appcenter log file (In reply to Dirk Wiesenthal from comment #18) > Fixed in > univention-management-console-module-appcenter 3.0.51-22.256.201407011022 > > Return of the Threads... I limited them to at most 10 threads at one > downloading the files. > > Note that with the new all.tar.gz it should not be necessary even to use > these 10 threads. > > Either everything is fetched from the archive, or (if 5 or less files are > new) those are downloaded sequentially in one single thread. This seems to work fine in general. Only the test with downloading 216 new files seems to be incorrect (see attached log file): > Starting to download 22 file(s) directly > ... But only 10 files are downloaded. This leads to the last thread downloading many more files.
Everything is like it should be: 216 in total 1st: 22 files 2nd: 22 files ... 10th: 14 files As these are threads, the log writes Downloading ...vim60.ini to .../appcenter/vim60.ini whenever it starts the download in one of the 10 threads. Some files in the "last list" are actually from the first or second thread, I suppose. The download of those threads is not finished when the last thread is started. Reopen this bug if you have evidence that this is not the case, i.e. the last thread downloads them all or if you want to have something changed in the log messages.
(In reply to Dirk Wiesenthal from comment #20) > Everything is like it should be: > ... True, then the log output confused me a bit, you are right. (In reply to Dirk Wiesenthal from comment #13) > Fixed in > univention-management-console-module-appcenter 3.0.51-16.250.201406031613 > > No threads used anymore, all files are downloaded one after another (each > with new connection and rock solid urlopen()). If this would mean more than > 5 files to download, the App Center instead downloads the compiled archive > all.tar.gz (to be found in the meta-inf directory) and extracts the files > instead. I checked the behaviour with a simulated slow connection (50kbit/sec), downloading all.tar.gz took maybe 1sec, extraction took about 1sec, and downloading 25 separate png/README files took about 5sec. So the gain in speed is very good. (In reply to Dirk Wiesenthal from comment #18) > (In reply to Alexander Kläser from comment #16) > > We need a graceful degradation in case the all.tar.gz file is not present > > (e.g., when the repository server has been mirrored). > > Fixed in > univention-management-console-module-appcenter 3.0.51-22.256.201407011022 > > Return of the Threads... I limited them to at most 10 threads at one > downloading the files. > > Note that with the new all.tar.gz it should not be necessary even to use > these 10 threads. > > Either everything is fetched from the archive, or (if 5 or less files are > new) those are downloaded sequentially in one single thread. This is working fine, as well. Test with proxy + authentication worked fine. YAML file → OK → VERIFIED
http://errata.univention.de/ucs/3.2/133.html