Bug 41361 - High system load due to proxy reloads
High system load due to proxy reloads
Status: CLOSED FIXED
Product: UCS@school
Classification: Unclassified
Component: Proxy services
UCS@school 4.1
Other Linux
: P5 critical (vote)
: UCS@school 4.1 R2 vXXX
Assigned To: Janek Walkenhorst
Daniel Tröder
: interim-1
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-05-26 18:12 CEST by Sönke Schwardt-Krummrich
Modified: 2016-10-13 16:53 CEST (History)
3 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 5: Will affect all installed domains
How will those affected feel about the bug?: 4: A User would return the product
User Pain: 0.571
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sönke Schwardt-Krummrich univentionstaff 2016-05-26 18:12:04 CEST
Currently squid/ntlmauth/children is set to 50 by default. A UCS@school listener module changes UCR proxy variables on each group change. Each change of the UCR variable also calls "/etc/init.d/squid* reload", so all of squid's child helper processes get restarted (50 basic auth, 50 ntlm, 20 rewrite).

On my test system, ucs-test raised the load to 30 this way. I think, we should find a better solution. E.g. only one squid reload every 10 seconds.
Comment 1 Felix Botner univentionstaff 2016-06-14 13:22:07 CEST
specially annoying in the exam module
-> uptime
 13:11:18 up  3:16,  2 users,  load average: 217,46, 201,69, 113,51
Comment 2 Sönke Schwardt-Krummrich univentionstaff 2016-06-14 14:06:32 CEST
We should implement a daemon that listens blocking on a named pipe. If something is written to the daemon, it wakes up and reloads squid if the last reload was longer that 15 seconds ago. Just prior to reloading squid, the queue of the named pipe has to be flushed.
Comment 3 Felix Botner univentionstaff 2016-06-15 16:36:19 CEST
I would suggest to recheck the squidguard ldap lookup feature.

One reason we dropped this might be the fact that former ntlm auth helper returned the username in the form of DOMAIN\username and we don't have this as memberuid in our ldap gropus.

-> more /var/lib/ucs-school-webproxy/usergroup-school1-2d1a 
s1
W2K12\s1
s2
W2K12\s2
..

But with 4.1-2, all helpers seem to return a proper username

-> kinit s1
-> curl --proxy-user : --proxy "http://10.200.7.151:3128" \
   --proxy-negotiate  www.google.de
-> curl --proxy "http://10.200.7.151:3128" --proxy-ntlm \
   --proxy-user "w2k12ssss\s1:univention" www.google.de
-> curl --proxy "http://10.200.7.151:3128" --proxy-ntlm \
   --proxy-user "s1:univention" www.google.de
-> curl --proxy "http://10.200.7.151:3128" --proxy-basic \
   --proxy-user "s1:univention" www.google.de


/var/log/squid3/access.log
...23 10.200.7.151 TCP_HIT/200 814 GET http://www.google.de/ s1 NONE/- text/html
...1 10.200.7.151 TCP_HIT/200 595 GET http://www.google.de/ s1 NONE/- text/html
...1 10.200.7.151 TCP_HIT/200 595 GET http://www.google.de/ s1 NONE/- text/html
...0 10.200.7.151 TCP_HIT/200 595 GET http://www.google.de/ s1 NONE/- text/html

In my small test, squidgard with ldap backend worked:


-> more  /etc/squidguard/squidGuard.conf

ldapbinddn cn=slave,cn=dc,cn=server,cn=computers,ou=school1,dc=w2k12,dc=test
ldapbindpass uGkDOSCinL7tJTwmBe2f

src usergroup-school1-2d1a {
     ldapusersearch  ldap://localhost:7389/dc=w2k12,dc=test?memberUid?sub?(&(objectclass=univentionGroup)(cn=school1-1a)(memberUid=%s))
}

...

acl {
         usergroup-school1-2d1a {
                 pass whitelist-Kein-20Internet none
                 redirect http://slave.w2k12.test/blocked-by-squid.html
         }

         default {
                  pass whitelist !blacklist !global-blacklist all
                  redirect http://slave.w2k12.test/blocked-by-squid.html
         }
}

(user s1 is member of school1-1a)

-> curl --proxy "http://10.200.7.151:3128" --proxy-ntlm \
   --proxy-user "s1:univention" www.google.de
BLOCKED

-> curl --proxy "http://10.200.7.151:3128" --proxy-ntlm --proxy-user "Administrator:univention" www.google.de
OK

With this setup we could
 * remove the group-to-ucr sync on school slaves
 * drastically decrease the need for reloading the squid daemon

to check
 * squidguard ldap performance
 * squidguard "ldapcachetime" could be an issue
 * are these "proxy/filter/usergroup" used anywhere else
Comment 4 Sönke Schwardt-Krummrich univentionstaff 2016-08-01 12:09:24 CEST
Workaround is currently to reduce the number of helper processes, e.g.:
squid/basicauth/children=5
squid/ntlmauth/children=5
squid/krb5auth/children=5
squid/rewrite/children=5

Please note that 3 may be too low for larger schools.
Comment 5 Janek Walkenhorst univentionstaff 2016-09-20 18:36:59 CEST
Added a service to ucs-school-webproxy that can be triggered via a unix socket to reload select services.
It reloads the service immediately if the last reload was sufficiently long ago, otherwise it reloads only at most every 15 s.

Modified ucs-school-webproxy.py to try to trigger the reload via the reload-service, otherwise fall back to the old behavior of immediately reloading it by itself.

r72703
r72704

ucs-school-webproxy 12.0.3-3.119.201609201834

Advisory: ucs-school-webproxy.yaml
Comment 7 Florian Best univentionstaff 2016-09-23 17:21:56 CEST
I see this:
2288 runsvdir -P /etc/service log: .........................................................................................runsv univention-reload-service: fatal: unable to start ./run: file does not exist runsv univention-reload-service: fatal: unable to start ./run: file does not exist runsv univention-reload-service: fatal: unable to start ./run: file does not exist .........................................................

The symlink to the unix socket is also dead:
root@xen8:~# ls -l /proc/6992/fd
insgesamt 0
lrwx------ 1 root root 64 Sep 22 21:39 0 -> /dev/null
l-wx------ 1 root root 64 Sep 22 21:39 1 -> /var/log/univention/reload-service.log
l-wx------ 1 root root 64 Sep 22 21:39 2 -> /var/log/univention/reload-service.log
lrwx------ 1 root root 64 Sep 22 21:39 3 -> socket:[271762]

but accessing the service works :)

It might also be better to pass close_fds=True to the subprocess.call? This is maybe not relevant for "squid3 reload" but might be for other services if they get added in the future and just copy&paste the code.
Comment 8 Daniel Tröder univentionstaff 2016-09-26 08:27:41 CEST
(In reply to Florian Best from comment #7)
> I see this:
> 2288 runsvdir -P /etc/service log:
> .............................................................................
> ............runsv univention-reload-service: fatal: unable to start ./run:
> file does not exist runsv univention-reload-service: fatal: unable to start
> ./run: file does not exist runsv univention-reload-service: fatal: unable to
> start ./run: file does not exist
> .........................................................
This seems to be from installation time. The service works, and after a restart the message doesn't reappear. I still wonder why that happens...
Comment 9 Daniel Tröder univentionstaff 2016-09-27 17:41:45 CEST
OK: update
OK: fresh install
OK: code
OK: added automatic test 90_ucsschool/121_limit_squid_reloads
OK: advisory
Comment 10 Sönke Schwardt-Krummrich univentionstaff 2016-10-04 13:24:44 CEST
UCS@school 4.1 R2 v5 has been released.

http://docs.software-univention.de/changelog-ucsschool-4.1R2v5-de.html

If this error occurs again, please clone this bug.
Comment 11 Sönke Schwardt-Krummrich univentionstaff 2016-10-13 16:53:19 CEST
Added delay of 17 seconds in 14_http_proxy_basic_auth_check to catch up with delayed squid reload. The test failed on several Jenkins machines.