Bug 46696 - [4.3] Locking errors in netlogon listener module
[4.3] Locking errors in netlogon listener module
Status: CLOSED FIXED
Product: UCS@school
Classification: Unclassified
Component: Netlogon scripts
UCS@school 4.2
Other Linux
: P5 normal (vote)
: UCS@school 4.3 v3
Assigned To: Daniel Tröder
Sönke Schwardt-Krummrich
:
Depends on:
Blocks: 46778
  Show dependency treegraph
 
Reported: 2018-03-19 13:49 CET by Sönke Schwardt-Krummrich
Modified: 2018-05-02 17:59 CEST (History)
1 user (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.171
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2018031921001501
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sönke Schwardt-Krummrich univentionstaff 2018-03-19 13:49:04 CET
16.03.18 22:45:26.334  LISTENER    ( INFO    ) : got 1326 bytes for cn=schema admins,cn=groups,dc=schule,dc=example,dc=de
16.03.18 22:45:26.334  LISTENER    ( INFO    ) : running handlers [ucs-school-user-logonscript] for cn=Schema Admins,cn=groups,dc=schule,dc=example,dc=de
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=dcmaster.schule.example.de port=7389 base=dc=schule,dc=example,dc=de
16.03.18 22:45:26.342  LDAP        ( INFO    ) : establishing new connection with retry_max=11
16.03.18 22:45:26.352  LDAP        ( INFO    ) : bind binddn=cn=dcgsmitte,cn=dc,cn=server,cn=computers,ou=gsmitte,dc=schule,dc=example,dc=de
UNIVENTION_DEBUG_END    : uldap.__open host=dcmaster.schule.example.de port=7389 base=dc=schule,dc=example,dc=de
Traceback (most recent call last):
  File "/usr/lib/univention-directory-listener/system/ucs-school-user-logonscript.py", line 173, in handler
    user_queue = SqliteQueue(logger=Log)
  File "/usr/lib/pymodules/python2.7/ucsschool/netlogon/__init__.py", line 75, in __init__
    self.setup_database()
  File "/usr/lib/pymodules/python2.7/ucsschool/netlogon/__init__.py", line 103, in setup_database
    self.cursor.execute(u'CREATE TABLE IF NOT EXISTS user_queue (id INTEGER PRIMARY KEY AUTOINCREMENT, userdn TEXT, username TEXT)')
sqlite3.OperationalError: database is locked
16.03.18 22:45:32.379  LISTENER    ( WARN    ) : handler: ucs-school-user-logonscript (failed)
Comment 1 Daniel Tröder univentionstaff 2018-04-04 17:48:01 CEST
A context manager was introduced that closes the sqlite3 DB connection after each access. Accidentally keeping the DB locked cannot happen anymore.
The timeout to create a connection to a locked database has been raise to 30s.

[4.3 e9d7b43a] Bug #46696: prevent locked DB by closing it after every use
[4.3 a16e997c] Bug #46696: changelog
[4.3 01cc2301] Bug #46696: advisory

ucs-school-netlogon-user-logonscripts (14.0.0-2)
Comment 2 Daniel Tröder univentionstaff 2018-04-06 16:38:46 CEST
Performance measurements show no decline in performance with the new code.

The timings show that the bottleneck is the listener (module), as the daemon finishes generating scripts shortly after the listener finishes, but goes to sleep a few times in between.

==========================================================================

root@m65:~# dpkg -l ucs-school-netlogon-user-logonscripts 
ii  ucs-school-netlogon-user-l 14.0.0-1A~4.3.0.20 all

root@m65:~# udm users/user list | egrep '^DN:' | wc -l
2008
root@m65:~# udm groups/group list | egrep '^DN:' | wc -l
267
root@m65:~# udm shares/share list | egrep '^DN:' | wc -l
93

root@m65:~# univention-directory-listener-ctrl resync ucs-school-user-logonscript

root@m65:~# egrep '(^2018-04-06 13:5|^2018-04-06 14:0).*updating logon script for' /var/log/univention/ucs-school-user-logonscript-daemon.log | wc -l
5865

/var/log/univention/listener.log
--------------------------------------------------------------------------------
06.04.18 13:58:01.010  LISTENER    ( WARN    ) : initializing module ucs-school-user-logonscript
06.04.18 13:58:01.010  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Stopping logon script generator daemon...
06.04.18 13:58:01.079  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Deleting all *.vbs scripts in ['/var/lib/samba/netlogon/user']...
06.04.18 13:58:01.204  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Purging SQLite DB...
06.04.18 13:58:01.228  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Starting logon script generator daemon...
...
06.04.18 14:00:31.538  LISTENER    ( WARN    ) : finished initializing module ucs-school-user-logonscript with rv=0
--------------------------------------------------------------------------------

/var/log/univention/ucs-school-user-logonscript-daemon.log
--------------------------------------------------------------------------------
2018-04-06 13:58:01,038 [WARNING]  signal received - shutting down gracefully
2018-04-06 13:58:01,038 [INFO]  Shutdown as requested
2018-04-06 13:58:01,368 [INFO]  INIT
2018-04-06 13:58:01,426 [INFO]  got global links {'Marktplatz': {'letter': 'M', 'server': 'm65'}}
2018-04-06 13:58:02,267 [INFO]  signal received - reduced wait time by 172.2 seconds
...
2018-04-06 13:59:22,307 [INFO]  all items processed... sleeping...
2018-04-06 13:59:34,168 [INFO]  all items processed... sleeping...
2018-04-06 13:59:45,992 [INFO]  all items processed... sleeping...
2018-04-06 13:59:57,226 [INFO]  all items processed... sleeping...
2018-04-06 14:00:15,475 [INFO]  all items processed... sleeping...
2018-04-06 14:00:57,250 [INFO]  all items processed... sleeping...
--------------------------------------------------------------------------------

2008 user / 2m56s (176s) → 11.41 user/s


root@m65:~# dpkg -i ucs-school-netlogon-user-logonscripts_14.0.0-2_all.deb 

root@m65:~# univention-directory-listener-ctrl resync ucs-school-user-logonscript

root@m65:~# egrep '^2018-04-06 14:2.*updating logon script for' /var/log/univention/ucs-school-user-logonscript-daemon.log | wc -l
5804

/var/log/univention/listener.log
--------------------------------------------------------------------------------
06.04.18 14:20:12.265  LISTENER    ( WARN    ) : initializing module ucs-school-user-logonscript
06.04.18 14:20:12.265  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Stopping logon script generator daemon...
06.04.18 14:20:12.324  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Deleting all *.vbs scripts in ['/var/lib/samba/netlogon/user']...
06.04.18 14:20:12.442  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Purging SQLite DB...
06.04.18 14:20:12.466  LISTENER    ( WARN    ) : ucs-school-user-logonscript: Starting logon script generator daemon...
...
06.04.18 14:22:36.945  LISTENER    ( WARN    ) : finished initializing module ucs-school-user-logonscript with rv=0
--------------------------------------------------------------------------------

/var/log/univention/ucs-school-user-logonscript-daemon.log
--------------------------------------------------------------------------------
2018-04-06 14:20:12,286 [WARNING]  signal received - shutting down gracefully
2018-04-06 14:20:12,287 [INFO]  Shutdown as requested
2018-04-06 14:20:12,614 [INFO]  INIT
2018-04-06 14:20:12,669 [INFO]  got global links {'Marktplatz': {'letter': 'M', 'server': 'm65'}}
2018-04-06 14:20:13,250 [INFO]  signal received - reduced wait time by 172.4 seconds
...
2018-04-06 14:21:31,169 [INFO]  all items processed... sleeping...
2018-04-06 14:21:42,022 [INFO]  all items processed... sleeping...
2018-04-06 14:21:53,902 [INFO]  all items processed... sleeping...
2018-04-06 14:22:06,369 [INFO]  all items processed... sleeping...
2018-04-06 14:22:20,737 [INFO]  all items processed... sleeping...
2018-04-06 14:23:06,480 [INFO]  all items processed... sleeping...
--------------------------------------------------------------------------------

2008 user / 2m54s (174s) → 11.54 user/s

=======================================================================
=======================================================================


I created code with instrumentation to measure time spent on code paths. The code was commited to branch dtroeder/46696_netlogon_benchmark.

The timing of the listener show that 90% of the time is used to initialize the LDAP connection.

Regarding the daemon: Below (at the end) "run() end" can be ignored - it contains sleep() time.
Around 30% time is spend calling sqlite and about 70% time is spend creating the logon script. So I guess that's OK.

I total time the daemon needs 1/5 the time to create a script, that the listener module need to issue the order. If the startup time can be reduced, that would boost things...

=======================================================================
=======================================================================

root@m65:~# ucr set ucsschool/userlogon/benchmark=yes
Create ucsschool/userlogon/benchmark
root@m65:~# service ucs-school-netlogon-user-logonscripts restart
root@m65:~# service univention-directory-listener restart
root@m65:~# univention-directory-listener-ctrl resync ucs-school-user-logonscript

/var/log/univention/listener.log
--------------------------------------------------------------------------------
06.04.18 14:48:43.484  LISTENER    ( WARN    ) : initializing
...
06.04.18 14:49:06.410  LISTENER    ( PROCESS ) : ucs-school-user-logonscript: Timer one:
handler start____: 0.000000
handler init_____: 0.055096
handle_user start: 0.000064
handle_user end__: 0.005962
handler done_____: 0.000150
...
06.04.18 14:51:05.908  LISTENER    ( WARN    ) : finished initializing module ucs-school-user-logonscript with rv=0
--------------------------------------------------------------------------------

/var/log/univention/ucs-school-user-logonscript-daemon.log
--------------------------------------------------------------------------------
2018-04-06 14:48:43,505 [INFO]  Shutdown as
2018-04-06 14:48:43,831 [INFO]  INIT
...
2018-04-06 14:49:06,089 [INFO]  updating logon script for uid=pankraz.palm,cn=schueler,cn=users,ou=Schule2,dc=uni,dc=dtr
2018-04-06 14:49:06,096 [INFO]  Timer one:
run() start_____________________________________: 0.000000
run() self.user_queue.query_next_user()_________: 0.000399
run() self.user_queue.remove(user_dn)___________: 0.004568
run() self.handle_user_change(user_dn, username): 0.010065
run() after queue is not empty__________________: 0.000014
run() end_______________________________________: 0.000010
...
2018-04-06 14:51:36,314 [INFO]  all items processed... sleeping...
...
2018-04-06 14:54:52,872 [WARNING]  signal received - shutting down gracefully
2018-04-06 14:54:52,873 [INFO]  Shutdown as requested
2018-04-06 14:54:52,874 [INFO]  Timer total:
run() start_____________________________________: 0.000000
run() self.user_queue.query_next_user()_________: 6.864963
run() after queue is not empty__________________: 0.269353
run() end_______________________________________: 239.388254
run() self.user_queue.remove(user_dn)___________: 22.287977
run() self.handle_user_change(user_dn, username): 97.183153
run() self.user_queue.add([(user_dn, username)]): 0.225897
--------------------------------------------------------------------------------

2m53s with massiv log output for benchmarking
2008 user / 2m53s (173s) → 11.60 user/s
Comment 3 Sönke Schwardt-Krummrich univentionstaff 2018-04-20 15:25:05 CEST
24.03.18 20:52:22.068  LISTENER    ( WARN    ) : handler: ucs-school-user-logonscript (failed)
Traceback (most recent call last):
  File "/usr/lib/univention-directory-listener/system/ucs-school-user-logonscript.py", line 231, in handler
    user_queue = SqliteQueue(logger=Log)
  File "/usr/lib/pymodules/python2.7/ucsschool/netlogon/__init__.py", line 104, in __init__
    self.setup_database()
  File "/usr/lib/pymodules/python2.7/ucsschool/netlogon/__init__.py", line 117, in setup_database
    with Cursor(self.logger, self.filename) as cursor:
TypeError: __init__() takes exactly 2 arguments (3 given)
Comment 4 Daniel Tröder univentionstaff 2018-04-23 14:29:34 CEST
[4.3] fd9cc110 Bug #46696: remove excess argument
[4.3] 8e6349ef Bug #46696: changelog
[4.3] b0fab9dc Bug #46696: advisory update

ucs-school-netlogon-user-logonscripts (14.0.0-4)
Comment 5 Sönke Schwardt-Krummrich univentionstaff 2018-04-25 15:48:33 CEST
OK: code change
OK: functional change
OK: tests
OK: changelog entry
OK: advisory
OK: package built
Comment 6 Sönke Schwardt-Krummrich univentionstaff 2018-05-02 17:59:04 CEST
UCS@school 4.3 v3 has been released.

https://docs.software-univention.de/changelog-ucsschool-4.3v3-de.html

If this error occurs again, please clone this bug.