Bug 41261 - Listener sometimes skips transactions during a listener restart
Listener sometimes skips transactions during a listener restart
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Listener (univention-directory-listener)
UCS 3.2
Other Linux
: P5 normal (vote)
: UCS 4.1-2-errata
Assigned To: Philipp Hahn
Arvid Requate
:
Depends on:
Blocks: 31573 41486 41657
  Show dependency treegraph
 
Reported: 2016-05-11 17:46 CEST by Ingo Sieverdingbeck
Modified: 2017-04-24 11:45 CEST (History)
4 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 7: Crash: Bug causes crash or data loss
Who will be affected by this bug?: 5: Will affect all installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.600
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
listener.log from backup (32.96 KB, text/x-log)
2016-05-11 17:46 CEST, Ingo Sieverdingbeck
Details
transaction from backup (296 bytes, text/plain)
2016-05-11 17:47 CEST, Ingo Sieverdingbeck
Details
suggestions.patch (2.00 KB, patch)
2016-07-13 13:14 CEST, Arvid Requate
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ingo Sieverdingbeck univentionstaff 2016-05-11 17:46:29 CEST
Created attachment 7653 [details]
listener.log from backup

If the listener is restarted is is possible that some transaction id are skipped and the listener continues at a later transaction.

To provoke this behaviour I restarted the listener once per minute on a DC Backup while running a ldapadd on the DC Master on a LDIF file with ~30000 objects.

Please find attached parts of the listener.log and of the transaction file of the backup instance. If you need more logs I can add them here.
Comment 1 Ingo Sieverdingbeck univentionstaff 2016-05-11 17:47:05 CEST
Created attachment 7654 [details]
transaction from backup
Comment 2 Philipp Hahn univentionstaff 2016-06-07 18:03:48 CEST
This can be reprduced like this:
 @master:
  ./create-32k-users-in-groups
 @slave
  while sleep 5; do /etc/init.d/univention-directory-listener restart; done &
  tail -f /var/lib/univention-ldap/listener/listener | while read tid dn; do echo -n " $tid";  if [ -n "${lid:-}" ] && [ $((lid + 1)) -ne "$tid" ]; then echo;echo "$lid $tid $dn";echo;fi;lid=$tid; done

The bugs is caused by the Listener being killed while /var/lib/univention-directory-listener/notifier_id is written. It becomes empty. After a restart the UDL then starts with the current transaction ID skipping all transactions in between.

r69907 | Bug #41261 UDL: Switch to cache master_key
r69906 | Bug #41261 UDL: Fix free(NULL)
r69905 | Bug #41261 UDL: Write notifier ID atomically
r69901 | Bug #41261 UDL: Remove old migration code
r69900 | Bug #41261 UDL: Init logging earlier


Package: univention-directory-listener
Version: 10.0.0-11.315.201606071717
Branch: ucs_4.1-0
Scope: errata4.1-2

r69910 | Bug #22383,Bug #30227,Bug #30263,Bug #34324,Bug #34507,Bug #34738,Bug #3490,Bug #38696,Bug #39509,Bug #40600,Bug #41261: UDL YAML
 univention-directory-listener.yaml


Package: univention-directory-listener
Version: 8.0.4-2.316.201606071758
Branch: ucs_3.2-0
Scope: errata3.2-8

r69920 | Bug #41261 UDL: Write notifier ID atomically YAML
 univention-directory-listener.yaml
Comment 3 Philipp Hahn univentionstaff 2016-06-08 18:00:12 CEST
r69981 | Bug #41261 test: Fix UDL schema test
 Tests did not clean up after them, especially in error cases
r69980 | Bug #41261 UDL: Restore schema/id writing
r69979 | Bug #41261 UDL: Use only one master cache entry
 Fix regression <http://jenkins.knut.univention.de:8080/job/UCS-4.1/job/UCS-4.1-2/job/AutotestJoin/lastCompletedBuild/SambaVersion=s3,Systemrolle=slave/testReport/10_ldap/74schema_update/test/>
r69978 | Bug #39509 UDL: allow to query more info
 Needed for debugging, udl/src/demo.c would also help.

Package: univention-directory-listener
Version: 10.0.0-12.319.201606081733
Branch: ucs_4.1-0
Scope: errata4.1-2

r69982 | Bug #41261 UDL: regression fix YAML
 univention-directory-listener.yaml
Comment 4 Stefan Gohmann univentionstaff 2016-06-09 05:57:30 CEST
Please have a look at the Jenkins tests, for example http://jenkins.knut.univention.de:8080/job/UCS-4.1/job/UCS-4.1-2/job/AutotestJoin/SambaVersion=s3,Systemrolle=master/41/console

--------------------------------------------------------------------------------
08.06.16 18:20:13.801  DEBUG_INIT
UNIVENTION_DEBUG_BEGIN  : uldap.__open host=master090.autotest090.local port=7389 base=dc=autotest090,dc=local
UNIVENTION_DEBUG_END    : uldap.__open host=master090.autotest090.local port=7389 base=dc=autotest090,dc=local
08.06.16 18:20:14.805  LISTENER    ( ERROR   ) : non consecutive move: 1105:r:uid=Administrator,cn=users,dc=autotest090,dc=local << 1105:r:uid=Administrator,cn=users,dc=autotest090,dc=local
08.06.16 18:20:14.805  LISTENER    ( ERROR   ) : change_update_dn: Operations error
08.06.16 18:20:14.805  LISTENER    ( ERROR   ) : listener: 1
08.06.16 18:20:25.255  DEBUG_INIT
--------------------------------------------------------------------------------

This also breaks the UCS@school 4.1 R2 Jenkins tests.
Comment 5 Philipp Hahn univentionstaff 2016-06-09 09:24:06 CEST
(In reply to Stefan Gohmann from comment #4)
> 08.06.16 18:20:14.805  LISTENER    ( ERROR   ) : non consecutive move:
> 1105:r:uid=Administrator,cn=users,dc=autotest090,dc=local <<
> 1105:r:uid=Administrator,cn=users,dc=autotest090,dc=local

Note that the ID did not increment, so same transaction was processed twice.

r69998 | Bug #41261 UDL: Fix notifier ID not incrementing

Package: univention-directory-listener
Version: 10.0.0-13.320.201606090821
Branch: ucs_4.1-0
Scope: errata4.1-2

r69999 | Bug #41261 UDL: Fix notifier ID not incrementing YAML
 univention-directory-listener.yaml

QA: Survives udl/doc.34355/ expect test3[01]* which for Bug #40600 need
 ln -s /bin/true /usr/local/bin/univention-directory-notifier
Comment 6 Arvid Requate univentionstaff 2016-07-12 21:16:15 CEST
Code review: Ok
Advisory: Ok (typo fixed)

But I cannot confirm that this fixes the issue.
The test case noted in Comment 2 still reproduces the problem:

Output of the test loop on the DC Backup:
===========================================================================
 2488tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2489tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2490tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2491tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2492tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2493tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2494tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2495tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2496tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2497tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2501
2497 2501 cn=nscd005e,cn=uid,cn=temporary,cn=univention,dc=ar41i1,dc=qa a

tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2502tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2503tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2505
2503 2505 uid=nscd005e,cn=users,dc=ar41i1,dc=qa a

tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2506tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2507tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2508tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2509tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
 2510tail: /var/lib/univention-ldap/listener/listener: Datei abgeschnitten
===========================================================================

The transaction log on the master shows:
===========================================================================
2497 cn=nscd005,cn=groups,dc=ar41i1,dc=qa m
2498 cn=Domain Users,cn=groups,dc=ar41i1,dc=qa m
2499 uid=nscd005d,cn=users,dc=ar41i1,dc=qa m
2500 uid=nscd005d,cn=users,dc=ar41i1,dc=qa m
2501 cn=nscd005e,cn=uid,cn=temporary,cn=univention,dc=ar41i1,dc=qa a
2502 cn=2104,cn=uidNumber,cn=temporary,cn=univention,dc=ar41i1,dc=qa a
2503 cn=2104,cn=gidNumber,cn=temporary,cn=univention,dc=ar41i1,dc=qa a
2504 cn=2104,cn=gidNumber,cn=temporary,cn=univention,dc=ar41i1,dc=qa d
2505 uid=nscd005e,cn=users,dc=ar41i1,dc=qa a
2506 cn=nscd005e,cn=uid,cn=temporary,cn=univention,dc=ar41i1,dc=qa d
2507 cn=uidNumber,cn=temporary,cn=univention,dc=ar41i1,dc=qa m
2508 cn=2104,cn=uidNumber,cn=temporary,cn=univention,dc=ar41i1,dc=qa d
2509 cn=nscd00,cn=groups,dc=ar41i1,dc=qa m
2510 cn=nscd005,cn=groups,dc=ar41i1,dc=qa m
===========================================================================
Comment 7 Arvid Requate univentionstaff 2016-07-13 13:14:17 CEST
Created attachment 7798 [details]
suggestions.patch

With the attached patch I can confirm that the new version actually fixes the issue. The test case of Bug 41261 Comment 2 is simply bogus. With the attached patch and the following procedure I was able to reproduce the original problem and confirm that the new version fixes the issue:

ucr set listener/debug/level='3'
while sleep 10; do /etc/init.d/univention-directory-listener restart; done &
tail -f /var/log/univention/listener.log | while read line; do tid=$(sed -n 's/.* Last Notifier ID: //p' <<<"$line"); if [ -n "$tid" ]; then if [ -n "${lid:-}" ] && [ $((lid + 1)) -ne "$tid" ] && [ "$lid" -ne "$tid" ] ; then echo;echo "$lid $tid $dn";echo;fi;lid=$tid; fi; done

The attached patch contains some additional minor suggestions.
Comment 8 Arvid Requate univentionstaff 2016-07-13 13:15:50 CEST
> QA: Survives udl/doc.34355/ expect test3[01]* which for Bug #40600 need
 ln -s /bin/true /usr/local/bin/univention-directory-notifier

No clue what that is supposed to mean, can you elaborate?
Comment 9 Philipp Hahn univentionstaff 2016-07-18 15:10:01 CEST
(In reply to Arvid Requate from comment #8)
> > QA: Survives udl/doc.34355/ expect test3[01]* which for Bug #40600 need
>  ln -s /bin/true /usr/local/bin/univention-directory-notifier
> 
> No clue what that is supposed to mean, can you elaborate?

Since Bug #40600 UDL no longer writes /var/lib/univention-ldap/listener/listener on DC Slaves and Member-Servers, but the test expects those files to check correct working. You can fool UDL to write that file again by either installing the real UDN or by just proving a "u-d-l" executable somewhere.
Comment 10 Philipp Hahn univentionstaff 2016-07-18 17:53:27 CEST
(In reply to Arvid Requate from comment #7)
> Created attachment 7798 [details]
...
> The attached patch contains some additional minor suggestions.

Thanks, applied:
r71070 | Bug #41261 UDL: Add additional debug output for notifier ID

Package: univention-directory-listener
Version: 10.0.0-14.325.201607181745
Branch: ucs_4.1-0
Scope: errata4.1-2

r71075 | Bug #40600,Bug #41261,Bug #34324,Bug #3490 UDL: YAML
 univention-directory-listener.yaml
Comment 11 Arvid Requate univentionstaff 2016-07-18 19:25:56 CEST
Code review: Ok
Functional tests: Ok
Advisory: Ok
Comment 12 Janek Walkenhorst univentionstaff 2016-07-21 15:16:16 CEST
<http://errata.software-univention.de/ucs/4.1/215.html>