Univention Bugzilla – Bug 41687
univention-directory-notifier: Cleanup code - sub-standard code quality - potential network deadlock
Last modified: 2023-03-03 16:57:46 CET
1. The code uses pthread_*() while the code is single-threaded. Remove! 2. sem.c implements semaphores, which are un-needed. 3. debian/univention-ldap-notifier.postinst is unused 4. Duplicate code in notifier_cache_add() 5. Duplicate code in network_client_add() 6. Dynamic string allocation/concatenation without bound checking in network_client_all_write() 7. Potential write stall in network_client_all_write() (Ticket #2016053021000286?) 8. Unused notify_entry_reverse() 9. Commented out code in notify_replog_change_callback() - maybe remove "replog" code altogether? 10. Strange white-space 11. Problematic use of select() in network_client_main_loop(): can monitor FD_SETSIZE=1024 max printf '#include <sys/select.h>\nFD_SETSIZE' | cpp - | tail -n1 12. Inefficient network_client_main_loop(): Swap client<->fd for-loop 13. Replace deprecated "DNOTIFY" interface by inotify() - this also allows to get rid of the complicated (and race-prone) signal-handling!
This issue has been filled against UCS 4.1. The maintenance with bug and security fixes for UCS 4.1 has ended on 5st of April 2018. Customers still on UCS 4.1 are encouraged to update to UCS 4.3. Please contact your partner or Univention for any questions. If this issue still occurs in newer UCS versions, please use "Clone this bug" or simply reopen the issue. In this case please provide detailed information on how this issue is affecting you.
14. Denial of service by filling log file on server with error message when invalid packets are received. 15. Multiple memory leaks: dn_string, tmp=strdup(entry->dn) 16. SEGV in index-dump when called without (required) argument. 17. Ignores errors when parsing integers: strto[u]l[l](), atoi(), sscanf() 18. notify_transaction_get_last_notify_id() is inefficient: it reads an increasing number of bytes from the end of the file. 19. Dead code: notify_entry_reverse() notifier_cache_get_max_id() notifier_cache_get_min_id() index_invalidate() network_client_get_next_id() network_client_get_msg_id() network_client_waiting() 20. Dead declarations: data_on_connection_old() network_client_write() notify_entry_dump() notify_id_get_next() strndup() 21. client->version is not initialized: the next connecting client can skip sending "Version: 2" and will get the version from the client, which last disconnected. (or some other byte from the process's heap memory). # python -c 'import sys,socket;a=(sys.argv[1],6669);s=socket.create_connection(a,3.0);s.sendall("Version: 2\nCapabilities: \n\n");s.shutdown(socket.SHUT_WR);s.recv(1024);s.close();s=socket.create_connection(a,3.0);s.sendall("Capabilities: \n\n");print(s.recv(1024))' ::1 Version: 2 Capabilities: 22. TCP is a *stream* protocol, so message boundaries are *not* honored. It fails on fragmented packets. 23. Repeated index file opening by repeatedly requesting entries not in the LRU memory cache consisting of only 1000 entries by default. 24. EOF handling is wrong.
(In reply to Philipp Hahn from comment #0) > 7. Potential write stall in network_client_all_write() (Ticket > #2016053021000286?) → Bug #48466 > 11. Problematic use of select() in network_client_main_loop(): can monitor > FD_SETSIZE=1024 max > printf '#include <sys/select.h>\nFD_SETSIZE' | cpp - | tail -n1 → Bug #48467 (In reply to Philipp Hahn from comment #2) > 14. Denial of service by filling log file on server with error message when > invalid packets are received. → Bug #48468
Program received signal SIGSEGV, Segmentation fault. _IO_fgets (buf=0xbfffe43c "", n=4096, fp=0x0) at iofgets.c:47 (gdb) bt #0 _IO_fgets (buf=0xbfffe43c "", n=4096, fp=0x0) at iofgets.c:47 #1 0x00405f14 in notify_replog_change_callback (sig=0, si=0x0, data=0x0) at notify.c:793 #2 0x00406b20 in main (argc=4, argv=0xbffff554) at univention-directory-notifier.c:287 > 776 »···if ( ( file = fopen_lock ( FILE_NAME_IRF, "r+", &(l_file) ) ) == NULL ) { > 777 »···»···univention_debug(UV_DEBUG_TRANSFILE, UV_DEBUG_ERROR, "Could not open %s\n",FILE_NAME_IRF); > 778 »···} printing an error message, but continuing anyway - missing "return"
25. the implementation uses the high-level f*() functions to process files: they do internal buffering which leads to fread() always reading 4KiB instead of only the 9 bytes required for the index. Best would be to use mmap()
26. very long start-up delay if index is not up-to-date: 1 (gdb) where 2 #0 notify_transcation_get_one_dn (last_known_id=50266903) at notify.c:602 3 #1 0x0000556e8dcd965c in notifier_cache_init (max_id=50267710) at cache.c:84 4 #2 0x0000556e8dcdea70 in main (argc=7, argv=0x7ffcfbf75e18) at univention-directory-notifier.c:271 src/cache.c > 66 »···int count = 0; > 81 »···for ( i=max_id - (notifier_cache_size-1); i <= max_id; i++) { "max_id" is the last seen transaction ID "notifier_cache_size" is 1000 by default (via UCR) If less then 1000 transactions exist, "i" is negative and is passed to > 84 »···»···buffer=notify_transcation_get_one_dn ( i ); as it does NOT find a negative transaction inside the case, the code walks over ALL transactions sequentially to re-build the index. If the list is large, this takes a very long time. It it does not find the requested transaction, the get1dn() returns NULL and init() aborts early with the in-memory cache not being initialized as max_filled=0 remains: > 85 »···»···if ( buffer == NULL ) { > 86 »···»···»···max_filled=count; > 88 »···»···»···return 1; > 89 »···»···} This is not exactly the scenario which happend with ticket 2019032821000494, as there we has >50e6 transactions, but there that loop seemed to be repeated multiple times without ever becoming responsive to queries over the network. It might happen if the transaction file is corrupt and misses some transactions, leading to holes. If one of that holes is withing the last X transactions the scenario from above is triggered as get1dn() then returns NULL for that missing transaction, leading to all lines getting parsed again. Patch (which addresses this partly) is available as part of <https://git.knut.univention.de/univention/ucs/commit/3241f37d45868eb5ed95772b7b32dad1f120256b>
This is a "change the world" bug report - @phahn, can you split it in several bugs? a) one for code cleanups b) potential issues - risks, but not (yet) reproducable or seen in the wild c) individual bugs for known issues - reproducable deadlocks or other problems
(In reply to Ingo Steuwer from comment #10) > This is a "change the world" bug report - @phahn, can you split it in > several bugs? > > a) one for code cleanups > > b) potential issues - risks, but not (yet) reproducable or seen in the wild > > c) individual bugs for known issues - reproducable deadlocks or other > problems The security issues were already separated, see "Blocks" The rest is just bad/fragile/over-complicated/ignorant code. Please just fix'em all with pair-programming instead of creating another management overhead for discussion/assigning/documentation/...
Happened again: - last_id was -1 - <TransID> war in listener/listener.priv - propagated as 0 to notifier/transactions Fixed by: - systemdctl stop slapd univention-directory-{listener,notifier} - Moved 0-TIDs from notifier/transactions back to listener/listener - Moved <TransID>-TIDs from listener/listener.priv back to listener/listener - Re-numbered listener/listener - Wrote last used TID to last_id - systemdctl start slapd univention-directory-{listener,notifier}
(In reply to Philipp Hahn from comment #9) > 26. very long start-up delay if index is not up-to-date: > 1 (gdb) where > 2 #0 notify_transcation_get_one_dn (last_known_id=50266903) at > notify.c:602 > 3 #1 0x0000556e8dcd965c in notifier_cache_init (max_id=50267710) at > cache.c:84 > 4 #2 0x0000556e8dcdea70 in main (argc=7, argv=0x7ffcfbf75e18) at > univention-directory-notifier.c:271 > > src/cache.c > > 66 »···int count = 0; > > 81 »···for ( i=max_id - (notifier_cache_size-1); i <= max_id; i++) { > > "max_id" is the last seen transaction ID > "notifier_cache_size" is 1000 by default (via UCR) > > If less then 1000 transactions exist, "i" is negative and is passed to > > > 84 »···»···buffer=notify_transcation_get_one_dn ( i ); > > as it does NOT find a negative transaction inside the case, the code walks > over ALL transactions sequentially to re-build the index. If the list is > large, this takes a very long time. It it does not find the requested > transaction, the get1dn() returns NULL and init() aborts early with the > in-memory cache not being initialized as max_filled=0 remains: Bug #51911 tried to fix this via bb2dc8fa802 and ec5d2e22505 but introduced a new regression Bug #54203, where the cache is not initialized at all if less then notifier_cache_size:=1000 entries exist. The cause for this is similar to the same issues as in "24. EOF handling is wrong." (git:6b3c2bc8df5bb5e6c19faff13cbc9a3a68a3bc56) where "unsigned long" is compared to "signed int": > int notifier_cache_init ( unsigned long max_id) { > int i; … > for ( i=MAX(max_id - (notifier_cache_size-1), 1); i <= max_id; i++) { In C99, integer promotion is clearly defined in following rule (6.3.1.1): > Then, if either operand is unsigned long the other shall be converted to unsigned long. `i: int = -552` → `unsigned int = 18446744073709551064` <= `max_id: unsigned long = 1000` → FALSE
Please allow logging to STDERR if "-F" is given twice: diff --git management/univention-directory-notifier/src/univention-directory-notifier.c management/univention-directory-notifier/src/univention-directory-notifier.c index 2457ce1d1e..b4ce8e0dd6 100644 --- management/univention-directory-notifier/src/univention-directory-notifier.c +++ management/univention-directory-notifier/src/univention-directory-notifier.c @@ -168,7 +168,7 @@ int main(int argc, char* argv[]) switch (c) { case 'F': - foreground = 1; + foreground++; break; case 'd': debug = atoi(optarg); @@ -203,7 +203,7 @@ int main(int argc, char* argv[]) daemon(1,1); } - univention_debug_init("/var/log/univention/notifier.log",1,1); + univention_debug_init(foreground < 2 ? "/var/log/univention/notifier.log" : "stderr", 1, 1); univention_debug_set_level(UV_DEBUG_TRANSFILE, debug); if ( creating_pidfile("/var/run/udsnotifier.pid") != 0 ) @@ -240,5 +240,3 @@ int main(int argc, char* argv[]) return 0; } - -
27: valgrind ==11224== Syscall param socketcall.bind(my_addr.sin6_flowinfo) points to uninitialised byte(s) ==11224== at 0x49A2497: bind (syscall-template.S:78) ==11224== by 0x10BEDA: network_create_socket (network.c:84) ==11224== by 0x10C44E: network_client_init (network.c:280) ==11224== by 0x10ED83: main (univention-directory-notifier.c:254) ==11224== Address 0x1fff000154 is on thread 1's stack ==11224== in frame #1, created by network_create_socket (network.c:70) ==11224== ==11224== Syscall param socketcall.bind(my_addr.sin6_scope_id) points to uninitialised byte(s) ==11224== at 0x49A2497: bind (syscall-template.S:78) ==11224== by 0x10BEDA: network_create_socket (network.c:84) ==11224== by 0x10C44E: network_client_init (network.c:280) ==11224== by 0x10ED83: main (univention-directory-notifier.c:254) ==11224== Address 0x1fff000168 is on thread 1's stack ==11224== in frame #1, created by network_create_socket (network.c:70) ==11224==
(In reply to Philipp Hahn from comment #2) > 15. Multiple memory leaks: dn_string, tmp=strdup(entry->dn) valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./univention-directory-notifier -FF -d 3 -v 3 ==11238== HEAP SUMMARY: ==11238== in use at exit: 46,137 bytes in 377 blocks ==11238== total heap usage: 2,612 allocs, 2,235 frees, 3,734,732 bytes allocated ==11238== ==11238== 7 bytes in 1 blocks are still reachable in loss record 1 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x492FDB9: strdup (strdup.c:42) ==11238== by 0x485025E: univention_debug_init (in /usr/lib/x86_64-linux-gnu/libuniventiondebug.so.1.0.0) ==11238== by 0x10EC8D: main (univention-directory-notifier.c:231) missing univention_debug_exit() ==11238== 48 bytes in 1 blocks are still reachable in loss record 2 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x10C032: network_client_add (network.c:115) ==11238== by 0x10C46D: network_client_init (network.c:281) ==11238== by 0x10ED83: main (univention-directory-notifier.c:254) missing network_client_del(server_socketfd_listener) + close(server_socketfd_listener) ==11238== 48 bytes in 1 blocks are still reachable in loss record 3 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x10C09D: network_client_add (network.c:130) ==11238== by 0x10C432: new_connection (network.c:272) ==11238== by 0x10C7BA: network_client_main_loop (network.c:346) ==11238== by 0x10EDC9: main (univention-directory-notifier.c:262) missing network_client_del(…) ==11238== 67 bytes in 3 blocks are definitely lost in loss record 4 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x10B1E4: data_on_connection (callback.c:128) ==11238== by 0x10C7BA: network_client_main_loop (network.c:346) ==11238== by 0x10EDC9: main (univention-directory-notifier.c:262) REMOTE EXPLOITABLE MEMORY LEAK; missing free(network_packet) on all exit paths from data_on_connection() ==11238== 21,967 bytes in 370 blocks are still reachable in loss record 5 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x10AA84: notifier_cache_init (cache.c:81) ==11238== by 0x10ED5E: main (univention-directory-notifier.c:251) missing free(cache[…].dn) ==11238== 24,000 bytes in 1 blocks are still reachable in loss record 6 of 6 ==11238== at 0x483577F: malloc (vg_replace_malloc.c:299) ==11238== by 0x10A8B0: notifier_cache_init (cache.c:57) ==11238== by 0x10ED5E: main (univention-directory-notifier.c:251) missing free(cache); ==11238== LEAK SUMMARY: ==11238== definitely lost: 67 bytes in 3 blocks ==11238== indirectly lost: 0 bytes in 0 blocks ==11238== possibly lost: 0 bytes in 0 blocks ==11238== still reachable: 46,070 bytes in 374 blocks ==11238== suppressed: 0 bytes in 0 blocks
(In reply to Philipp Hahn from comment #12) > Happened again: 4.4-8 > - last_id was -1 > - <TransID> war in listener/listener.priv > - propagated as 0 to notifier/transactions > > Fixed by: > - systemdctl stop slapd univention-directory-{listener,notifier} > - Moved 0-TIDs from notifier/transactions back to listener/listener cd /var/lib/univention-ldap sed -e '/^0 /{p;d}' -i notify/transactions > listener/tmp > - Moved <TransID>-TIDs from listener/listener.priv back to listener/listener cat listener/listener.priv listener/listener >>listener/tmp truncate -s 0 listener/listener.priv listener/listener > - Re-numbered listener/listener off="$(awk 'END{print $1}' notify/transactions)" awk -v off="$off" '{print NR+off,$2,$3}' <listener/tmp >listener/listener.priv rm -f listener/tmp > - Wrote last used TID to last_id awk 'END{print $1}' listener/listener.priv >./last_id > - systemdctl start slapd univention-directory-{listener,notifier}
Again 2023030221000355