Bug 41687 - univention-directory-notifier: Cleanup code - sub-standard code quality - potential network deadlock
univention-directory-notifier: Cleanup code - sub-standard code quality - pot...
Status: REOPENED
Product: UCS
Classification: Unclassified
Component: Notifier (univention-directory-notifier)
UCS 4.4
Other Linux
: P5 normal with 2 votes (vote)
: ---
Assigned To: UCS maintainers
https://git.knut.univention.de/univen...
:
Depends on: 48466 49277
Blocks: 48467 48468 51911 53821
  Show dependency treegraph
 
Reported: 2016-06-28 15:14 CEST by Philipp Hahn
Modified: 2023-03-03 16:57 CET (History)
7 users (show)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number: 2019032821000494,2023030221000355
Bug group (optional): Cleanup, Debt Technical, Security
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Hahn univentionstaff 2016-06-28 15:14:43 CEST
1. The code uses pthread_*() while the code is single-threaded. Remove!

2. sem.c implements semaphores, which are un-needed.

3. debian/univention-ldap-notifier.postinst is unused

4. Duplicate code in notifier_cache_add()

5. Duplicate code in network_client_add()

6. Dynamic string allocation/concatenation without bound checking in network_client_all_write()

7. Potential write stall in network_client_all_write() (Ticket #2016053021000286?)

8. Unused notify_entry_reverse()

9. Commented out code in notify_replog_change_callback() - maybe remove "replog" code altogether?

10. Strange white-space

11. Problematic use of select() in network_client_main_loop(): can monitor FD_SETSIZE=1024 max
 printf '#include <sys/select.h>\nFD_SETSIZE' | cpp - | tail -n1

12. Inefficient network_client_main_loop(): Swap client<->fd for-loop

13. Replace deprecated "DNOTIFY" interface by inotify() - this also allows to get rid of the complicated (and race-prone) signal-handling!
Comment 1 Stefan Gohmann univentionstaff 2019-01-03 07:23:33 CET
This issue has been filled against UCS 4.1. The maintenance with bug and security fixes for UCS 4.1 has ended on 5st of April 2018.

Customers still on UCS 4.1 are encouraged to update to UCS 4.3. Please contact
your partner or Univention for any questions.

If this issue still occurs in newer UCS versions, please use "Clone this bug" or simply reopen the issue. In this case please provide detailed information on how this issue is affecting you.
Comment 2 Philipp Hahn univentionstaff 2019-01-11 12:03:46 CET
14. Denial of service by filling log file on server with error message when invalid packets are received.

15. Multiple memory leaks: dn_string, tmp=strdup(entry->dn)

16. SEGV in index-dump when called without (required) argument.

17. Ignores errors when parsing integers: strto[u]l[l](), atoi(), sscanf()

18. notify_transaction_get_last_notify_id() is inefficient: it reads an increasing number of bytes from the end of the file.

19. Dead code:
    notify_entry_reverse()
    notifier_cache_get_max_id()
    notifier_cache_get_min_id()
    index_invalidate()
    network_client_get_next_id()
    network_client_get_msg_id()
    network_client_waiting()

20. Dead declarations:
    data_on_connection_old()
    network_client_write()
    notify_entry_dump()
    notify_id_get_next()
    strndup()

21. client->version is not initialized: the next connecting client can skip sending "Version: 2" and will get the version from the client, which last disconnected. (or some other byte from the process's heap memory).
# python -c 'import sys,socket;a=(sys.argv[1],6669);s=socket.create_connection(a,3.0);s.sendall("Version: 2\nCapabilities: \n\n");s.shutdown(socket.SHUT_WR);s.recv(1024);s.close();s=socket.create_connection(a,3.0);s.sendall("Capabilities: \n\n");print(s.recv(1024))' ::1
  Version: 2
  Capabilities: 

22. TCP is a *stream* protocol, so message boundaries are *not* honored. It fails on fragmented packets.

23. Repeated index file opening by repeatedly requesting entries not in the LRU memory cache consisting of only 1000 entries by default.

24. EOF handling is wrong.
Comment 6 Philipp Hahn univentionstaff 2019-01-14 16:16:45 CET
(In reply to Philipp Hahn from comment #0)
> 7. Potential write stall in network_client_all_write() (Ticket
> #2016053021000286?)Bug #48466

> 11. Problematic use of select() in network_client_main_loop(): can monitor
> FD_SETSIZE=1024 max
>  printf '#include <sys/select.h>\nFD_SETSIZE' | cpp - | tail -n1Bug #48467

(In reply to Philipp Hahn from comment #2)
> 14. Denial of service by filling log file on server with error message when
> invalid packets are received.Bug #48468
Comment 7 Philipp Hahn univentionstaff 2019-03-27 18:57:14 CET
Program received signal SIGSEGV, Segmentation fault.
_IO_fgets (buf=0xbfffe43c "", n=4096, fp=0x0) at iofgets.c:47
(gdb) bt
#0  _IO_fgets (buf=0xbfffe43c "", n=4096, fp=0x0) at iofgets.c:47
#1  0x00405f14 in notify_replog_change_callback (sig=0, si=0x0, data=0x0) at notify.c:793
#2  0x00406b20 in main (argc=4, argv=0xbffff554) at univention-directory-notifier.c:287

>  776 »···if ( ( file = fopen_lock ( FILE_NAME_IRF, "r+", &(l_file) ) ) == NULL ) {
>  777 »···»···univention_debug(UV_DEBUG_TRANSFILE, UV_DEBUG_ERROR, "Could not open %s\n",FILE_NAME_IRF);
>  778 »···}

printing an error message, but continuing anyway - missing "return"
Comment 8 Philipp Hahn univentionstaff 2019-03-29 11:12:26 CET
25. the implementation uses the high-level f*() functions to process files: they do internal buffering which leads to fread() always reading 4KiB instead of only the 9 bytes required for the index. Best would be to use mmap()
Comment 9 Philipp Hahn univentionstaff 2019-04-01 08:49:19 CEST
26. very long start-up delay if index is not up-to-date:
  1 (gdb) where
  2 #0  notify_transcation_get_one_dn (last_known_id=50266903) at notify.c:602
  3 #1  0x0000556e8dcd965c in notifier_cache_init (max_id=50267710) at cache.c:84
  4 #2  0x0000556e8dcdea70 in main (argc=7, argv=0x7ffcfbf75e18) at univention-directory-notifier.c:271                             

src/cache.c
>   66 »···int count = 0;
>   81 »···for ( i=max_id - (notifier_cache_size-1); i <= max_id; i++) {

"max_id" is the last seen transaction ID
"notifier_cache_size" is 1000 by default (via UCR)

If less then 1000 transactions exist, "i" is negative and is passed to

>   84 »···»···buffer=notify_transcation_get_one_dn ( i );

as it does NOT find a negative transaction inside the case, the code walks over ALL transactions sequentially to re-build the index. If the list is large, this takes a very long time. It it does not find the requested transaction, the get1dn() returns NULL and init() aborts early with the in-memory cache not being initialized as max_filled=0 remains:

>   85 »···»···if ( buffer == NULL ) {
>   86 »···»···»···max_filled=count;                                                                                                  
>   88 »···»···»···return 1;
>   89 »···»···}

This is not exactly the scenario which happend with ticket 2019032821000494, as there we has >50e6 transactions, but there that loop seemed to be repeated multiple times without ever becoming responsive to queries over the network. It might happen if the transaction file is corrupt and misses some transactions, leading to holes. If one of that holes is withing the last X transactions the scenario from above is triggered as get1dn() then returns NULL for that missing transaction, leading to all lines getting parsed again.

Patch (which addresses this partly) is available as part of <https://git.knut.univention.de/univention/ucs/commit/3241f37d45868eb5ed95772b7b32dad1f120256b>
Comment 10 Ingo Steuwer univentionstaff 2020-09-01 10:40:27 CEST
This is a "change the world" bug report - @phahn, can you split it in several bugs?

a) one for code cleanups

b) potential issues - risks, but not (yet) reproducable or seen in the wild

c) individual bugs for known issues - reproducable deadlocks or other problems
Comment 11 Philipp Hahn univentionstaff 2020-09-01 11:19:44 CEST
(In reply to Ingo Steuwer from comment #10)
> This is a "change the world" bug report - @phahn, can you split it in
> several bugs?
> 
> a) one for code cleanups
> 
> b) potential issues - risks, but not (yet) reproducable or seen in the wild
> 
> c) individual bugs for known issues - reproducable deadlocks or other
> problems

The security issues were already separated, see "Blocks"

The rest is just bad/fragile/over-complicated/ignorant code. Please just fix'em all with pair-programming instead of creating another management overhead for discussion/assigning/documentation/...
Comment 12 Philipp Hahn univentionstaff 2021-09-22 11:42:40 CEST
Happened again:
- last_id was -1
- <TransID> war in listener/listener.priv
- propagated as 0 to notifier/transactions

Fixed by:
- systemdctl stop slapd univention-directory-{listener,notifier}
- Moved 0-TIDs from notifier/transactions back to listener/listener
- Moved <TransID>-TIDs from listener/listener.priv back to listener/listener
- Re-numbered listener/listener
- Wrote last used TID to last_id
- systemdctl start slapd univention-directory-{listener,notifier}
Comment 13 Philipp Hahn univentionstaff 2021-12-07 08:08:54 CET
(In reply to Philipp Hahn from comment #9)
> 26. very long start-up delay if index is not up-to-date:
>   1 (gdb) where
>   2 #0  notify_transcation_get_one_dn (last_known_id=50266903) at
> notify.c:602
>   3 #1  0x0000556e8dcd965c in notifier_cache_init (max_id=50267710) at
> cache.c:84
>   4 #2  0x0000556e8dcdea70 in main (argc=7, argv=0x7ffcfbf75e18) at
> univention-directory-notifier.c:271                             
> 
> src/cache.c
> >   66 »···int count = 0;
> >   81 »···for ( i=max_id - (notifier_cache_size-1); i <= max_id; i++) {
> 
> "max_id" is the last seen transaction ID
> "notifier_cache_size" is 1000 by default (via UCR)
> 
> If less then 1000 transactions exist, "i" is negative and is passed to
> 
> >   84 »···»···buffer=notify_transcation_get_one_dn ( i );
> 
> as it does NOT find a negative transaction inside the case, the code walks
> over ALL transactions sequentially to re-build the index. If the list is
> large, this takes a very long time. It it does not find the requested
> transaction, the get1dn() returns NULL and init() aborts early with the
> in-memory cache not being initialized as max_filled=0 remains:

Bug #51911 tried to fix this via bb2dc8fa802 and ec5d2e22505 but introduced a new regression Bug #54203, where the cache is not initialized at all if less then notifier_cache_size:=1000 entries exist.

The cause for this is similar to the same issues as in "24. EOF handling is wrong." (git:6b3c2bc8df5bb5e6c19faff13cbc9a3a68a3bc56)  where "unsigned long" is compared to "signed int":

> int notifier_cache_init ( unsigned long max_id) {
> 	int i;>             for ( i=MAX(max_id - (notifier_cache_size-1), 1); i <= max_id; i++) {

In C99, integer promotion is clearly defined in following rule (6.3.1.1):
> Then, if either operand is unsigned long the other shall be converted to unsigned long.

`i: int = -552`  → `unsigned int = 18446744073709551064` <= `max_id: unsigned long = 1000` → FALSE
Comment 14 Philipp Hahn univentionstaff 2021-12-07 08:29:30 CET
Please allow logging to STDERR if "-F" is given twice:

diff --git management/univention-directory-notifier/src/univention-directory-notifier.c management/univention-directory-notifier/src/univention-directory-notifier.c
index 2457ce1d1e..b4ce8e0dd6 100644
--- management/univention-directory-notifier/src/univention-directory-notifier.c
+++ management/univention-directory-notifier/src/univention-directory-notifier.c
@@ -168,7 +168,7 @@ int main(int argc, char* argv[])
 
                switch (c) {
                        case 'F':
-                               foreground = 1;
+                               foreground++;
                                break;
                        case 'd':
                                debug = atoi(optarg);
@@ -203,7 +203,7 @@ int main(int argc, char* argv[])
                daemon(1,1);
        }
 
-       univention_debug_init("/var/log/univention/notifier.log",1,1);
+       univention_debug_init(foreground < 2 ? "/var/log/univention/notifier.log" : "stderr", 1, 1);
        univention_debug_set_level(UV_DEBUG_TRANSFILE, debug);
 
        if ( creating_pidfile("/var/run/udsnotifier.pid") != 0 )
@@ -240,5 +240,3 @@ int main(int argc, char* argv[])
 
        return 0;
 }
-
-
Comment 15 Philipp Hahn univentionstaff 2022-02-16 16:56:16 CET
27: valgrind
==11224== Syscall param socketcall.bind(my_addr.sin6_flowinfo) points to uninitialised byte(s)
==11224==    at 0x49A2497: bind (syscall-template.S:78)
==11224==    by 0x10BEDA: network_create_socket (network.c:84)
==11224==    by 0x10C44E: network_client_init (network.c:280)
==11224==    by 0x10ED83: main (univention-directory-notifier.c:254)
==11224==  Address 0x1fff000154 is on thread 1's stack
==11224==  in frame #1, created by network_create_socket (network.c:70)
==11224== 
==11224== Syscall param socketcall.bind(my_addr.sin6_scope_id) points to uninitialised byte(s)
==11224==    at 0x49A2497: bind (syscall-template.S:78)
==11224==    by 0x10BEDA: network_create_socket (network.c:84)
==11224==    by 0x10C44E: network_client_init (network.c:280)
==11224==    by 0x10ED83: main (univention-directory-notifier.c:254)
==11224==  Address 0x1fff000168 is on thread 1's stack
==11224==  in frame #1, created by network_create_socket (network.c:70)
==11224==
Comment 16 Philipp Hahn univentionstaff 2022-02-17 10:05:15 CET
(In reply to Philipp Hahn from comment #2)
> 15. Multiple memory leaks: dn_string, tmp=strdup(entry->dn)

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./univention-directory-notifier -FF -d 3 -v 3

==11238== HEAP SUMMARY:
==11238==     in use at exit: 46,137 bytes in 377 blocks
==11238==   total heap usage: 2,612 allocs, 2,235 frees, 3,734,732 bytes allocated
==11238== 
==11238== 7 bytes in 1 blocks are still reachable in loss record 1 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x492FDB9: strdup (strdup.c:42)
==11238==    by 0x485025E: univention_debug_init (in /usr/lib/x86_64-linux-gnu/libuniventiondebug.so.1.0.0)
==11238==    by 0x10EC8D: main (univention-directory-notifier.c:231)

missing univention_debug_exit()

==11238== 48 bytes in 1 blocks are still reachable in loss record 2 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x10C032: network_client_add (network.c:115)
==11238==    by 0x10C46D: network_client_init (network.c:281)
==11238==    by 0x10ED83: main (univention-directory-notifier.c:254)

missing network_client_del(server_socketfd_listener) + close(server_socketfd_listener)

==11238== 48 bytes in 1 blocks are still reachable in loss record 3 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x10C09D: network_client_add (network.c:130)
==11238==    by 0x10C432: new_connection (network.c:272)
==11238==    by 0x10C7BA: network_client_main_loop (network.c:346)
==11238==    by 0x10EDC9: main (univention-directory-notifier.c:262)

missing network_client_del(…)

==11238== 67 bytes in 3 blocks are definitely lost in loss record 4 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x10B1E4: data_on_connection (callback.c:128)
==11238==    by 0x10C7BA: network_client_main_loop (network.c:346)
==11238==    by 0x10EDC9: main (univention-directory-notifier.c:262)

REMOTE EXPLOITABLE MEMORY LEAK; missing free(network_packet) on all exit paths from data_on_connection()

==11238== 21,967 bytes in 370 blocks are still reachable in loss record 5 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x10AA84: notifier_cache_init (cache.c:81)
==11238==    by 0x10ED5E: main (univention-directory-notifier.c:251)

missing free(cache[…].dn)

==11238== 24,000 bytes in 1 blocks are still reachable in loss record 6 of 6
==11238==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==11238==    by 0x10A8B0: notifier_cache_init (cache.c:57)
==11238==    by 0x10ED5E: main (univention-directory-notifier.c:251)

missing free(cache);

==11238== LEAK SUMMARY:
==11238==    definitely lost: 67 bytes in 3 blocks
==11238==    indirectly lost: 0 bytes in 0 blocks
==11238==      possibly lost: 0 bytes in 0 blocks
==11238==    still reachable: 46,070 bytes in 374 blocks
==11238==         suppressed: 0 bytes in 0 blocks
Comment 17 Philipp Hahn univentionstaff 2022-09-21 15:48:21 CEST
(In reply to Philipp Hahn from comment #12)
> Happened again:

4.4-8

> - last_id was -1
> - <TransID> war in listener/listener.priv
> - propagated as 0 to notifier/transactions
> 
> Fixed by:
> - systemdctl stop slapd univention-directory-{listener,notifier}
> - Moved 0-TIDs from notifier/transactions back to listener/listener

cd /var/lib/univention-ldap
sed -e '/^0 /{p;d}' -i notify/transactions > listener/tmp

> - Moved <TransID>-TIDs from listener/listener.priv back to listener/listener

cat listener/listener.priv listener/listener >>listener/tmp
truncate -s 0 listener/listener.priv listener/listener

> - Re-numbered listener/listener

off="$(awk 'END{print $1}' notify/transactions)"
awk -v off="$off" '{print NR+off,$2,$3}' <listener/tmp >listener/listener.priv
rm -f listener/tmp

> - Wrote last used TID to last_id

awk 'END{print $1}' listener/listener.priv >./last_id

> - systemdctl start slapd univention-directory-{listener,notifier}
Comment 18 Philipp Hahn univentionstaff 2023-03-03 16:57:46 CET
Again 2023030221000355