Bug 47313 - Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0
Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: NFS
UCS 4.2
Other Linux
: P5 normal (vote)
: ---
Assigned To: Felix Botner
Arvid Requate
:
: 47278 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-07-05 14:40 CEST by Christian Völker
Modified: 2020-05-12 19:03 CEST (History)
11 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 4: Will affect most installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.229
Enterprise Customer affected?: Yes
School Customer affected?:
ISV affected?:
Waiting Support: Yes
Flags outvoted (downgraded) after PO Review:
Ticket number: 2018070421000194, 2018062521000736, 2018071021000753
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Völker univentionstaff 2018-07-05 14:40:53 CEST
During an upgrade from v4.2.x to 4.3.0 systemd restarts the nfs server. But somehow it rans into a timeout which takes around 20 to 30minutes to proceed.

This is more than annoying when upgrading multiple servers to 4.3.0.

=======================================
systemctl reports in updater.log:

Job for nfs-server.service failed because the control process exited with error code.
See "systemctl status nfs-server.service" and "journalctl -xe" for details.
invoke-rc.d: initscript nfs-kernel-server, action "restart" failed.
● nfs-server.service - NFS server and services
   Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2018-07-05 13:35:49 CEST; 34ms ago
  Process: 5388 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 5386 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 5388 (code=exited, status=1/FAILURE)
      CPU: 6ms

=======================================
journalctl -xe afterwards reports:

Jul 05 13:27:03 master systemd[1]: Starting NFS server and services...
Jul 05 13:35:49 master rpc.nfsd[5388]: rpc.nfsd: writing fd to kernel faile…out)
Jul 05 13:35:49 master rpc.nfsd[5388]: rpc.nfsd: unable to set any sockets …nfsd
Jul 05 13:35:49 master systemd[1]: nfs-server.service: Main process exited,…LURE
Jul 05 13:35:49 master systemd[1]: Failed to start NFS server and services.
Jul 05 13:35:49 master systemd[1]: nfs-server.service: Unit entered failed …ate.
Jul 05 13:35:49 master systemd[1]: nfs-server.service: Failed with result '…de'.
Warning: nfs-server.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Hint: Some lines were ellipsized, use -l to show in full.
Failed to restart nfs-kernel-server, ignoring.

=======================================
Might be related to 
https://bugzilla.redhat.com/show_bug.cgi?id=1424940


=======================================
parts of pstree during the 20 minutes of "sleep":

 --force-confold --force-overwrite --force-overwrite-dir --status-fd 60 --configure ...
  |                           `-nfs-kernel-serv /var/lib/dpkg/info/nfs-kernel-server.postinst configure ...
  |                               `-invoke-rc.d /usr/sbin/invoke-rc.d nfs-kernel-server restart
  |                                   `-systemctl restart nfs-kernel-server.service
  |                                       `-systemd-tty-ask-password-agent --watch
Comment 1 Benjamin Kix 2018-07-06 09:33:13 CEST
Customer here...

I'd say it's more around 5-10 minutes. But nevertheless it's kind of nerve-wrecking to sit in front of a terminal (updater.log), that doesn't respond for that long time.
Comment 2 Arvid Requate univentionstaff 2018-07-10 16:14:32 CEST
*** Bug 47278 has been marked as a duplicate of this bug. ***
Comment 3 Philipp Hahn univentionstaff 2018-10-15 14:12:14 CEST
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774153> was fixed with systemd_215-17+deb8u6, but UCS-4.2-4 is already at deb8u7: <http://xen1.knut.univention.de:8000/packages/source/systemd/>

According to <https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789> the bug is triggered by upgrading systemd and calling "systemctl" while the old daemon is still running; calling "systemctl daemon-reexec" fixed it for most users.

Quoting <https://github.com/NixOS/nixpkgs/issues/2584>:
> The presence of systemd-tty-ask-password-agent means nothing - systemctl always starts it to handle any password queries that might come up. There probably is some systemd service that hangs or is waiting for some device dependency to appear. To see which, run systemctl list-jobs.
Comment 4 Daniel Tröder univentionstaff 2018-10-15 15:28:15 CEST
(In reply to Philipp Hahn from comment #3)
> According to
> <https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789> the bug is
> triggered by upgrading systemd and calling "systemctl" while the old daemon
> is still running; calling "systemctl daemon-reexec" fixed it for most users.
Sounds good. Shouldn't that be part of the postinst of the systemd package? If it isn't, can we patch it to do that?
Comment 5 Felix Botner univentionstaff 2018-10-17 16:51:49 CEST
I think this is collateral damage of Bug #46875, when disabling rpcbind during the update better disable nfs-server too

Modified preup/postup on apt.knut.univention.de

--- postup.sh.orig	2018-10-17 10:28:29.722580108 +0200
+++ postup.sh	2018-10-17 16:42:07.197705993 +0200
@@ -204,15 +204,19 @@
 /usr/share/univention-directory-manager-tools/univention-migrate-users-to-ucs4.3 >>"$UPDATER_LOG" 2>&1
 
 # Bug #46875: unmask the rpcbind service again after update
+# Bug# 47313 - Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0 
 if grep -q '^rpcbind/autostart: no$' /etc/univention/base-forced.conf; then
-	ucr unset --force rpcbind/autostart
-	systemctl daemon-reload
-	if [ -x /usr/sbin/rpcinfo ]; then
-		systemctl restart rpcbind
-		systemctl restart nfs-kernel-server
-		systemctl restart nfs-server
-		systemctl restart quotarpc
-	fi
+	ucr unset --force rpcbind/autostart >>"$UPDATER_LOG" 2>&1
+	systemctl daemon-reload >>"$UPDATER_LOG" 2>&1
+fi
+if [ -x /usr/sbin/rpcinfo ]; then
+	systemctl unmask nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1
+	systemctl enable nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1
+	systemctl daemon-reload >>"$UPDATER_LOG" 2>&1
+	systemctl restart rpcbind >>"$UPDATER_LOG" 2>&1
+	systemctl restart nfs-kernel-server >>"$UPDATER_LOG" 2>&1
+	systemctl restart nfs-server >>"$UPDATER_LOG" 2>&1
+	systemctl restart quotarpc >>"$UPDATER_LOG" 2>&1
 fi
 
 # Bug #47828: remove possible deactivated updater warniung
fbotner@omar:/var/univention/buildsystem2/test_mirror/ftp/4.3/maintained/4.3-0/all-> diff -Nur preup.sh.orig preup.sh
--- preup.sh.orig	2018-10-17 10:28:21.610526550 +0200
+++ preup.sh	2018-10-17 15:58:18.097589407 +0200
@@ -685,11 +685,15 @@
 esac
 
 # Bug #46875: mask the rpcbind service during update
+# Bug #47313 - Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0
 if [ -x /sbin/rpcbind ]; then
 	is_ucr_true rpcbind/autostart
 	if [ $? -ne 1 ]; then
-		ucr set --force rpcbind/autostart=no
+		ucr set --force rpcbind/autostart=no >>"$UPDATER_LOG" 2>&1
 	fi
+	systemctl stop nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1
+	systemctl disable nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1
+	systemctl mask nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1
 fi
 
 echo ""

nfs-kernel-server/autostart=no did not work for me

QA: please re-open for the next steps (copy to testing.univention.de and updates.software-univention.de)
Comment 6 Arvid Requate univentionstaff 2018-10-29 20:14:59 CET
Ok, looks better, the error message is gone.
Comment 7 Felix Botner univentionstaff 2018-11-01 12:20:00 CET
Updated postup/preup on updates.software-univention.de and testing ... and tested update.
Comment 8 Arvid Requate univentionstaff 2018-11-01 19:42:17 CET
Ok, looks good.
Comment 9 Stefan Gohmann univentionstaff 2018-11-02 06:29:48 CET
Has been released → closed.