Univention Bugzilla – Bug 47313
Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0
Last modified: 2020-05-12 19:03:58 CEST
During an upgrade from v4.2.x to 4.3.0 systemd restarts the nfs server. But somehow it rans into a timeout which takes around 20 to 30minutes to proceed. This is more than annoying when upgrading multiple servers to 4.3.0. ======================================= systemctl reports in updater.log: Job for nfs-server.service failed because the control process exited with error code. See "systemctl status nfs-server.service" and "journalctl -xe" for details. invoke-rc.d: initscript nfs-kernel-server, action "restart" failed. ● nfs-server.service - NFS server and services Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled) Active: failed (Result: exit-code) since Thu 2018-07-05 13:35:49 CEST; 34ms ago Process: 5388 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 5386 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 5388 (code=exited, status=1/FAILURE) CPU: 6ms ======================================= journalctl -xe afterwards reports: Jul 05 13:27:03 master systemd[1]: Starting NFS server and services... Jul 05 13:35:49 master rpc.nfsd[5388]: rpc.nfsd: writing fd to kernel faile…out) Jul 05 13:35:49 master rpc.nfsd[5388]: rpc.nfsd: unable to set any sockets …nfsd Jul 05 13:35:49 master systemd[1]: nfs-server.service: Main process exited,…LURE Jul 05 13:35:49 master systemd[1]: Failed to start NFS server and services. Jul 05 13:35:49 master systemd[1]: nfs-server.service: Unit entered failed …ate. Jul 05 13:35:49 master systemd[1]: nfs-server.service: Failed with result '…de'. Warning: nfs-server.service changed on disk. Run 'systemctl daemon-reload' to reload units. Hint: Some lines were ellipsized, use -l to show in full. Failed to restart nfs-kernel-server, ignoring. ======================================= Might be related to https://bugzilla.redhat.com/show_bug.cgi?id=1424940 ======================================= parts of pstree during the 20 minutes of "sleep": --force-confold --force-overwrite --force-overwrite-dir --status-fd 60 --configure ... | `-nfs-kernel-serv /var/lib/dpkg/info/nfs-kernel-server.postinst configure ... | `-invoke-rc.d /usr/sbin/invoke-rc.d nfs-kernel-server restart | `-systemctl restart nfs-kernel-server.service | `-systemd-tty-ask-password-agent --watch
Customer here... I'd say it's more around 5-10 minutes. But nevertheless it's kind of nerve-wrecking to sit in front of a terminal (updater.log), that doesn't respond for that long time.
*** Bug 47278 has been marked as a duplicate of this bug. ***
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774153> was fixed with systemd_215-17+deb8u6, but UCS-4.2-4 is already at deb8u7: <http://xen1.knut.univention.de:8000/packages/source/systemd/> According to <https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789> the bug is triggered by upgrading systemd and calling "systemctl" while the old daemon is still running; calling "systemctl daemon-reexec" fixed it for most users. Quoting <https://github.com/NixOS/nixpkgs/issues/2584>: > The presence of systemd-tty-ask-password-agent means nothing - systemctl always starts it to handle any password queries that might come up. There probably is some systemd service that hangs or is waiting for some device dependency to appear. To see which, run systemctl list-jobs.
(In reply to Philipp Hahn from comment #3) > According to > <https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789> the bug is > triggered by upgrading systemd and calling "systemctl" while the old daemon > is still running; calling "systemctl daemon-reexec" fixed it for most users. Sounds good. Shouldn't that be part of the postinst of the systemd package? If it isn't, can we patch it to do that?
I think this is collateral damage of Bug #46875, when disabling rpcbind during the update better disable nfs-server too Modified preup/postup on apt.knut.univention.de --- postup.sh.orig 2018-10-17 10:28:29.722580108 +0200 +++ postup.sh 2018-10-17 16:42:07.197705993 +0200 @@ -204,15 +204,19 @@ /usr/share/univention-directory-manager-tools/univention-migrate-users-to-ucs4.3 >>"$UPDATER_LOG" 2>&1 # Bug #46875: unmask the rpcbind service again after update +# Bug# 47313 - Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0 if grep -q '^rpcbind/autostart: no$' /etc/univention/base-forced.conf; then - ucr unset --force rpcbind/autostart - systemctl daemon-reload - if [ -x /usr/sbin/rpcinfo ]; then - systemctl restart rpcbind - systemctl restart nfs-kernel-server - systemctl restart nfs-server - systemctl restart quotarpc - fi + ucr unset --force rpcbind/autostart >>"$UPDATER_LOG" 2>&1 + systemctl daemon-reload >>"$UPDATER_LOG" 2>&1 +fi +if [ -x /usr/sbin/rpcinfo ]; then + systemctl unmask nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1 + systemctl enable nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1 + systemctl daemon-reload >>"$UPDATER_LOG" 2>&1 + systemctl restart rpcbind >>"$UPDATER_LOG" 2>&1 + systemctl restart nfs-kernel-server >>"$UPDATER_LOG" 2>&1 + systemctl restart nfs-server >>"$UPDATER_LOG" 2>&1 + systemctl restart quotarpc >>"$UPDATER_LOG" 2>&1 fi # Bug #47828: remove possible deactivated updater warniung fbotner@omar:/var/univention/buildsystem2/test_mirror/ftp/4.3/maintained/4.3-0/all-> diff -Nur preup.sh.orig preup.sh --- preup.sh.orig 2018-10-17 10:28:21.610526550 +0200 +++ preup.sh 2018-10-17 15:58:18.097589407 +0200 @@ -685,11 +685,15 @@ esac # Bug #46875: mask the rpcbind service during update +# Bug #47313 - Restart NFS daemon takes 20 to 30 minutes during upgrade 4.2.x -> 4.3.0 if [ -x /sbin/rpcbind ]; then is_ucr_true rpcbind/autostart if [ $? -ne 1 ]; then - ucr set --force rpcbind/autostart=no + ucr set --force rpcbind/autostart=no >>"$UPDATER_LOG" 2>&1 fi + systemctl stop nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1 + systemctl disable nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1 + systemctl mask nfs-kernel-server.service >>"$UPDATER_LOG" 2>&1 fi echo "" nfs-kernel-server/autostart=no did not work for me QA: please re-open for the next steps (copy to testing.univention.de and updates.software-univention.de)
Ok, looks better, the error message is gone.
Updated postup/preup on updates.software-univention.de and testing ... and tested update.
Ok, looks good.
Has been released → closed.