Univention Bugzilla – Bug 49737
Within one day the virtual-machine-manager-daemon.log increases to 400Gb
Last modified: 2019-08-07 15:44:32 CEST
In a customer environment the logfile of two backupservers increases to 400Gb within one day. This causes the hard disk to ran out of space. The logfile is filled with tracebacks as follows. I have renamed the doamin name. -rw-r----- 1 root adm 415G Jun 25 12:16 /var/lib/libvirt/images2/virtual-machine-manager-daemon.log 2019-06-24 06:25:04,255 - uvmmd - INFO - Logfile reopened. 2019-06-24 06:28:12,048 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 06:35:22,128 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 06:42:32,208 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out [...] libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 10:31:54,768 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 10:39:04,848 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 10:46:14,928 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable to connect to server at 'master.schein.ig :16514': Connection timed out Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in update_autoreconnect self.conn = libvirt.open(self.pd.uri) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open if ret is None:raise libvirtError('virConnectOpen() failed') libvirtError: unable to connect to server at 'master.schein.ig:16514': Connection timed out 2019-06-24 10:51:15,349 - uvmmd.node - INFO - Connected to 'qemu://master.schein.ig/system' [..] 2019-06-24 11:19:19,145 - uvmmd.node - INFO - Connected to 'qemu://backup5-10.schein.ig/system' 2019-06-24 12:34:25,040 - uvmmd.node - WARNING - 'qemu://master.schein.ig/system' broken? next check in 0:00:30.000. Cannot recv data: Input/output error Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 842, in update_autoreconnect self.update() File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1123, in update for dom in self.conn.listAllDomains(): File "/usr/lib/python2.7/dist-packages/libvirt.py", line 5011, in listAllDomains raise libvirtError("virConnectListAllDomains() failed", conn=self) libvirtError: Cannot recv data: Input/output error 2019-06-24 12:34:25,041 - uvmmd.node.connection - INFO - Connection qemu://master.schein.ig/system closed: Error 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - Cannot recv data: Input/output error 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4820, in domainEventDeregisterAny if ret == -1: raise libvirtError ('virConnectDomainEventDeregisterAny() failed', conn=self) libvirtError: internal error: client socket is closed 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - An error occurred, but the cause is unknown 2019-06-24 12:34:25,041 - uvmmd.unix - WARNING - [58861] Error doing command "DOMAIN_STATE": Error managing domain "7f75c911-cea5-49de-8ffb-ab6090733bef": An error occurred, but the cause is unknown 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-24 12:34:25,041 - uvmmd.unix - WARNING - [58898] Error doing command "DOMAIN_STATE": Error managing domain "7f75c911-cea5-49de-8ffb-ab6090733bef": An error occurred, but the cause is unknown 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) [...] 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' 2019-06-25 04:06:32,064 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.do2019-06-25 12:16:02,649 - uvmmd.node - INFO - timer_callback(qemu://backup0.schein.ig/system) start 2019-06-25 12:16:02,741 - uvmmd.node - INFO - timer_callback(qemu://backup2.schein.ig/system) start 2019-06-25 12:16:02,750 - uvmmd.node - INFO - Connected to 'qemu://backup2.schein.ig/system' 2019-06-25 12:16:02,774 - uvmmd.node - INFO - timer_callback(qemu://master.schein.ig/system) start 2019-06-25 12:16:02,799 - uvmmd.node - INFO - timer_callback(qemu://backup5-10.schein.ig/system) start 2019-06-25 12:16:03,053 - uvmmd.node - INFO - Connected to 'qemu://backup0.schein.ig/system' 2019-06-25 12:16:03,395 - uvmmd.node - INFO - Connected to 'qemu://master.schein.ig/system' 2019-06-25 12:16:03,408 - uvmmd.node - INFO - Connected to 'qemu://backup5-10.schein.ig/system'
(In reply to Christina Scheinig from comment #0) > -rw-r----- 1 root adm 415G Jun 25 12:16 > /var/lib/libvirt/images2/virtual-machine-manager-daemon.log Where does this file come from? From which dates does that logfile contain logentries? UVMMd uses the hardcoded logfile /var/log/univention/virtual-machine-manager-daemon.log, and that file is rotated by logrotate. Which type of error in the logfiles caused the massive increase of filesize? In comment0 i see errors that a hypervisor is not reachable, but this has a backoff logic and should only be logged every 5 minutes after the connection breaks of. Another error seems to be with virConnectListAllDomains...
(In reply to Erik Damrose from comment #1) > (In reply to Christina Scheinig from comment #0) > > -rw-r----- 1 root adm 415G Jun 25 12:16 > > /var/lib/libvirt/images2/virtual-machine-manager-daemon.log > > Where does this file come from? From which dates does that logfile contain > logentries? I think the customer moved the logfile to get free space again. The logfile starts on 2019-06-24 06:25:04,255 - uvmmd - INFO - Logfile reopened and ends 2019-06-25 12:16:03,408 - uvmmd.node - INFO - Connected to 'qemu://.... > UVMMd uses the hardcoded logfile > /var/log/univention/virtual-machine-manager-daemon.log, and that file is > rotated by logrotate. > > Which type of error in the logfiles caused the massive increase of filesize? > This message is repeated all the time: 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) > In comment0 i see errors that a hypervisor is not reachable, but this has a > backoff logic and should only be logged every 5 minutes after the connection > breaks of. > > Another error seems to be with virConnectListAllDomains...
(In reply to Christina Scheinig from comment #2) > > Which type of error in the logfiles caused the massive increase of filesize? > > > This message is repeated all the time: > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) If this message is the reason for the log file growth: do we know the root cause?
(In reply to Ingo Steuwer from comment #3) > (In reply to Christina Scheinig from comment #2) > > > Which type of error in the logfiles caused the massive increase of filesize? > > > > > This message is repeated all the time: > > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > > Traceback (most recent call last): > > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > > _unregister > > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > > If this message is the reason for the log file growth: do we know the root > cause? Unfortunatly, we don't know for sure. According to the logfile and the statement of the customer the connection to the master was broken caused by a failed vpn connection. After that you can see two messages of input/output error and then the spamming starts.
Node._unregister() / Node.update_autoreconnect() / Node.run() is called concurrently by multiple threads. I think the Node.close_event() callback destroys the connection which causes the exceptions in Node._unregister(). Maybe some thread locking has to be done?
So far from the Ticket information I understand the setup in the following way: * There is 1 DC Master and 3 DC Backups * The UVMM is running on one of the DC Backups * The DC Master also have a libvirt instance which is used from the DC Backup * The general network connection to the DC Master doesn't work * On the DC Backup slapd is not running(?)
(In reply to Christina Scheinig from comment #0) > 2019-06-24 10:46:14,928 - uvmmd.node - WARNING - > 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable > to connect to server at 'master.schein.ig > :16514': Connection timed out > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in > update_autoreconnect > self.conn = libvirt.open(self.pd.uri) > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open > if ret is None:raise libvirtError('virConnectOpen() failed') > libvirtError: unable to connect to server at 'master.schein.ig:16514': > Connection timed out Tracebacks in update_autoreconnect() are only logged once now. Otherwise only a warning is logged. > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 842, in > update_autoreconnect > self.update() > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1123, in > update > for dom in self.conn.listAllDomains(): > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 5011, in > listAllDomains > raise libvirtError("virConnectListAllDomains() failed", conn=self) > libvirtError: Cannot recv data: Input/output error Same for this traceback. > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4820, in > domainEventDeregisterAny > if ret == -1: raise libvirtError ('virConnectDomainEventDeregisterAny() > failed', conn=self) > libvirtError: internal error: client socket is closed Exceptions in _unregister() are now logged as warning without the traceback. > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - An error occurred, but the > cause is unknown > 2019-06-24 12:34:25,041 - uvmmd.unix - WARNING - [58861] Error doing command > "DOMAIN_STATE": Error managing domain > "7f75c911-cea5-49de-8ffb-ab6090733bef": An error occurred, but the cause is > unknown > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' domainEventDeregisterAny' This actually was a race condition between the update_autoreconnect() thread and the main thread calling the event close_event(). Storing self.conn in a local variable ensures the instance is accessible and the callbacks can be deregistered. univention-virtual-machine-manager-daemon (8.0.1-8) 6bf9160197c8 | Bug #49737 UVMM: Improve Exception handling on connection close univention-virtual-machine-manager-daemon.yaml 43c6b1b8791d | YAML Bug #28771, Bug #49695, Bug #49737
OK: Code review OK: tail -f /var/log/univention/virtual-machine-manager-daemon.log /etc/init.d/libvirtd stop OK: errata-announce -V --only univention-virtual-machine-manager-schema.yaml OK: YAML
<http://errata.software-univention.de/ucs/4.4/218.html>