Univention Bugzilla – Full Text Bug Listing |
Summary: | Within one day the virtual-machine-manager-daemon.log increases to 400Gb | ||
---|---|---|---|
Product: | UCS | Reporter: | Christina Scheinig <scheinig> |
Component: | UMC - Virtual machines (UVMM) | Assignee: | Florian Best <best> |
Status: | CLOSED FIXED | QA Contact: | Philipp Hahn <hahn> |
Severity: | normal | ||
Priority: | P5 | CC: | best, damrose, steuwer |
Version: | UCS 4.4 | ||
Target Milestone: | UCS 4.4-1-errata | ||
Hardware: | Other | ||
OS: | Linux | ||
What kind of report is it?: | Bug Report | What type of bug is this?: | 7: Crash: Bug causes crash or data loss |
Who will be affected by this bug?: | 1: Will affect a very few installed domains | How will those affected feel about the bug?: | 3: A User would likely not purchase the product |
User Pain: | 0.120 | Enterprise Customer affected?: | Yes |
School Customer affected?: | ISV affected?: | ||
Waiting Support: | Flags outvoted (downgraded) after PO Review: | ||
Ticket number: | 2019062521000734 | Bug group (optional): | |
Max CVSS v3 score: |
Description
Christina Scheinig
2019-06-27 15:27:10 CEST
(In reply to Christina Scheinig from comment #0) > -rw-r----- 1 root adm 415G Jun 25 12:16 > /var/lib/libvirt/images2/virtual-machine-manager-daemon.log Where does this file come from? From which dates does that logfile contain logentries? UVMMd uses the hardcoded logfile /var/log/univention/virtual-machine-manager-daemon.log, and that file is rotated by logrotate. Which type of error in the logfiles caused the massive increase of filesize? In comment0 i see errors that a hypervisor is not reachable, but this has a backoff logic and should only be logged every 5 minutes after the connection breaks of. Another error seems to be with virConnectListAllDomains... (In reply to Erik Damrose from comment #1) > (In reply to Christina Scheinig from comment #0) > > -rw-r----- 1 root adm 415G Jun 25 12:16 > > /var/lib/libvirt/images2/virtual-machine-manager-daemon.log > > Where does this file come from? From which dates does that logfile contain > logentries? I think the customer moved the logfile to get free space again. The logfile starts on 2019-06-24 06:25:04,255 - uvmmd - INFO - Logfile reopened and ends 2019-06-25 12:16:03,408 - uvmmd.node - INFO - Connected to 'qemu://.... > UVMMd uses the hardcoded logfile > /var/log/univention/virtual-machine-manager-daemon.log, and that file is > rotated by logrotate. > > Which type of error in the logfiles caused the massive increase of filesize? > This message is repeated all the time: 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - qemu://master.schein.ig/system: Exception in domainEventDeregisterAny Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in _unregister self.conn.domainEventDeregisterAny(self.domainCB.pop()) > In comment0 i see errors that a hypervisor is not reachable, but this has a > backoff logic and should only be logged every 5 minutes after the connection > breaks of. > > Another error seems to be with virConnectListAllDomains... (In reply to Christina Scheinig from comment #2) > > Which type of error in the logfiles caused the massive increase of filesize? > > > This message is repeated all the time: > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) If this message is the reason for the log file growth: do we know the root cause? (In reply to Ingo Steuwer from comment #3) > (In reply to Christina Scheinig from comment #2) > > > Which type of error in the logfiles caused the massive increase of filesize? > > > > > This message is repeated all the time: > > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > > Traceback (most recent call last): > > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > > _unregister > > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > > If this message is the reason for the log file growth: do we know the root > cause? Unfortunatly, we don't know for sure. According to the logfile and the statement of the customer the connection to the master was broken caused by a failed vpn connection. After that you can see two messages of input/output error and then the spamming starts. Node._unregister() / Node.update_autoreconnect() / Node.run() is called concurrently by multiple threads. I think the Node.close_event() callback destroys the connection which causes the exceptions in Node._unregister(). Maybe some thread locking has to be done? So far from the Ticket information I understand the setup in the following way: * There is 1 DC Master and 3 DC Backups * The UVMM is running on one of the DC Backups * The DC Master also have a libvirt instance which is used from the DC Backup * The general network connection to the DC Master doesn't work * On the DC Backup slapd is not running(?) (In reply to Christina Scheinig from comment #0) > 2019-06-24 10:46:14,928 - uvmmd.node - WARNING - > 'qemu://master.schein.ig/system' broken? next check in 0:05:00.000. unable > to connect to server at 'master.schein.ig > :16514': Connection timed out > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 834, in > update_autoreconnect > self.conn = libvirt.open(self.pd.uri) > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 255, in open > if ret is None:raise libvirtError('virConnectOpen() failed') > libvirtError: unable to connect to server at 'master.schein.ig:16514': > Connection timed out Tracebacks in update_autoreconnect() are only logged once now. Otherwise only a warning is logged. > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 842, in > update_autoreconnect > self.update() > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1123, in > update > for dom in self.conn.listAllDomains(): > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 5011, in > listAllDomains > raise libvirtError("virConnectListAllDomains() failed", conn=self) > libvirtError: Cannot recv data: Input/output error Same for this traceback. > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4820, in > domainEventDeregisterAny > if ret == -1: raise libvirtError ('virConnectDomainEventDeregisterAny() > failed', conn=self) > libvirtError: internal error: client socket is closed Exceptions in _unregister() are now logged as warning without the traceback. > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - An error occurred, but the > cause is unknown > 2019-06-24 12:34:25,041 - uvmmd.unix - WARNING - [58861] Error doing command > "DOMAIN_STATE": Error managing domain > "7f75c911-cea5-49de-8ffb-ab6090733bef": An error occurred, but the cause is > unknown > 2019-06-24 12:34:25,041 - uvmmd.node - ERROR - > qemu://master.schein.ig/system: Exception in domainEventDeregisterAny > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.7/univention/uvmm/node.py", line 1097, in > _unregister > self.conn.domainEventDeregisterAny(self.domainCB.pop()) > AttributeError: 'NoneType' object has no attribute 'domainEventDeregisterAny' domainEventDeregisterAny' This actually was a race condition between the update_autoreconnect() thread and the main thread calling the event close_event(). Storing self.conn in a local variable ensures the instance is accessible and the callbacks can be deregistered. univention-virtual-machine-manager-daemon (8.0.1-8) 6bf9160197c8 | Bug #49737 UVMM: Improve Exception handling on connection close univention-virtual-machine-manager-daemon.yaml 43c6b1b8791d | YAML Bug #28771, Bug #49695, Bug #49737 OK: Code review OK: tail -f /var/log/univention/virtual-machine-manager-daemon.log /etc/init.d/libvirtd stop OK: errata-announce -V --only univention-virtual-machine-manager-schema.yaml OK: YAML |