Bug 25993 - Unexpected reboot of the Xen host when starting too many instances
Status: CLOSED WONTFIX
Product: UCS
Classification: Unclassified
Component: Virtualization - Xen
UCS 2.4
Other Linux
Importance: P5 enhancement
Assigned To: UCS maintainers
Depends on:
Blocks: 34118
Reported: 2012-01-31 17:32 CET by Felix Herrmann
Modified: 2023-06-28 10:45 CEST
CC List: 3 users

Description Felix Herrmann univentionstaff 2012-01-31 17:32:42 CET
Problem: When too many VMs are started on a Xen host, all VMs are terminated (destroyed) and the host reboots.

Cause: If so many VMs are started under Xen that more memory is requested than is physically available, Dom0 keeps trying to satisfy the requests until it has no memory left itself. At that point all guests are terminated and the host reboots.

Solution: The dom0-min-mem setting in /etc/xen/xend-config.sxp reserves memory for Dom0. After Xen is restarted to load the new configuration, any VM for which there is not enough memory is silently prevented from starting.
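The cause and the fix can be illustrated with a simplified Python model of the memory bookkeeping. This is a sketch under stated assumptions, not Xen's or xend's actual allocator: it assumes Dom0 hands its own memory to new guests (ballooning) once free host memory is used up, and that dom0_min_mem_mib (standing in for dom0-min-mem) is the floor below which Dom0 gives nothing away. The function name start_vms and all figures are hypothetical.

```python
def start_vms(host_free_mib, dom0_mem_mib, dom0_min_mem_mib, vm_requests):
    """Hypothetical model of Dom0 ballooning with a dom0-min-mem floor.

    host_free_mib:    memory not assigned to any domain (MiB)
    dom0_mem_mib:     memory currently held by Dom0 (MiB)
    dom0_min_mem_mib: floor Dom0 will not balloon below (MiB)
    vm_requests:      list of (name, MiB) tuples in start order
    Returns (started, refused, final_dom0_mem_mib).
    """
    started, refused = [], []
    for name, need in vm_requests:
        # Memory Dom0 can still give away without dropping below its floor.
        spare = max(dom0_mem_mib - dom0_min_mem_mib, 0)
        if need <= host_free_mib + spare:
            # Take free host memory first, then balloon Dom0 down for the rest.
            from_free = min(need, host_free_mib)
            host_free_mib -= from_free
            dom0_mem_mib -= need - from_free
            started.append(name)
        else:
            # Not enough memory without violating the floor: refuse the start.
            refused.append(name)
    return started, refused, dom0_mem_mib
```

With no floor (dom0_min_mem_mib=0), starting four VMs of 1024, 1024, 1024 and 512 MiB from 512 MiB free and a 3072 MiB Dom0 succeeds for all of them and leaves Dom0 with 0 MiB, the state this report associates with the destroy/reboot. With a 1024 MiB floor, the third VM is refused and Dom0 keeps 1024 MiB. The model only does the accounting; it does not reproduce Xen's reaction to an exhausted Dom0.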

Desirable improvements:
dom0-min-mem should be configurable via UCR and initially set to a value that is sensible in most scenarios, for example 1 GB. Currently /etc/xen/xend-config.sxp hard-codes a value of only 196 MB.
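Making dom0-min-mem configurable essentially means rewriting one S-expression line in /etc/xen/xend-config.sxp from a configured value. The Python sketch below shows that substitution step only; the function name set_dom0_min_mem is hypothetical, the 1024 MiB default is the value suggested above (not a Xen default), and how a UCR template would supply the value is not specified in this report.

```python
import re

# Value suggested in this report (about 1 GB); not a Xen default.
DEFAULT_DOM0_MIN_MEM_MIB = 1024

# Matches an active (uncommented) "(dom0-min-mem N)" line.
_DOM0_MIN_MEM_LINE = re.compile(r"^[ \t]*\(dom0-min-mem[ \t]+\d+\)[ \t]*$", re.MULTILINE)


def set_dom0_min_mem(config_text, value_mib=DEFAULT_DOM0_MIN_MEM_MIB):
    """Return config_text with dom0-min-mem set to value_mib (hypothetical helper)."""
    if value_mib < 0:
        raise ValueError("dom0-min-mem must be non-negative")
    new_line = "(dom0-min-mem %d)" % value_mib
    if _DOM0_MIN_MEM_LINE.search(config_text):
        # Replace the first active setting; commented lines ("#(...)") are left alone.
        return _DOM0_MIN_MEM_LINE.sub(new_line, config_text, count=1)
    # No active setting yet: append one on its own line.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"
```

For example, a file containing "(dom0-min-mem 196)" would have that line replaced by "(dom0-min-mem 1024)", while a file where the setting is only present as a comment gets a new active line appended.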

UVMM should display a warning if starting a VM threatens to exhaust the available memory. If a VM cannot be started due to insufficient memory, a corresponding message should be displayed.
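The two requested messages (a warning when memory is about to run out, an error when a VM cannot start) could be driven by a simple pre-start check like the Python sketch below. It is hypothetical: check_vm_start is not a UVMM function, the 512 MiB warning margin is an arbitrary assumption, and how UVMM would obtain free_mib (memory Xen can still hand out) is left open.

```python
def check_vm_start(free_mib, vm_mem_mib, warn_margin_mib=512):
    """Hypothetical pre-start check returning (may_start, message).

    free_mib:        memory that can still be assigned to guests (MiB)
    vm_mem_mib:      memory the VM requests (MiB)
    warn_margin_mib: assumed threshold below which a warning is shown
    """
    remaining = free_mib - vm_mem_mib
    if remaining < 0:
        # Start would fail: report why instead of failing silently.
        return False, ("The VM cannot be started: it needs %d MiB, "
                       "but only %d MiB are available." % (vm_mem_mib, free_mib))
    if remaining < warn_margin_mib:
        # Start is possible but leaves little headroom.
        return True, ("Warning: after starting this VM only %d MiB "
                      "will remain free." % remaining)
    return True, ""
```

For instance, check_vm_start(1500, 1024) allows the start but warns that 476 MiB remain, while check_vm_start(1000, 2048) refuses it with an explanatory message.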

The on_poweroff, on_reboot and on_crash behavior should be configurable in UVMM.
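Assuming UVMM manages guests through libvirt domain XML, making these actions configurable amounts to setting the on_poweroff, on_reboot and on_crash elements of a domain definition. The Python sketch below does that with the standard library; set_lifecycle_actions is a hypothetical helper, and the accepted action names follow libvirt's domain XML lifecycle settings as best understood here, so they should be checked against the libvirt documentation for the version in use.

```python
import xml.etree.ElementTree as ET

# Action names accepted by libvirt's domain XML lifecycle elements (assumed).
COMMON_ACTIONS = {"destroy", "restart", "preserve", "rename-restart"}
CRASH_ACTIONS = COMMON_ACTIONS | {"coredump-destroy", "coredump-restart"}


def set_lifecycle_actions(domain_xml, on_poweroff, on_reboot, on_crash):
    """Return domain_xml with the three lifecycle actions set (hypothetical helper)."""
    allowed = {"on_poweroff": COMMON_ACTIONS,
               "on_reboot": COMMON_ACTIONS,
               "on_crash": CRASH_ACTIONS}
    values = {"on_poweroff": on_poweroff,
              "on_reboot": on_reboot,
              "on_crash": on_crash}
    root = ET.fromstring(domain_xml)
    for tag, value in values.items():
        if value not in allowed[tag]:
            raise ValueError("%s: unsupported action %r" % (tag, value))
        elem = root.find(tag)
        if elem is None:
            # Element missing: append it to the <domain> root.
            elem = ET.SubElement(root, tag)
        elem.text = value
    return ET.tostring(root, encoding="unicode")
```

Validating against a fixed set keeps a UI from writing values libvirt would reject; existing elements are updated in place and missing ones are appended.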
Comment 1 Ingo Steuwer univentionstaff 2012-05-02 16:16:29 CEST
This bug should possibly be split into several:

1. Possibly Xen's default behavior can be changed (it should itself detect when memory is insufficient and stop starting further machines)
2. Configuration of the reserved memory via UCR
3. Displaying a message in UVMM when a VM was not started due to insufficient memory
Comment 2 Philipp Hahn univentionstaff 2013-05-23 15:32:35 CEST
xen3 with only 4 GiB, but 2 VMs with 2+1 GiB plus dom0 running:
# xm dmesg
(XEN) System RAM: 3967MB (4062392kB)
# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   804     2     r-----   1162.9
ucs31-64-hvm                                 6  1024     1     r-----    250.4
ucs31-64-pv                                     1024     1                10.5
win7-64-hvm                                  7  2048     1     --p--d      0.0
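To make the overcommitment in this listing concrete, the following Python sketch sums the Mem column of every domain that has a domain ID and compares it with the System RAM reported by xm dmesg. The function name is hypothetical; it assumes the six-column xm list layout shown here (a defined but stopped domain such as ucs31-64-pv has no ID or State) and ignores Xen's own memory overhead.

```python
def committed_memory_mib(xm_list_output):
    """Sum the Mem column (MiB) of all domains that currently have a domain ID."""
    total = 0
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expected columns: Name ID Mem VCPUs State Time(s).
        # Stopped domains lack ID and State, so they have fewer fields.
        if len(fields) == 6 and fields[1].isdigit():
            total += int(fields[2])
    return total
```

Applied to the listing above, the three domains with an ID account for 804 + 1024 + 2048 = 3876 MiB of the 3967 MB System RAM, leaving roughly 91 MiB for everything else.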

I can no longer interact with the domUs.
Yesterday the host suddenly rebooted in what was probably a similar situation.
Today I got the following Oops:

[28591.129593] BUG: unable to handle kernel paging request at ffff8800cc86c000
[28591.129597] IP: [<ffffffffa02a06c2>] OUT_RINGp+0x28/0x32 [nouveau]
[28591.129622] PGD 1606067 PUD f0f067 PMD f74067 PTE 0
[28591.129625] Oops: 0000 [#1] SMP 
[28591.129629] CPU 0 
[28591.129630] Modules linked in: loop tun xen_blkback xt_physdev ebtable_nat ebtables xen_netback xen_gntdev nfsd lockd nfs_acl auth_rpcgss sunrpc ip6t_REJECT ipt_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables bridge stp blktap xen_blkfront xenfs xen_evtchn quota_v2 quota_tree snd_hda_codec_realtek snd_hda_intel snd_hda_codec nouveau snd_hwdep ttm drm_kms_helper drm snd_pcm i2c_algo_bit mxm_wmi snd_timer power_supply psmouse edac_core tpm_tis i2c_nforce2 k10temp mperf tpm edac_mce_amd snd pcspkr tpm_bios serio_raw wmi soundcore snd_page_alloc processor i2c_core evdev shpchp ext4 jbd2 crc16 dm_snapshot dm_mirror dm_region_hash dm_log dm_mod sg sd_mod crc_t10dif sr_mod cdrom ata_generic ohci_hcd video thermal_sys pata_amd ehci_hcd ahci libahci forcedeth libata usbcore usb_common button [last unloaded: scsi_wait_scan]
[28591.129667] 
[28591.129670] Pid: 10782, comm: brctl Not tainted 3.2.0-ucs27-amd64 #1 Debian 3.2.39-2.27.201303061719 To Be Filled By O.E.M. To Be Filled By O.E.M./K10N78
[28591.129673] RIP: e030:[<ffffffffa02a06c2>]  [<ffffffffa02a06c2>] OUT_RINGp+0x28/0x32 [nouveau]
[28591.129682] RSP: e02b:ffff880072eff780  EFLAGS: 00010082
[28591.129683] RAX: ffff8800cba08000 RBX: ffff8800cba08000 RCX: 0000000000000010
[28591.129685] RDX: 0000000000000020 RSI: ffff8800cc86c000 RDI: ffffc900110b77f0
[28591.129686] RBP: 0000000000000020 R08: ffffc900110b7780 R09: 00000000000001b8
[28591.129688] R10: ffff880000000041 R11: 0000000000000001 R12: 0000000000000000
[28591.129689] R13: 0000000000000000 R14: ffff8800cc86c010 R15: 00000000000007ff
[28591.129696] FS:  00002abffcdf9b20(0000) GS:ffff8800d7c00000(0000) knlGS:0000000000000000
[28591.129698] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[28591.129699] CR2: ffff8800cc86c000 CR3: 00000000cc00f000 CR4: 0000000000000660
[28591.129701] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28591.129703] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[28591.129705] Process brctl (pid: 10782, threadinfo ffff880072efe000, task ffff8800cf2bce60)
[28591.129706] Stack:
[28591.129707]  ffffffffa03006aa ffffea000191e210 0000002800000010 0000000000000030
[28591.129710]  ffff8800cc813800 ffff880072eff8a8 ffff8800ce300000 0000000000000000
[28591.129712]  ffff8800cc86bf95 0000000000000005 ffffffffa02b4bc2 0000000000000010
[28591.129715] Call Trace:
[28591.129730]  [<ffffffffa03006aa>] ? nv50_fbcon_imageblit+0x18d/0x1ac [nouveau]
[28591.129739]  [<ffffffffa02b4bc2>] ? nouveau_fbcon_imageblit+0x8b/0xdb [nouveau]
[28591.129745]  [<ffffffff81203ef8>] ? bit_putcs+0x3dd/0x431
[28591.129755]  [<ffffffffa03006b7>] ? nv50_fbcon_imageblit+0x19a/0x1ac [nouveau]
[28591.129758]  [<ffffffff81203afd>] ? bit_cursor+0x46e/0x48c
[28591.129761]  [<ffffffff811feb9d>] ? fbcon_putcs+0xfb/0x10a
[28591.129763]  [<ffffffff81203b1b>] ? bit_cursor+0x48c/0x48c
[28591.129765]  [<ffffffff81200c2f>] ? fbcon_redraw+0xcc/0x161
[28591.129767]  [<ffffffff81201303>] ? fbcon_scroll+0x63f/0xbe7
[28591.129771]  [<ffffffff81255ca7>] ? scrup+0x67/0xd4
[28591.129773]  [<ffffffff81255e36>] ? lf+0x25/0x58
[28591.129775]  [<ffffffff81256093>] ? vt_console_print+0x1c8/0x2cb
[28591.129780]  [<ffffffff81049b0e>] ? __call_console_drivers+0x75/0x86
[28591.129782]  [<ffffffff8104a192>] ? console_unlock+0x123/0x20b
[28591.129786]  [<ffffffff813817ed>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[28591.129788]  [<ffffffff8104a852>] ? vprintk+0x39e/0x3e5
[28591.129793]  [<ffffffff8137fb94>] ? printk+0x40/0x4c
[28591.129797]  [<ffffffff810bd3ff>] ? find_get_page+0x42/0x64
[28591.129800]  [<ffffffff810046e2>] ? pte_pfn_to_mfn+0x23/0x74
[28591.129804]  [<ffffffff812c01b2>] ? __dev_set_promiscuity+0xc5/0x167
[28591.129806]  [<ffffffff812c03d7>] ? dev_set_promiscuity+0x14/0x3a
[28591.129814]  [<ffffffffa038e06f>] ? br_add_if+0x1c7/0x3b6 [bridge]
[28591.129817]  [<ffffffff812c2fe6>] ? dev_ifsioc+0x2fa/0x31a
[28591.129819]  [<ffffffff812c2257>] ? dev_get_by_name_rcu+0x31/0x3d
[28591.129822]  [<ffffffff812c3b12>] ? dev_ioctl+0x54f/0x62e
[28591.129824]  [<ffffffff810048ed>] ? pte_mfn_to_pfn+0x17/0x42
[28591.129826]  [<ffffffff81004531>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[28591.129830]  [<ffffffff812b0560>] ? sock_do_ioctl+0x2f/0x36
[28591.129831]  [<ffffffff812b0967>] ? sock_ioctl+0x205/0x212
[28591.129836]  [<ffffffff810f64c1>] ? kmem_cache_free+0x2c/0x62
[28591.129839]  [<ffffffff811142d7>] ? do_vfs_ioctl+0x464/0x4b1
[28591.129842]  [<ffffffff81118534>] ? dput+0xe6/0xf2
[28591.129845]  [<ffffffff81107f10>] ? fput+0x17a/0x1a2
[28591.129847]  [<ffffffff8111436f>] ? sys_ioctl+0x4b/0x70
[28591.129850]  [<ffffffff81104eb2>] ? filp_close+0x64/0x6c
[28591.129854]  [<ffffffff81386e12>] ? system_call_fastpath+0x16/0x1b
[28591.129855] Code: 41 5f c3 4c 63 87 70 03 00 00 48 8b 8f d8 00 00 00 48 89 f8 49 c1 e0 02 4c 03 81 b8 01 00 00 8d 0c 95 00 00 00 00 89 c9 4c 89 c7 <f3> a4 01 90 70 03 00 00 c3 90 48 89 f8 eb 08 39 30 74 0d 48 83 
[28591.129869] RIP  [<ffffffffa02a06c2>] OUT_RINGp+0x28/0x32 [nouveau]
[28591.129879]  RSP <ffff880072eff780>
[28591.129880] CR2: ffff8800cc86c000
[28591.134170] ---[ end trace 492aa3ea0023e026 ]---

Might be related to Bug #24039
Comment 3 Stefan Gohmann univentionstaff 2016-04-25 07:51:59 CEST
This issue has been filed against UCS 2.4.

UCS 2.4 is out of maintenance and many UCS components have vastly changed in
later releases. Thus, this issue is now being closed.

If this issue still occurs in newer UCS versions, please use "Clone this bug".
In this case please provide detailed information on how this issue is affecting
you.