[Gluster-users] QEMU gfapi segfault
Josh Boon
gluster at joshboon.com
Tue Dec 30 22:41:13 UTC 2014
Hey folks,
I'm working on tracking down rogue QEMU segfaults in my infrastructure that appear to be caused by gluster. The clues I have so far: the process is in disk sleep (uninterruptible I/O wait) when it dies, its storage is backed only by gluster, and the segfault points at the I/O path. Unfortunately I haven't figured out how to capture a full crash dump so I can run it through apport-retrace and see exactly what went wrong. The other interesting detail is that this happens only when gluster is under heavy load. Any tips on debugging further or getting this fixed would be appreciated.
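For reference, this is the approach I'm planning to try for core capture on 14.04 (only a sketch, assuming apport is installed and qemu inherits the raised core limit; the crash file name is illustrative):

# allow cores for the process tree that launches qemu
# (or set the equivalent limit in the libvirt/qemu configuration)
ulimit -c unlimited

# Ubuntu pipes cores to apport by default; verify:
cat /proc/sys/kernel/core_pattern   # should show |/usr/share/apport/apport ...

# after the next crash, retrace with debug symbols under gdb:
apport-retrace -g /var/crash/_usr_bin_qemu-system-x86_64.*.crash

One wrinkle visible in the segfault line below: the mapped binary shows "(deleted)", which usually means the qemu binary was replaced on disk (e.g. by a package upgrade) after the guest started, so the retrace has to be done against the package version that was actually running.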
Segfault:
Dec 30 20:42:56 HFMHVR3 kernel: [5976247.820875] qemu-system-x86[27730]: segfault at 128 ip 00007f891f0cc82c sp 00007f89376846a0 error 4 in qemu-system-x86_64 (deleted)[7f891ed42000+4af000]
Brick log:
[2014-12-30 20:42:56.797946] I [server.c:520:server_rpc_notify] 0-VMARRAY-server: disconnecting connectionfrom HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
[2014-12-30 20:42:56.798244] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-VMARRAY-server: releasing lock on 6e640448-aa4c-4faa-b7ad-33e68aca0d3a held by {client=0x7fe130776740, pid=0 lk-owner=ecb80
[2014-12-30 20:42:56.798287] I [server-helpers.c:289:do_fd_cleanup] 0-VMARRAY-server: fd cleanup on /HFMPCI0.img
[2014-12-30 20:42:56.798384] I [client_t.c:417:gf_client_unref] 0-VMARRAY-server: Shutting down connection HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
There is nothing interesting in the VM log, nor around the time of the segfault in the hypervisor log.
Environment
Ubuntu 14.04 running stock QEMU 2.0.0, modified only to enable gfapi (from https://launchpad.net/~josh-boon/+archive/ubuntu/qemu-glusterfs), on top of an SSD RAID0 array. The gluster nodes are connected over back-to-back 10G fiber links in a balance-rr bond.
Config
Filesystem mount
/dev/mapper/VG0-VAR on /var type xfs (rw,noatime,nodiratime,nobarrier)
Gluster config
Volume Name: VMARRAY
Type: Replicate
Volume ID: c0947aea-d07f-4ca0-bfcf-3b1c97cec247
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.9.1.1:/var/lib/glusterfs
Brick2: 10.9.1.2:/var/lib/glusterfs
Options Reconfigured:
cluster.choose-local: true
storage.owner-gid: 112
storage.owner-uid: 107
cluster.server-quorum-type: none
cluster.quorum-type: none
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
network.ping-timeout: 7
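(These options were applied with the standard CLI, e.g.

gluster volume set VMARRAY network.ping-timeout 7

and likewise for the rest.)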
Machine Disk XML
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='gluster' name='VMARRAY/HFMPCI0.img'>
<host name='10.9.1.2'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
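As a side note for debugging, the same gfapi path can be exercised outside libvirt using QEMU's gluster:// URI syntax (assuming the PPA build compiles gluster support into qemu-img as well):

qemu-img info gluster://10.9.1.2/VMARRAY/HFMPCI0.img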
Thanks,
Josh