[Gluster-users] QEMU gfapi segfault
Josh Boon
gluster at joshboon.com
Tue Dec 30 22:41:13 UTC 2014
Hey folks,
I'm working on tracking down rogue QEMU segfaults in my infrastructure that appear to be caused by gluster. The clues I have so far: the process is in disk sleep (uninterruptible I/O wait) when it dies, its storage is backed only by gluster, and the segfault points at the I/O path. Unfortunately I haven't figured out how to capture a full crash dump so I can run it through apport-retrace and see exactly what went wrong. The other interesting detail is that this happens only when gluster is under heavy load. Any tips on debugging further or getting this fixed would be appreciated.
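For reference, this is the approach I'm planning to try for core capture on 14.04 (only a sketch, assuming apport is installed and qemu inherits the raised core limit; the crash file name is illustrative):

# allow cores for the process tree that launches qemu
# (or set the equivalent limit in the libvirt/qemu configuration)
ulimit -c unlimited

# Ubuntu pipes cores to apport by default; verify:
cat /proc/sys/kernel/core_pattern   # should show |/usr/share/apport/apport ...

# after the next crash, retrace with debug symbols under gdb:
apport-retrace -g /var/crash/_usr_bin_qemu-system-x86_64.*.crash

One wrinkle visible in the segfault line below: the mapped binary shows "(deleted)", which usually means the qemu binary was replaced on disk (e.g. by a package upgrade) after the guest started, so the retrace has to be done against the package version that was actually running.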
Segfault:
Dec 30 20:42:56 HFMHVR3 kernel: [5976247.820875] qemu-system-x86[27730]: segfault at 128 ip 00007f891f0cc82c sp 00007f89376846a0 error 4 in qemu-system-x86_64 (deleted)[7f891ed42000+4af000]
Brick log:
[2014-12-30 20:42:56.797946] I [server.c:520:server_rpc_notify] 0-VMARRAY-server: disconnecting connectionfrom HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
[2014-12-30 20:42:56.798244] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-VMARRAY-server: releasing lock on 6e640448-aa4c-4faa-b7ad-33e68aca0d3a held by {client=0x7fe130776740, pid=0 lk-owner=ecb80
[2014-12-30 20:42:56.798287] I [server-helpers.c:289:do_fd_cleanup] 0-VMARRAY-server: fd cleanup on /HFMPCI0.img
[2014-12-30 20:42:56.798384] I [client_t.c:417:gf_client_unref] 0-VMARRAY-server: Shutting down connection HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
There is nothing interesting in the VM log, nor around the time of the segfault in the hypervisor log.
Environment
Ubuntu 14.04 running stock QEMU 2.0.0, modified only to enable gfapi (from https://launchpad.net/~josh-boon/+archive/ubuntu/qemu-glusterfs), on top of an SSD RAID0 array. The gluster nodes are connected over back-to-back 10G fiber links in a balance-rr bond.
Config
Filesystem mount
/dev/mapper/VG0-VAR on /var type xfs (rw,noatime,nodiratime,nobarrier)
Gluster config
Volume Name: VMARRAY
Type: Replicate
Volume ID: c0947aea-d07f-4ca0-bfcf-3b1c97cec247
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.9.1.1:/var/lib/glusterfs
Brick2: 10.9.1.2:/var/lib/glusterfs
Options Reconfigured:
cluster.choose-local: true
storage.owner-gid: 112
storage.owner-uid: 107
cluster.server-quorum-type: none
cluster.quorum-type: none
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
network.ping-timeout: 7
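(These options were applied with the standard CLI, e.g.

gluster volume set VMARRAY network.ping-timeout 7

and likewise for the rest.)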
Machine Disk XML
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='gluster' name='VMARRAY/HFMPCI0.img'>
<host name='10.9.1.2'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
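As a side note for debugging, the same gfapi path can be exercised outside libvirt using QEMU's gluster:// URI syntax (assuming the PPA build compiles gluster support into qemu-img as well):

qemu-img info gluster://10.9.1.2/VMARRAY/HFMPCI0.img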
Thanks,
Josh