[Bugs] [Bug 1631247] New: Issue enabling cluster.use-compound-fops with libgfapi application running

Thu Sep 20 09:58:46 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1631247

            Bug ID: 1631247
           Summary: Issue enabling cluster.use-compound-fops with libgfapi
                    application running
           Product: GlusterFS
           Version: 3.12
         Component: libgfapi
          Assignee: bugs at gluster.org
          Reporter: paolo.margara at gmail.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org

Description of problem:

I'm running ovirt with libgfapi enabled on gluster 3.12.13. When I set
"cluster.use-compound-fops" to "on", every VM is paused due to a storage I/O
error, while the file system remains accessible through the fuse client (only
the libgfapi application [qemu] stops working).

Version-Release number of selected component (if applicable): 
* gluster 3.12.13
* qemu 2.10.0-21.el7_5.4.1
* ovirt 4.2.6

How reproducible:
On an ovirt 4.2.6 hyperconverged (hc) installation configured with libgfapi
enabled and gluster 3.12.13, run:

gluster volume set $vm_images_volume_name cluster.use-compound-fops on

When this command is executed, every VM is paused due to a storage I/O error,
while the file system remains accessible through the fuse client (only the
libgfapi application stops working). The qemu log file shows these
gluster-related messages:

2018-09-14T11:49:37.020942Z qemu-kvm: terminating on signal 15 from pid
1513 (/usr/sbin/libvirtd)
2018-09-14T11:49:42.766431Z qemu-kvm: Failed to flush the L2 table
cache: Input/output error
2018-09-14T11:49:44.766853Z qemu-kvm: Failed to flush the refcount block
cache: Input/output error
[2018-09-14 11:49:44.869112] E [MSGID: 108006]
[afr-common.c:5118:__afr_handle_child_down_event]
0-vm-images-repo-demo-replicate-1: All subvolumes are down. Going
offline until atleast one of them comes back up.
[2018-09-14 11:49:44.869284] E [MSGID: 108006]
[afr-common.c:5118:__afr_handle_child_down_event]
0-vm-images-repo-demo-replicate-0: All subvolumes are down. Going
offline until atleast one of them comes back up.
[2018-09-14 11:49:44.869515] E [MSGID: 108006]
[afr-common.c:5118:__afr_handle_child_down_event]
0-vm-images-repo-demo-replicate-2: All subvolumes are down. Going
offline until atleast one of them comes back up.
[2018-09-14 11:49:44.869639] E [MSGID: 108006]
[afr-common.c:5118:__afr_handle_child_down_event]
0-vm-images-repo-demo-replicate-3: All subvolumes are down. Going
offline until atleast one of them comes back up.
[2018-09-14 11:49:44.869823] E [MSGID: 108006]
[afr-common.c:5118:__afr_handle_child_down_event]
0-vm-images-repo-demo-replicate-4: All subvolumes are down. Going
offline until atleast one of them comes back up.
2018-09-14 11:49:45.827+0000: shutting down, reason=destroyed
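
To see at a glance which replicate subvolumes went offline, the log can be
filtered for the "All subvolumes are down" message. This is only a minimal
sketch: the function name offline_subvols is hypothetical, and the pattern
assumes subvolume names of the form seen above (<n>-<volume>-replicate-<n>).

```shell
# Print each distinct replicate subvolume that reported
# "All subvolumes are down" in a log read from stdin.
offline_subvols() {
    grep -F 'All subvolumes are down' \
        | grep -oE '[0-9]+-[A-Za-z0-9_.-]+-replicate-[0-9]+' \
        | sort -u
}
```

It would be called as `offline_subvols < "$qemu_log"`, where $qemu_log is a
placeholder for the path of the VM's qemu log file.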

If I set "cluster.use-compound-fops" back to "off", everything starts working
correctly again.
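
The workaround can be sketched as a small helper that turns the option off
and then reads it back with "gluster volume get". The function name
disable_compound_fops and the GLUSTER variable (an override for the CLI
binary) are hypothetical and not part of gluster itself.

```shell
# Path or name of the gluster CLI; overridable for dry runs.
GLUSTER=${GLUSTER:-gluster}

# Turn cluster.use-compound-fops off on the given volume,
# then print the value the volume now reports for that option.
disable_compound_fops() {
    vol=$1
    "$GLUSTER" volume set "$vol" cluster.use-compound-fops off &&
        "$GLUSTER" volume get "$vol" cluster.use-compound-fops
}
```

For the volume in this report it would be invoked as
`disable_compound_fops "$vm_images_volume_name"`.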

Steps to Reproduce:
1. Set "cluster.use-compound-fops" to "on" on a gluster volume that hosts VM
images used by qemu with libgfapi.

Actual results:
When "cluster.use-compound-fops" is set to "on", every VM run by qemu with
libgfapi reports that all subvolumes are down.

Expected results:
When "cluster.use-compound-fops" is set to "on", every VM should continue to
work correctly.

Additional info:
Let me know if you need more information or log files to find the source of
the problem.
