[Bugs] [Bug 1352632] New: qemu libgfapi clients hang when doing I/O

bugzilla at redhat.com bugzilla at redhat.com
Mon Jul 4 13:23:08 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1352632

            Bug ID: 1352632
           Summary: qemu libgfapi clients hang when doing I/O
           Product: GlusterFS
           Version: 3.8.0
         Component: libgfapi
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: rtalur at redhat.com
        QA Contact: sdharane at redhat.com
                CC: bugs at gluster.org, kaushal at redhat.com,
                    lindsay.mathieson at gmail.com, ndevos at redhat.com,
                    pgurusid at redhat.com, rtalur at redhat.com,
                    sdharane at redhat.com
        Depends On: 1352482
            Blocks: 1350804 (glusterfs-3.7.13)



+++ This bug was initially created as a clone of Bug #1352482 +++

qemu and related tools (qemu-img) hang when using libgfapi from
glusterfs-3.7.12.

For example, running the following qemu-img command against a single-brick
glusterfs-3.7.12 volume causes the qemu-img command to hang,

# qemu-img create -f qcow2 gluster://localhost/testvol/testimg.qcow2 10G

With qemu-img, at least, the hangs happen when creating qcow2 images; the
command doesn't hang when creating raw images.
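
A small libgfapi client can take qemu out of the picture when reproducing
this. The sketch below is not from this report: it issues one asynchronous
vectored write against the same volume used in the command above, and it
assumes a glusterfs-3.7-era glusterfs-api-devel install (older glfs_io_cbk
signature, header at <glusterfs/api/glfs.h>, built with "gcc repro.c -o repro
-lgfapi").

/* Minimal libgfapi reproducer sketch -- assumptions noted above. */
#include <glusterfs/api/glfs.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

static volatile int done = 0;

/* glusterfs-3.7-era glfs_io_cbk signature */
static void write_cbk (glfs_fd_t *fd, ssize_t ret, void *data)
{
        fprintf (stderr, "pwritev_async completed: %zd\n", ret);
        done = 1;
}

int main (void)
{
        char a[4096], b[4096];
        struct iovec iov[2] = {
                { .iov_base = a, .iov_len = sizeof (a) },
                { .iov_base = b, .iov_len = sizeof (b) },
        };

        memset (a, 0xaa, sizeof (a));
        memset (b, 0xbb, sizeof (b));

        glfs_t *fs = glfs_new ("testvol");
        glfs_set_volfile_server (fs, "tcp", "localhost", 24007);
        glfs_set_logging (fs, "/tmp/gfapi-repro.log", 7);
        if (glfs_init (fs) != 0)
                return 1;

        glfs_fd_t *fd = glfs_creat (fs, "repro.bin", O_CREAT | O_RDWR, 0644);
        if (!fd)
                return 1;

        /* Two iovecs in a single call: with 3.7.12 the write never completes
         * and the client hangs until the RPC ping-timeout. */
        glfs_pwritev_async (fd, iov, 2, 0, 0, write_cbk, NULL);

        while (!done)
                sleep (1);

        glfs_close (fd);
        glfs_fini (fs);
        return 0;
}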

When creating a qcow2 image, qemu-img appears to reload the glusterfs graph
several times. This can be observed in the attached log where qemu-img is run
against glusterfs-3.7.11.

With glusterfs-3.7.12, this doesn't happen: an early writev failure occurs on
the brick transport with an EFAULT (Bad address) errno (see attached log). No
further actions happen after this, and the qemu-img command hangs until the
RPC ping-timeout expires and then fails.

Investigation is still ongoing to find the cause of this error.

This issue was originally reported in the gluster-users mailing list by Lindsay
Mathieson, Kevin Lemonnier and Dmitry Melekhov. [1][2][3]

[1] https://www.gluster.org/pipermail/gluster-users/2016-June/027144.html
[2] https://www.gluster.org/pipermail/gluster-users/2016-June/027186.html
[3] https://www.gluster.org/pipermail/gluster-users/2016-July/027218.html

--- Additional comment from Kaushal on 2016-07-04 14:56 IST ---



--- Additional comment from Kaushal on 2016-07-04 14:58 IST ---



--- Additional comment from Niels de Vos on 2016-07-04 16:28 IST ---



--- Additional comment from Niels de Vos on 2016-07-04 16:28 IST ---



--- Additional comment from Niels de Vos on 2016-07-04 16:36:23 IST ---

The image is actually created, even though this error was reported:

qemu-img: gluster://localhost/vms/qcow2.img: Could not resize image:
Input/output error

[root at vm017 ~]# qemu-img info gluster://localhost/vms/qcow2.img 
image: gluster://localhost/vms/qcow2.img
file format: qcow2
virtual size: 0 (0 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false


[root at vm017 ~]# qemu-img info gluster://localhost/vms/raw.img 
image: gluster://localhost/vms/raw.img
file format: raw
virtual size: 32M (33554432 bytes)
disk size: 4.0K


There are no errors in the tcpdump that I could spot at a glance.

--- Additional comment from Niels de Vos on 2016-07-04 17:03 IST ---



--- Additional comment from Niels de Vos on 2016-07-04 17:04 IST ---



--- Additional comment from Poornima G on 2016-07-04 18:25:09 IST ---

RCA:

Debugged this along with Raghavendra Talur and Kaushal M. It turns out this is
caused by http://review.gluster.org/#/c/14148/ .

pub_glfs_pwritev_async(..., iovec, iovec_count, ...) takes an array of iovecs
as input, along with a count parameter indicating the number of iovecs passed.
gfapi internally collates all the iovecs into a single iovec and sends it all
the way down to the RPC (network) layer. Because the iovecs are collated into
one, the count passed down should also be 1, but the patch was forwarding the
count as given by the user. That is, if the user specifies 3 iovecs with count
3, gfapi copies them all into one iovec but still sends the count as 3, and
hence the issue.
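
To illustrate that description (a sketch only, not the actual gfapi code:
collapse_iovecs() and submit_writev() below are stand-ins for what
glfs_buf_copy() and the RPC submission path do):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Copy 'count' iovecs into a single contiguous iovec. */
static struct iovec collapse_iovecs (const struct iovec *iov, int count)
{
        size_t       total = 0, off = 0;
        struct iovec out;
        int          i;

        for (i = 0; i < count; i++)
                total += iov[i].iov_len;

        out.iov_base = malloc (total);
        out.iov_len  = total;

        for (i = 0; i < count; i++) {
                memcpy ((char *)out.iov_base + off,
                        iov[i].iov_base, iov[i].iov_len);
                off += iov[i].iov_len;
        }
        return out;
}

/* Stand-in for handing the request to the RPC layer, which trusts 'count'
 * and walks that many array entries. */
static void submit_writev (const struct iovec *iov, int count)
{
        printf ("submitting %d iovec(s), first has %zu bytes\n",
                count, iov[0].iov_len);
}

int main (void)
{
        char a[8] = "aaaaaaa", b[8] = "bbbbbbb";
        struct iovec user_iov[2] = {
                { .iov_base = a, .iov_len = sizeof (a) },
                { .iov_base = b, .iov_len = sizeof (b) },
        };
        int user_count = 2;

        struct iovec single = collapse_iovecs (user_iov, user_count);

        /* Bug: only one iovec exists after the copy, but the caller's count
         * is forwarded, so the transport walks past the array and the brick
         * sees a writev with a bad address (EFAULT). */
        submit_writev (&single, user_count);

        /* Fix: after collating, the count must be passed as 1. */
        submit_writev (&single, 1);

        free (single.iov_base);
        return 0;
}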

The fix for this will be sent, and we will try to include it in 3.7.13.

Regards,
Poornima

--- Additional comment from Vijay Bellur on 2016-07-04 18:39:11 IST ---

REVIEW: http://review.gluster.org/14854 (gfapi: update count when glfs_buf_copy
is used) posted (#1) for review on master by Raghavendra Talur
(rtalur at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1350804
[Bug 1350804] Tracker bug for GlusterFS-v3.7.13
https://bugzilla.redhat.com/show_bug.cgi?id=1352482
[Bug 1352482] qemu libgfapi clients hang when doing I/O with 3.7.12