[Bugs] [Bug 1329466] New: Gluster brick got inode-locked and freeze the all cluster

bugzilla at redhat.com bugzilla at redhat.com
Fri Apr 22 02:50:14 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1329466

            Bug ID: 1329466
           Summary: Gluster brick got inode-locked and freeze the all
                    cluster
           Product: GlusterFS
           Version: 3.7.10
         Component: glusterd
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: aflyhorse at hotmail.com
                CC: bugs at gluster.org



Created attachment 1149632
  --> https://bugzilla.redhat.com/attachment.cgi?id=1149632&action=edit
gluster volume info

Description of problem:

The volume is Distributed-Disperse, 2 x (4 + 2) = 12 bricks.

When I write to the same file on the volume in parallel, a brick occasionally
gets locked and freezes the whole cluster. Sometimes the OS of one of the
peers also becomes unreachable via ssh (it can still be reached by ping).

"volume status" reports all bricks as online (even on a peer that cannot be
reached via ssh).

"volume start force" (suggested by <aspandey at redhat.com>) resumes the
cluster if and only if all peers are reachable via ssh. Otherwise, it reports
that the operation timed out.

I've discussed this in the mailing list:
http://www.gluster.org/pipermail/gluster-users/2016-April/026122.html

How reproducible:

Occasionally, under heavy parallel IO load. It has occurred 4 times in the
last month.

Additional info:

Snapshot of inode-lock in statedump:
[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc3dbfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=d433bfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at 2016-04-21 11:45:33
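
For anyone triaging a similar statedump, here is a minimal Python sketch that
lists the inode locks and pairs each BLOCKED entry with the ACTIVE lock on the
same path. It is not a GlusterFS tool. The field layout is assumed from the
excerpt above only and may differ in other versions, and the names
parse_inodelks and blocked_waits are hypothetical helpers made up for this
illustration.

```python
import re
from datetime import datetime

# Matches either a "path=" line or one inodelk entry. The entry pattern
# tolerates line wraps added by mail clients (DOTALL plus \s+).
# Assumption: field names follow the excerpt in this report.
TOKEN = re.compile(
    r"^path=(?P<path>[^\n]*)$"
    r"|inodelk\.inodelk\[(?P<idx>\d+)\]\((?P<state>ACTIVE|BLOCKED)\)="
    r"(?P<body>.*?)(?:granted|blocked)\s+at\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})",
    re.MULTILINE | re.DOTALL,
)


def parse_inodelks(text):
    """Return one dict per inodelk entry, tagged with the preceding path."""
    locks, path = [], None
    for m in TOKEN.finditer(text):
        if m.group("path") is not None:
            path = m.group("path")
            continue
        owner = re.search(r"owner=([0-9a-f]+)", m.group("body"))
        ts = " ".join(m.group("ts").split())  # undo a wrap inside the timestamp
        locks.append({
            "path": path,
            "index": int(m.group("idx")),
            "state": m.group("state"),
            "owner": owner.group(1) if owner else None,
            "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        })
    return locks


def blocked_waits(locks):
    """Pair each BLOCKED lock with the ACTIVE lock held on the same path.

    Returns tuples of (path, holder_owner, waiter_owner, gap_seconds), where
    gap_seconds is the time between the grant and the blocked request.
    """
    active = {l["path"]: l for l in locks if l["state"] == "ACTIVE"}
    pairs = []
    for l in locks:
        if l["state"] == "BLOCKED" and l["path"] in active:
            holder = active[l["path"]]
            gap = (l["time"] - holder["time"]).total_seconds()
            pairs.append((l["path"], holder["owner"], l["owner"], gap))
    return pairs


# Sample input copied from the statedump excerpt in this report.
SAMPLE = """[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1,
owner=dc3dbfac887f0000, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at
2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1,
owner=d433bfac887f0000, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at
2016-04-21 11:45:33
"""

if __name__ == "__main__":
    for path, holder, waiter, gap in blocked_waits(parse_inodelks(SAMPLE)):
        print(f"{waiter} waits on {holder} ({gap:.0f}s after grant): {path}")
```

Running the same functions over statedumps taken a few minutes apart would
show whether the same owner still holds the ACTIVE lock, which is one way to
tell a stuck lock from ordinary contention.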
