[Bugs] [Bug 1182458] New: glusterd: remote locking failure when multiple synctask transactions are run

bugzilla at redhat.com bugzilla at redhat.com
Thu Jan 15 07:31:40 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1182458

            Bug ID: 1182458
           Summary: glusterd: remote locking failure when multiple
                    synctask transactions are run
           Product: Red Hat Storage
           Version: 3.0
         Component: glusterfs
     Sub Component: core
          Keywords: Triaged
          Assignee: vbellur at redhat.com
          Reporter: amukherj at redhat.com
        QA Contact: sdharane at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1173414
            Blocks: 1176756



+++ This bug was initially created as a clone of Bug #1173414 +++

Description of problem:

When two volume set operations are run simultaneously on two different volumes
in a loop, some of the volume set transactions fail with a remote lock failure.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:
Always

Steps to Reproduce:
1. Set up a 2-node cluster.
2. Create two volumes, say vol1 and vol2, and start them.
3. Run the following script from any one of the nodes in the cluster:

for i in {1..10}
do
    # The vol1 set runs in the background so that it overlaps the vol2 set.
    gluster v set vol1 diagnostics.client-log-level DEBUG &
    gluster v set vol2 features.barrier on
done

Actual results:
Some of the transactions fail with "Locking failed in <Peer node>, Please
check log file for details"

Expected results:
Local locking might fail, but remote locking should never fail here.

Additional info:

--- Additional comment from Anand Avati on 2014-12-12 00:50:13 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction
xaction_peers list in syncop) posted (#1) for review on master by Atin
Mukherjee (amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-16 07:05:30 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction
xaction_peers list in syncop & mgmt_v3) posted (#2) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-17 01:52:55 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction
xaction_peers list in syncop & mgmt_v3) posted (#3) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-22 02:00:50 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction
xaction_peers list in syncop & mgmt_v3) posted (#4) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-22 03:39:26 EST ---

REVIEW: http://review.gluster.org/9269 (glusterd: Maintain  per transaction
xaction_peers list in syncop & mgmt_v3) posted (#5) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-22 23:14:19 EST ---

COMMIT: http://review.gluster.org/9269 committed in master by Kaushal M
(kaushal at redhat.com) 
------
commit da9deb54df91dedc51ebe165f3a0be646455cb5b
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Fri Dec 12 07:21:19 2014 +0530

    glusterd: Maintain  per transaction xaction_peers list in syncop & mgmt_v3

    In the current implementation, the xaction_peers list is maintained in a
    global variable (glustrd_priv_t) for syncop/mgmt_v3. This means that the
    consistency and atomicity of the peerinfo list across transactions are
    not guaranteed when multiple syncop/mgmt_v3 transactions are going
    through.

    We ran into a problem in mgmt_v3-locks.t, which was failing spuriously.
    The reason was that two volume set operations (on two different
    volumes) were going through simultaneously, and both transactions were
    manipulating the same xaction_peers structure, which led to a corrupted
    list. Because of this, in some cases the unlock request to a peer was
    never triggered and we ended up with stale locks.

    The solution is to maintain a per-transaction local xaction_peers list
    for every syncop.

    Please note that I've identified this problem in the op-sm area as well,
    and a separate patch will be attempted to fix it.

    Finally, thanks to Krishnan Parthasarathi and Kaushal M for your
    constant help in getting to the root cause.

    Change-Id: Ib1eaac9e5c8fc319f4e7f8d2ad965bc1357a7c63
    BUG: 1173414
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: http://review.gluster.org/9269
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Kaushal M <kaushal at redhat.com>
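
The idea described in the commit message above can be illustrated with a small
sketch. This is not the glusterd code: struct peer, struct txn, txn_begin(),
txn_end() and txn_peer_count() are hypothetical names chosen for illustration,
and the real peerinfo handling is more involved. The sketch only shows the
general shape of the fix, assuming each transaction takes its own copy of the
peer list so that a concurrent transaction cannot modify it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical peer entry; not the real glusterd peerinfo structure. */
struct peer {
    char name[32];
    struct peer *next;
};

/* Hypothetical transaction context that owns its own peer list. */
struct txn {
    struct peer *xaction_peers; /* private to this transaction */
};

/* Free the transaction's private list and leave it empty. */
void txn_end(struct txn *t)
{
    struct peer *p = t->xaction_peers;
    while (p != NULL) {
        struct peer *next = p->next; /* save the successor before freeing */
        free(p);
        p = next;
    }
    t->xaction_peers = NULL;
}

/* Copy the names of the connected peers into the transaction's own list.
 * Returns 0 on success, -1 on allocation failure (list is left empty). */
int txn_begin(struct txn *t, const char *const *connected, size_t n)
{
    assert(t != NULL);
    t->xaction_peers = NULL;
    for (size_t i = n; i > 0; i--) { /* prepend in reverse to keep order */
        struct peer *p = calloc(1, sizeof(*p));
        if (p == NULL) {
            txn_end(t);
            return -1;
        }
        /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
        strncpy(p->name, connected[i - 1], sizeof(p->name) - 1);
        p->next = t->xaction_peers;
        t->xaction_peers = p;
    }
    return 0;
}

/* Number of peers this transaction locked and must later unlock. */
size_t txn_peer_count(const struct txn *t)
{
    size_t n = 0;
    for (const struct peer *p = t->xaction_peers; p != NULL; p = p->next)
        n++;
    return n;
}
```

With this layout, ending one transaction frees only its own list. A second
transaction that started at the same time still sees every peer it locked, so
it can send the matching unlock requests.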

--- Additional comment from Anand Avati on 2014-12-26 01:52:15 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster qourum count check
correction) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 01:55:16 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster qourum count check
correction) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 01:58:50 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster qourum count check
correction) posted (#3) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 03:21:28 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check
correction) posted (#4) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2014-12-29 00:45:10 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check
correction) posted (#5) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2015-01-06 00:16:48 EST ---

REVIEW: http://review.gluster.org/9350 (glusterd: cluster quorum count check
correction) posted (#6) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2015-01-06 08:44:48 EST ---

COMMIT: http://review.gluster.org/9350 committed in master by Kaushal M
(kaushal at redhat.com) 
------
commit 6e2318f0821d7c58eddc837b2d218247243a5c8d
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Fri Dec 26 12:18:31 2014 +0530

    glusterd: cluster quorum count check correction

    Due to the recent change introduced by commit
    da9deb54df91dedc51ebe165f3a0be646455cb5b, the cluster quorum count
    calculation now depends on whether the peer list is the list of all
    peers, the global transaction peer list, or the local transaction peer
    list.

    Change-Id: I9f63af9a0cb3cfd6369b050247d0ef3ac93d760f
    BUG: 1173414
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: http://review.gluster.org/9350
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas at redhat.com>
    Reviewed-by: Raghavendra Bhat <raghavendra at redhat.com>
    Reviewed-by: Avra Sengupta <asengupt at redhat.com>
    Reviewed-by: Kaushal M <kaushal at redhat.com>

--- Additional comment from Anand Avati on 2015-01-08 03:37:22 EST ---

REVIEW: http://review.gluster.org/9416 (glusterd: use list_for_each_entry_safe
for cleanup) posted (#1) for review on master by Avra Sengupta
(asengupt at redhat.com)

--- Additional comment from Anand Avati on 2015-01-08 11:49:59 EST ---

COMMIT: http://review.gluster.org/9416 committed in master by Krishnan
Parthasarathi (kparthas at redhat.com) 
------
commit 05d3dfb9623f0939fa807cce3b9336a09fadab2a
Author: Avra Sengupta <asengupt at redhat.com>
Date:   Thu Jan 8 08:35:33 2015 +0000

    glusterd: use list_for_each_entry_safe for cleanup

    Use list_for_each_entry_safe() instead of
    list_for_each_entry() for cleanup of local
    xaction_peers list.

    Change-Id: I6d70c04dfb90cbbcd8d9fc4155b8e5e7d7612460
    BUG: 1173414
    Signed-off-by: Avra Sengupta <asengupt at redhat.com>
    Reviewed-on: http://review.gluster.org/9416
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas at redhat.com>
    Tested-by: Krishnan Parthasarathi <kparthas at redhat.com>
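
Why the _safe variant matters for cleanup can be shown with a small sketch.
In kernel-style list macros, list_for_each_entry() advances by reading the
next pointer out of the current entry after the loop body has run, so freeing
the entry inside the body leaves the iterator reading freed memory.
list_for_each_entry_safe() caches the next entry before the body runs. The
code below spells that pattern out by hand instead of using the macros. The
names struct local_peer, sketch_list_init(), sketch_list_add_tail(),
sketch_list_del(), peers_add() and peers_cleanup() are hypothetical and are
not taken from the glusterd sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, in the style of kernel list heads. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical entry of a transaction-local peer list. */
struct local_peer {
    int id;
    struct list_head node;
};

/* Recover the enclosing struct local_peer from its embedded list node. */
#define ENTRY_OF(ptr) \
    ((struct local_peer *)((char *)(ptr) - offsetof(struct local_peer, node)))

void sketch_list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

void sketch_list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

void sketch_list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Append a peer with the given id. Returns 0 on success, -1 on failure. */
int peers_add(struct list_head *head, int id)
{
    struct local_peer *p = malloc(sizeof(*p));
    if (p == NULL)
        return -1;
    p->id = id;
    sketch_list_add_tail(&p->node, head);
    return 0;
}

/* Free every entry. The successor is saved before the current entry is
 * unlinked and freed, which is what the _safe iterator does internally. */
size_t peers_cleanup(struct list_head *head)
{
    size_t freed = 0;
    struct list_head *pos = head->next;
    while (pos != head) {
        struct list_head *tmp = pos->next; /* cache next before freeing pos */
        sketch_list_del(pos);
        free(ENTRY_OF(pos));
        freed++;
        pos = tmp;
    }
    return freed;
}
```

After peers_cleanup() returns, the list head points back at itself, so calling
it a second time frees nothing.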

--- Additional comment from Anand Avati on 2015-01-08 23:46:41 EST ---

REVIEW: http://review.gluster.org/9422 (glusterd: quorum calculation should
happen on global peer_list) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2015-01-09 04:56:06 EST ---

REVIEW: http://review.gluster.org/9422 (glusterd: quorum calculation should
happen on global peer_list) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1173414
[Bug 1173414] glusterd: remote locking failure when multiple synctask
transactions are run
https://bugzilla.redhat.com/show_bug.cgi?id=1176756
[Bug 1176756] glusterd: remote locking failure when multiple synctask
transactions are run