[Gluster-users] Problem with glusterd locks on gluster 3.6.1

Joe Julian joe at julianfamily.org
Fri Jun 17 11:49:04 UTC 2016


Have you offered those patches upstream?

On June 16, 2016 1:02:24 AM PDT, "B.K.Raghuram" <bkrram at gmail.com> wrote:
>Thanks a lot Atin,
>
>The problem is that we are using a forked version of 3.6.1 which has
>been modified to work with ZFS (for snapshots), but we do not have the
>resources to port that over to later versions of gluster.
>
>Would you know of anyone who would be willing to take this on?
>
>Regards,
>-Ram
>
>On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj at redhat.com>
>wrote:
>
>>
>>
>> On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
>> >
>> >
>> > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee
><amukherj at redhat.com
>> > <mailto:amukherj at redhat.com>> wrote:
>> >
>> >
>> >
>> >     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
>> >     > Hi,
>> >     >
>> >     > We're using gluster 3.6.1 and we periodically find that
>> >     > gluster commands fail, saying that it could not get the lock
>> >     > on one of the brick machines. The logs on that machine then
>> >     > say something like:
>> >     >
>> >     > [2016-06-15 08:17:03.076119] E
>> >     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management:
>> >     > Unable to acquire lock for vol2
>> >
>> >     This is a possible case if concurrent volume operations are
>> >     run. Do you have any script that checks volume status on an
>> >     interval from all the nodes? If so, this is expected behavior.
>> >
>> >
>> > Yes, I do have a couple of scripts that check volume and quota
>> > status. Given this, I do get an "Another transaction is in
>> > progress.." message, which is OK. The problem is that sometimes I
>> > get the volume lock held message, which never goes away. This
>> > sometimes results in glusterd consuming a lot of memory and CPU,
>> > and the problem can only be fixed with a reboot. The log files are
>> > huge, so I'm not sure if it's OK to attach them to an email.
>>
>> Ok, so this is known. We have fixed lots of stale lock issues in the
>> 3.7 branch, and some if not all of them were also backported to the
>> 3.6 branch. The issue is that you are using 3.6.1, which is quite
>> old. If you can upgrade to the latest 3.7 release, or at worst the
>> latest 3.6, I am confident that this will go away.
>>
>> ~Atin
>> >
>> >     >
>> >     > After sometime, glusterd then seems to give up and die..
>> >
>> >     Do you mean glusterd shuts down or segfaults? If so, I am more
>> >     interested in analyzing this part. Could you provide us the
>> >     glusterd log and cmd_history log file, along with the core (in
>> >     case of SEGV), from all the nodes for further analysis?
>> >
>> >
>> > There is no segfault. glusterd just shuts down. As I said above,
>> > sometimes this happens, and sometimes it just continues to hog a
>> > lot of memory and CPU.
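The material requested above (glusterd log, cmd_history log, and any core) can be gathered per node with a sketch like this; the paths are the usual Linux defaults for GlusterFS and may differ on your distribution:

```shell
#!/bin/sh
# Collect glusterd debugging artifacts from one node into a tarball.
# Paths below are common defaults, not confirmed by the thread.
OUT="/tmp/glusterd-debug-$(hostname)"
mkdir -p "$OUT"

# These files may not exist on a non-gluster host, hence the || true.
cp /var/log/glusterfs/etc-glusterfs-glusterd.vol.log "$OUT"/ 2>/dev/null || true
cp /var/log/glusterfs/cmd_history.log "$OUT"/ 2>/dev/null || true

# Core files land wherever kernel.core_pattern points; record that so
# the right core (if any) can be located and attached separately.
sysctl kernel.core_pattern > "$OUT/core_pattern.txt" 2>/dev/null || true

tar czf "$OUT.tar.gz" -C /tmp "$(basename "$OUT")"
echo "collected into $OUT.tar.gz"
```

Since the logs are described as huge, compressing them this way (or trimming to the time window around an incident) makes them practical to share.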
>> >
>> >
>> >     >
>> >     > Interestingly, I also find the following line at the
>> >     > beginning of etc-glusterfs-glusterd.vol.log, and I don't
>> >     > know if it has any significance to the issue:
>> >     >
>> >     > [2016-06-14 06:48:57.282290] I
>> >     > [glusterd-store.c:2063:glusterd_restore_op_version]
>> >     > 0-management: Detected new install. Setting op-version to
>> >     > maximum : 30600
>> >     >
>> >
>> >
>> > What does this line signify?
>>
>
>


