[Gluster-users] Problem with glusterd locks on gluster 3.6.1

Thu Jun 16 05:32:21 UTC 2016

On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> 
> 
> On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> 
> 
> 
>     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
>     > Hi,
>     >
>     > We're using gluster 3.6.1 and we periodically find that gluster commands
>     > fail, saying that the lock could not be acquired on one of the brick
>     > machines. The logs on that machine then say something like:
>     >
>     > [2016-06-15 08:17:03.076119] E
>     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
>     > acquire lock for vol2
> 
>     This can happen if volume operations are run concurrently. Do you
>     have any script that checks volume status at an interval from all
>     the nodes? If so, this is expected behavior.
> 
> 
> Yes, I do have a couple of scripts that check on volume and quota
> status. Given this, I do get an "Another transaction is in progress.."
> message, which is ok. The problem is that sometimes I get the volume
> lock held message, and it never goes away. This sometimes results in
> glusterd consuming a lot of memory and CPU, and the problem can only be
> fixed with a reboot. The log files are huge, so I'm not sure if it's ok
> to attach them to an email.

Ok, so this is a known issue. We have fixed lots of stale lock issues in
the 3.7 branch, and some of them, if not all, were also backported to the
3.6 branch. The issue is that you are using 3.6.1, which is quite old. If
you can upgrade to the latest version of 3.7, or at worst of 3.6, I am
confident that this will go away.
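
Until an upgrade is possible, the monitoring scripts could at least back
off when they hit the busy lock instead of piling up more requests. Below
is a minimal sketch in Python, assuming the scripts shell out to the
gluster CLI and that the busy case can be recognized by the "Another
transaction is in progress" text you mentioned. The name run_gluster, the
retry count and the delays are made up for illustration, and this will
not clear a stale lock that is already stuck.

```python
import subprocess
import time

# Text reported in this thread when another operation holds the lock.
LOCK_BUSY_MARKER = "Another transaction is in progress"


def run_gluster(args, runner=subprocess.run, retries=5, base_delay=1.0,
                sleep=time.sleep):
    """Run a gluster CLI command, backing off while the lock is busy.

    `runner` and `sleep` are injectable so the logic can be exercised
    without a real cluster. Returns the last result object.
    """
    result = None
    for attempt in range(retries):
        result = runner(["gluster", *args], capture_output=True, text=True)
        # Check both streams; which one carries the message is not assumed.
        output = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0 or LOCK_BUSY_MARKER not in output:
            # Success, or a failure unrelated to lock contention.
            return result
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** attempt)
    return result
```

A status script would then call run_gluster(["volume", "status", "vol2"])
rather than invoking the CLI directly. Running such checks from a single
node instead of from every node should also reduce contention.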

~Atin
> 
>     >
>     > After some time, glusterd then seems to give up and die.
> 
>     Do you mean glusterd shuts down or segfaults? If so, I am more
>     interested in analyzing this part. Could you provide us the glusterd
>     log and the cmd_history log file, along with the core (in case of
>     SEGV), from all the nodes for further analysis?
> 
> 
> There is no segfault; glusterd just shuts down. As I said above,
> sometimes this happens and sometimes it just continues to hog a lot of
> memory and CPU.
> 
> 
>     >
>     > Interestingly, I also find the following line at the beginning of
>     > etc-glusterfs-glusterd.vol.log, and I don't know if it has any
>     > significance to the issue:
>     >
>     > [2016-06-14 06:48:57.282290] I
>     > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
>     > Detected new install. Setting op-version to maximum : 30600
>     >
> 
> 
> What does this line signify?