[Gluster-users] Problem with glusterd locks on gluster 3.6.1

Fri Jun 17 09:25:46 UTC 2016

Thanks Atin. I had three merge conflicts in the third patch. I've attached
the files with the conflicts. Would any of the intervening commits be
needed as well?

The conflicts were in:

    both modified:      libglusterfs/src/mem-types.h
    both modified:      xlators/mgmt/glusterd/src/glusterd-utils.c
    both modified:      xlators/mgmt/glusterd/src/glusterd-utils.h
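
For anyone following along, the Gerrit cherry-pick step described below boils
down to roughly the following. This is only a sketch: the remote URL and the
patch set number (1) are assumptions, and gerrit_ref and pick_change are
hypothetical helper names, so copy the exact command from the review page's
download box rather than trusting these values.

```shell
#!/bin/sh
# Assumption: the Gerrit project URL; copy the real one from the review page.
GERRIT_REMOTE="${GERRIT_REMOTE:-http://review.gluster.org/glusterfs}"

# Gerrit exposes each patch set as
#   refs/changes/<last two digits of change>/<change>/<patch set>
gerrit_ref() {
    printf 'refs/changes/%02d/%s/%s\n' "$(($1 % 100))" "$1" "$2"
}

# Fetch one patch set and cherry-pick it onto the current branch.
# Returns a non-zero status if the fetch fails or the pick hits a conflict.
pick_change() {
    git fetch "$GERRIT_REMOTE" "$(gerrit_ref "$1" "$2")" &&
        git cherry-pick FETCH_HEAD
}

# Example (patch set 1 is a placeholder, not the real patch set number):
#   pick_change 9328 1 && pick_change 9393 1 && pick_change 10023 1
```

When a pick stops on conflicts, git status lists the affected files as "both
modified"; after fixing them by hand, git add followed by
git cherry-pick --continue should finish that patch.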

On Fri, Jun 17, 2016 at 2:17 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

>
>
> On 06/17/2016 12:44 PM, B.K.Raghuram wrote:
> > Thanks Atin. I'm not familiar with pulling patches from the review system
> > but will try :)
>
> It's not that difficult. Open the gerrit review link, go to the download
> drop box at the top right corner, click on it and then you will see a
> cherry pick option, copy that content and paste it into the source code repo
> you host. If there are no merge conflicts, it should auto apply,
> otherwise you'd need to fix them manually.
>
> HTH.
> Atin
>
> >
> > On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> >
> >
> >     On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
> >     >
> >     >
> >     > On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
> >     >> Thanks a lot Atin,
> >     >>
> >     >> The problem is that we are using a forked version of 3.6.1 which has
> >     >> been modified to work with ZFS (for snapshots) but we do not have the
> >     >> resources to port that over to the later versions of gluster.
> >     >>
> >     >> Would you know of anyone who would be willing to take this on?!
> >     >
> >     > If you can cherry pick the patches and apply them on your source and
> >     > rebuild it, I can point the patches to you, but you'd need to give me a
> >     > day's time as I have some other items on my plate to finish.
> >
> >
> >     Here is the list of patches that need to be applied, in the following
> >     order:
> >
> >     http://review.gluster.org/9328
> >     http://review.gluster.org/9393
> >     http://review.gluster.org/10023
> >
> >     >
> >     > ~Atin
> >     >>
> >     >> Regards,
> >     >> -Ram
> >     >>
> >     >> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee
> >     >> <amukherj at redhat.com> wrote:
> >     >>
> >     >>
> >     >>
> >     >>     On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> >     >>     >
> >     >>     >
> >     >>     > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee
> >     >>     > <amukherj at redhat.com> wrote:
> >     >>     >
> >     >>     >
> >     >>     >
> >     >>     >     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> >     >>     >     > Hi,
> >     >>     >     >
> >     >>     >     > We're using gluster 3.6.1 and we periodically find that gluster commands
> >     >>     >     > fail saying that it could not get the lock on one of the brick machines.
> >     >>     >     > The logs on that machine then say something like:
> >     >>     >     >
> >     >>     >     > [2016-06-15 08:17:03.076119] E
> >     >>     >     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> >     >>     >     > acquire lock for vol2
> >     >>     >
> >     >>     >     This is a possible case if concurrent volume operations are run. Do you
> >     >>     >     have any script which checks for volume status on an interval from all
> >     >>     >     the nodes? If so, then this is expected behavior.
> >     >>     >
> >     >>     >
> >     >>     > Yes, I do have a couple of scripts that check on volume and quota
> >     >>     > status. Given this, I do get an "Another transaction is in progress.."
> >     >>     > message which is ok. The problem is that sometimes I get the volume lock
> >     >>     > held message which never goes away. This sometimes results in glusterd
> >     >>     > consuming a lot of memory and CPU and the problem can only be fixed with
> >     >>     > a reboot. The log files are huge so I'm not sure if it's ok to attach
> >     >>     > them to an email.
> >     >>
> >     >>     Ok, so this is known. We have fixed lots of stale lock issues in the 3.7
> >     >>     branch and some of them, if not all, were also backported to the 3.6 branch.
> >     >>     The issue is you are using 3.6.1 which is quite old. If you can upgrade
> >     >>     to the latest versions of 3.7 or at worst of 3.6 I am confident that this
> >     >>     will go away.
> >     >>
> >     >>     ~Atin
> >     >>     >
> >     >>     >     >
> >     >>     >     > After some time, glusterd then seems to give up and die.
> >     >>     >
> >     >>     >     Do you mean glusterd shuts down or segfaults? If so, I am more interested
> >     >>     >     in analyzing this part. Could you provide us the glusterd log and
> >     >>     >     cmd_history log file, along with the core (in case of SEGV), from all the
> >     >>     >     nodes for further analysis?
> >     >>     >
> >     >>     >
> >     >>     > There is no segfault. glusterd just shuts down. As I said above,
> >     >>     > sometimes this happens and sometimes it just continues to hog a lot of
> >     >>     > memory and CPU.
> >     >>     >
> >     >>     >
> >     >>     >     >
> >     >>     >     > Interestingly, I also find the following line at the beginning of
> >     >>     >     > etc-glusterfs-glusterd.vol.log and I don't know if this has any
> >     >>     >     > significance to the issue:
> >     >>     >     >
> >     >>     >     > [2016-06-14 06:48:57.282290] I
> >     >>     >     > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> >     >>     >     > Detected new install. Setting op-version to maximum : 30600
> >     >>     >     >
> >     >>     >
> >     >>     >
> >     >>     > What does this line signify?
> >     >>
> >     >>
> >
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mem-types.h
Type: text/x-chdr
Size: 6082 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.c
Type: text/x-csrc
Size: 472427 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterd-utils.h
Type: text/x-chdr
Size: 28526 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160617/72c7f905/attachment-0005.bin>