[Gluster-users] Locking failed - since upgrade to 3.6.4

Atin Mukherjee atin.mukherjee83 at gmail.com
Mon Aug 3 15:04:41 UTC 2015


-Atin
Sent from one plus one
On Aug 3, 2015 8:31 PM, "Osborne, Paul (paul.osborne at canterbury.ac.uk)" <
paul.osborne at canterbury.ac.uk> wrote:
>
> Hi,
>
>
> OK, I have tracked through the logs to find which of the hosts apparently has a lock open:
>
>
> [2015-08-03 14:55:37.602717] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume blogs
>
> [2015-08-03 14:51:57.791081] E [glusterd-utils.c:148:glusterd_lock] 0-management: Unable to get lock for uuid: 76e4398c-e00a-4f3b-9206-4f885c4e5206, lock held by: 76e4398c-e00a-4f3b-9206-4f885c4e5206
>
This indicates that the cluster is still operating at the older op-version. You would need to bump the op-version to 30604 using: gluster volume set all cluster.op-version 30604
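
A rough sketch of how I would check and then bump it (the glusterd.info path assumes a stock install; adjust if your distro puts it elsewhere):

    # show the operating version currently recorded on each node
    grep operating-version /var/lib/glusterd/glusterd.info

    # once every node is confirmed to be on the 3.6.4 packages, raise the cluster op-version
    gluster volume set all cluster.op-version 30604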
>
> I have identified the UUID for each peer via gluster peer status and am working backwards.
>
> I see that gluster volume clear-locks may clear the locks on the volume - but what is not clear from the logs is which path holds the lock or what kind of lock it is.
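>
> For reference, the usage I can see from the CLI help is along these lines (using the volume root as the path is just a guess on my part):
>
>     gluster volume clear-locks <VOLNAME> <path> kind {blocked|granted|all} {inode|entry|posix} [range]
>     # e.g. clear all inode locks on the root of the blogs volume
>     gluster volume clear-locks blogs / kind all inode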
>
> Incidentally, my clients (using NFS) still appear to be able to read and write to the volume in manual testing - it is the volume status and heal checks that are failing. All of my clients and servers have been sequentially rebooted in the hope that this would clear the issue - however that does not appear to be the case.
>
>
>
> Thanks
>
> Paul
>
>
>
>
> Paul Osborne
> Senior Systems Engineer
> Canterbury Christ Church University
> Tel: 01227 782751
>
>
> ________________________________
> From: Atin Mukherjee <atin.mukherjee83 at gmail.com>
> Sent: 03 August 2015 15:22
> To: Osborne, Paul (paul.osborne at canterbury.ac.uk)
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Locking failed - since upgrade to 3.6.4
>
>
> Could you check the glusterd log on the other nodes? That would give you a hint about the exact issue. Also, looking at .cmd_log_history will show you the time intervals at which volume status commands are executed. If the gap is in milliseconds then you are bound to hit this, and it is expected.
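>
> Something like this should show how closely the status commands land (assuming the default log location under /var/log/glusterfs):
>
>     # timestamps of recent volume status invocations recorded by glusterd
>     grep -i 'volume status' /var/log/glusterfs/.cmd_log_history | tail -20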
>
> -Atin
> Sent from one plus one
>
> On Aug 3, 2015 7:32 PM, "Osborne, Paul (paul.osborne at canterbury.ac.uk)" <
paul.osborne at canterbury.ac.uk> wrote:
>>
>>
>> Hi,
>>
>> Last week I upgraded one of my gluster clusters (3 hosts with bricks as replica 3) to 3.6.4 from 3.5.4 and all seemed well.
>>
>> Today I am getting reports that locking has failed:
>>
>>
>> gfse-cant-01:/var/log/glusterfs# gluster volume status
>> Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
>> Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.
>>
>> Logs:
>> [2015-08-03 13:45:29.974560] E [glusterd-syncop.c:1640:gd_sync_task_begin] 0-management: Locking Peers Failed.
>> [2015-08-03 13:49:48.273159] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-rh-01.core.canterbury.ac.uk. Please check log file for details.
>> [2015-08-03 13:49:48.273778] E [glusterd-syncop.c:105:gd_collate_errors] 0-: Locking failed on gfse-isr-01.core.canterbury.ac.uk. Please check log file for details.
>>
>>
>> I am wondering if this is new behaviour introduced in 3.6.4 or something that has gone wrong.
>>
>> Restarting gluster entirely (by the way, the restart script does not actually appear to kill the processes...) resolves the issue, but it then recurs a few minutes later, which is rather suboptimal for a running service.
>>
>> Googling suggests that there may be simultaneous actions going on that can cause a locking issue.
>>
>> I know that I have nagios running volume status <volname> for each of my volumes on each host every few minutes; however, this is not new and has been in place for the last 8-9 months against 3.5 without issue, so I would hope that it is not the cause.
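>>
>> I am wondering whether serialising those checks on each host would at least reduce the chance of two status commands colliding; a rough sketch of what I mean (the lock file path is just an example):
>>
>>     # run at most one gluster CLI command at a time on this host,
>>     # waiting up to 30 seconds for any other check to finish first
>>     flock -w 30 /var/run/gluster-nagios.lock gluster volume status blogs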
>>
>> I am not sure where to look now tbh.
>>
>>
>>
>>
>> Paul Osborne
>> Senior Systems Engineer
>> Canterbury Christ Church University
>> Tel: 01227 782751
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users