[Gluster-users] Stale locks on shards

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jan 25 08:22:08 UTC 2018


On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen <samppah at neutraali.net>
wrote:

> Pranith Kumar Karampuri wrote on 25.01.2018 at 07:09:
>
>> On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen
>> <samppah at neutraali.net> wrote:
>>
>>> Hi!
>>>
>>> Thank you very much for your help so far. Could you please give an
>>> example command for using the aux-gfid-mount to remove locks?
>>> "gluster vol clear-locks" seems to mount the volume by itself.
>>>
>>
>> You are correct, sorry, this was implemented around 7 years back and I
>> forgot that bit about it :-(. Essentially it becomes a getxattr
>> syscall on the file.
>> Could you give me the clear-locks command you were trying to execute
>> and I can probably convert it to the getfattr command?
>>
>
> I have been testing this in a test environment with the command:
> gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c
> kind all inode
>

Could you do an strace of glusterd when this happens? It will show a
getxattr call with "glusterfs.clrlk" in the key. You need to execute that
getxattr on the gfid aux-mount.
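
Something along these lines should do it; this is a rough sketch only
(the glusterd pid, server name, and mount point are placeholders, and
the getfattr key below assumes the usual glusterfs.clrlk.t<type>.k<kind>
form; treat whatever key actually shows up in the strace output as
authoritative):

strace -f -tt -s 512 -o /tmp/glusterd-clrlk.strace -p <pid-of-glusterd>
grep clrlk /tmp/glusterd-clrlk.strace

Run the clear-locks command in another terminal while strace is
attached, then grep the capture for the key. Once you have it, the same
request can be issued directly on the aux-gfid mount, for example:

mount -t glusterfs -o aux-gfid-mount <server>:g1 /mnt/g1
getfattr -n glusterfs.clrlk.tinode.kall /mnt/g1/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c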


>
>
>
>>> Best regards,
>>> Samuli Heinonen
>>>
>>>> Pranith Kumar Karampuri <pkarampu at redhat.com>
>>>> 23 January 2018 at 10.30
>>>>
>>>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
>>>> <samppah at neutraali.net> wrote:
>>>>
>>>> Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:
>>>>
>>>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>>>> <samppah at neutraali.net> wrote:
>>>>
>>>> Hi again,
>>>>
>>>> here is more information regarding the issue described earlier.
>>>>
>>>> It looks like self-healing is stuck. According to "heal statistics",
>>>> the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
>>>> (it's around Sun Jan 21 20:30 when writing this). However,
>>>> glustershd.log says that the last heal was completed at "2018-01-20
>>>> 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
>>>> running now for over 16 hours without printing any information. In
>>>> the statedump I can see that the storage nodes have locks on files
>>>> and some of those are blocked, i.e. here again it says that ovirt8z2
>>>> is holding an active lock even though ovirt8z2 crashed after the
>>>> lock was granted:
>>>>
>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>> mandatory=0
>>>> inodelk-count=3
>>>>
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>> client=0x7f885845efa0,
>>>>
>>>>
>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>> granted at 2018-01-20 10:59:52
>>>>
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>>>>
>>>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>>>> granted at 2018-01-20 08:57:23
>>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>> client=0x7f885845efa0,
>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>> blocked at 2018-01-20 10:59:52
>>>>
>>>> I'd also like to add that the volume had an arbiter brick before
>>>> the crash happened. We decided to remove it because we thought that
>>>> it was causing issues. However, now I think that this was
>>>> unnecessary. After the crash the arbiter logs had lots of messages
>>>> like this:
>>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>>>> [Operation not permitted]
>>>>
>>>> Is there any way to force self-heal to stop? Any help would be very
>>>> much appreciated :)
>>>>
>>>> Exposing .shard to a normal mount is opening a can of worms. You
>>>> should probably look at mounting the volume with the gfid aux-mount,
>>>> where you can access a file with <path-to-mount>/.gfid/<gfid-string>
>>>> to clear locks on it.
>>>>
>>>> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>>>>
>>>> A gfid string will have some hyphens like:
>>>> 11118443-1894-4273-9340-4b212fa1c0e4
>>>>
>>>> That said, the next disconnect on the brick where you successfully
>>>> did the clear-locks will crash the brick. There was a bug in the
>>>> 3.8.x series with clear-locks which was fixed in 3.9.0 as part of a
>>>> feature. The self-heal deadlocks that you witnessed are also fixed
>>>> in the 3.10 release.
>>>>
>>>> Thank you for the answer. Could you please tell me more about the
>>>> crash? What will actually happen, or is there a bug report about it?
>>>> Just want to make sure that we can do everything to secure the data
>>>> on the bricks. We will look into upgrading, but we have to make sure
>>>> that the new version works for us and of course get self-healing
>>>> working before doing anything :)
>>>>
>>>> The locks xlator/module maintains a list of locks that are granted
>>>> to a client. Clear-locks had an issue where it forgot to remove the
>>>> lock from this list, so the connection's list ends up pointing to
>>>> data that has been freed after a clear-lock. When a disconnect
>>>> happens, all the locks that are granted to a client need to be
>>>> unlocked, so the process starts traversing through this list, and
>>>> when it tries to access this freed data it leads to a crash. I found
>>>> it while reviewing a feature patch sent by the Facebook folks to the
>>>> locks xlator (http://review.gluster.org/14816) for 3.9.0, and they
>>>> fixed this bug as well as part of that feature patch.
>>>>
>>>> Br,
>>>> Samuli
>>>>
>>>> 3.8.x is EOLed, so I recommend you upgrade to a supported version
>>>> soon.
>>>>
>>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>> Samuli Heinonen
>>>> 20 January 2018 at 21.57
>>>>
>>>> Hi all!
>>>>
>>>> One hypervisor in our virtualization environment crashed and now
>>>> some of the VM images cannot be accessed. After investigation we
>>>> found out that there were lots of images that still had an active
>>>> lock on the crashed hypervisor. We were able to remove locks from
>>>> "regular files", but it doesn't seem possible to remove locks from
>>>> shards.
>>>>
>>>> We are running GlusterFS 3.8.15 on all nodes.
>>>>
>>>> Here is part of a statedump that shows a shard having an active
>>>> lock on the crashed node:
>>>>
>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>
>>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>>> mandatory=0
>>>> inodelk-count=1
>>>>
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>>>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>>>> granted at 2018-01-20 08:57:24
>>>>
>>>> If we try to run clear-locks we get the following error message:
>>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>>> Volume clear-locks unsuccessful
>>>> clear-locks getxattr command failed. Reason: Operation not permitted
>>>>
>>>> Gluster vol info if needed:
>>>> Volume Name: zone2-ssd1-vmstor1
>>>> Type: Replicate
>>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: rdma
>>>> Bricks:
>>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Options Reconfigured:
>>>> cluster.shd-wait-qlength: 10000
>>>> cluster.shd-max-threads: 8
>>>> cluster.locking-scheme: granular
>>>> performance.low-prio-threads: 32
>>>> cluster.data-self-heal-algorithm: full
>>>> performance.client-io-threads: off
>>>> storage.linux-aio: off
>>>> performance.readdir-ahead: on
>>>> client.event-threads: 16
>>>> server.event-threads: 16
>>>> performance.strict-write-ordering: off
>>>> performance.quick-read: off
>>>> performance.read-ahead: on
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: on
>>>> cluster.quorum-type: none
>>>> network.ping-timeout: 22
>>>> performance.write-behind: off
>>>> nfs.disable: on
>>>> features.shard: on
>>>> features.shard-block-size: 512MB
>>>> storage.owner-uid: 36
>>>> storage.owner-gid: 36
>>>> performance.io-thread-count: 64
>>>> performance.cache-size: 2048MB
>>>> performance.write-behind-window-size: 256MB
>>>> server.allow-insecure: on
>>>> cluster.ensure-durability: off
>>>> config.transport: rdma
>>>> server.outstanding-rpc-limit: 512
>>>> diagnostics.brick-log-level: INFO
>>>>
>>>> Any recommendations on how to proceed from here?
>>>>
>>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>> --
>>>> Pranith
>>>> Samuli Heinonen <samppah at neutraali.net>
>>>> 21 January 2018 at 21.03
>>>> Hi again,
>>>>
>>>> here is more information regarding the issue described earlier.
>>>>
>>>> It looks like self-healing is stuck. According to "heal statistics",
>>>> the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
>>>> (it's around Sun Jan 21 20:30 when writing this). However,
>>>> glustershd.log says that the last heal was completed at "2018-01-20
>>>> 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
>>>> running now for over 16 hours without printing any information. In
>>>> the statedump I can see that the storage nodes have locks on files
>>>> and some of those are blocked, i.e. here again it says that ovirt8z2
>>>> is holding an active lock even though ovirt8z2 crashed after the
>>>> lock was granted:
>>>>
>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>> mandatory=0
>>>> inodelk-count=3
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>> client=0x7f885845efa0,
>>>>
>>>>
>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>> granted at 2018-01-20 10:59:52
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>>>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>>>> granted at 2018-01-20 08:57:23
>>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>> client=0x7f885845efa0,
>>>>
>>>>
>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>> blocked at 2018-01-20 10:59:52
>>>>
>>>> I'd also like to add that the volume had an arbiter brick before the
>>>> crash happened. We decided to remove it because we thought that it
>>>> was causing issues. However, now I think that this was unnecessary.
>>>> After the crash the arbiter logs had lots of messages like this:
>>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
>>>> permitted) [Operation not permitted]
>>>>
>>>> Is there any way to force self-heal to stop? Any help would be
>>>> very much appreciated :)
>>>>
>>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>> Samuli Heinonen <samppah at neutraali.net>
>>>>
>>>> 20 January 2018 at 21.57
>>>> Hi all!
>>>>
>>>> One hypervisor in our virtualization environment crashed and now
>>>> some of the VM images cannot be accessed. After investigation we
>>>> found out that there were lots of images that still had an active
>>>> lock on the crashed hypervisor. We were able to remove locks from
>>>> "regular files", but it doesn't seem possible to remove locks from
>>>> shards.
>>>>
>>>> We are running GlusterFS 3.8.15 on all nodes.
>>>>
>>>> Here is part of a statedump that shows a shard having an active
>>>> lock on the crashed node:
>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>>> mandatory=0
>>>> inodelk-count=1
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>>>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>>>> granted at 2018-01-20 08:57:24
>>>>
>>>> If we try to run clear-locks we get the following error message:
>>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>>> Volume clear-locks unsuccessful
>>>> clear-locks getxattr command failed. Reason: Operation not permitted
>>>>
>>>> Gluster vol info if needed:
>>>> Volume Name: zone2-ssd1-vmstor1
>>>> Type: Replicate
>>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: rdma
>>>> Bricks:
>>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Options Reconfigured:
>>>> cluster.shd-wait-qlength: 10000
>>>> cluster.shd-max-threads: 8
>>>> cluster.locking-scheme: granular
>>>> performance.low-prio-threads: 32
>>>> cluster.data-self-heal-algorithm: full
>>>> performance.client-io-threads: off
>>>> storage.linux-aio: off
>>>> performance.readdir-ahead: on
>>>> client.event-threads: 16
>>>> server.event-threads: 16
>>>> performance.strict-write-ordering: off
>>>> performance.quick-read: off
>>>> performance.read-ahead: on
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: on
>>>> cluster.quorum-type: none
>>>> network.ping-timeout: 22
>>>> performance.write-behind: off
>>>> nfs.disable: on
>>>> features.shard: on
>>>> features.shard-block-size: 512MB
>>>> storage.owner-uid: 36
>>>> storage.owner-gid: 36
>>>> performance.io-thread-count: 64
>>>> performance.cache-size: 2048MB
>>>> performance.write-behind-window-size: 256MB
>>>> server.allow-insecure: on
>>>> cluster.ensure-durability: off
>>>> config.transport: rdma
>>>> server.outstanding-rpc-limit: 512
>>>> diagnostics.brick-log-level: INFO
>>>>
>>>> Any recommendations on how to proceed from here?
>>>>
>>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>> --
>>
>> Pranith
>>
>>
>>
>


-- 
Pranith