[Gluster-users] Stale locks on shards

Pranith Kumar Karampuri pkarampu at redhat.com
Mon Jan 29 00:24:16 UTC 2018


Hi,
     Did you find the command from the strace output?

On 25 Jan 2018 1:52 pm, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
wrote:

>
>
> On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen <samppah at neutraali.net>
> wrote:
>
>> Pranith Kumar Karampuri kirjoitti 25.01.2018 07:09:
>>
>>> On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen
>>> <samppah at neutraali.net> wrote:
>>>
>>> Hi!
>>>>
>>>> Thank you very much for your help so far. Could you please tell an
>>>> example command how to use aux-gid-mount to remove locks? "gluster
>>>> vol clear-locks" seems to mount volume by itself.
>>>>
>>>
>>> You are correct, sorry, this was implemented around 7 years back and I
>>> forgot that bit about it :-(. Essentially it becomes a getxattr
>>> syscall on the file.
>>> Could you give me the clear-locks command you were trying to execute
>>> and I can probably convert it to the getfattr command?
>>>
>>
>> I have been testing this in a test environment with the command:
>> gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c
>> kind all inode
>>
>
> Could you do an strace of glusterd when this happens? It will have a
> getxattr with "glusterfs.clrlk" in the key. You need to execute that on
> the aux-gfid mount.
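> For example, a minimal sketch (the exact xattr key glusterd generates is
> an assumption here until your strace output confirms it; it should look
> something like "glusterfs.clrlk.t<lock-type>.k<kind>", e.g. tinode/kall
> for "kind all inode", and <server> is a placeholder for one of your
> nodes):
>
>     # capture the getxattr that glusterd issues while you run clear-locks
>     strace -f -s 256 -e trace=getxattr,lgetxattr,fgetxattr \
>         -p $(pidof glusterd) 2>&1 | grep clrlk
>
>     # then replay the same request yourself on an aux-gfid mount
>     mount -t glusterfs -o aux-gfid-mount <server>:g1 /mnt/testvol
>     getfattr -n glusterfs.clrlk.tinode.kall \
>         /mnt/testvol/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c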
>
>
>>
>>
>>
>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>>> Pranith Kumar Karampuri <pkarampu at redhat.com>
>>>>> 23 January 2018 at 10.30
>>>>>
>>>>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
>>>>> <samppah at neutraali.net> wrote:
>>>>>
>>>>> Pranith Kumar Karampuri kirjoitti 23.01.2018 09:34:
>>>>>
>>>>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>>>>> <samppah at neutraali.net> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> here is more information regarding the issue described earlier.
>>>>>
>>>>> It looks like self-healing is stuck. According to "heal statistics" the
>>>>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>>>>> around Sun Jan 21 20:30 as I write this). However glustershd.log says
>>>>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>>>>> is 13:00 UTC+2). Also "heal info" has been running for over 16 hours
>>>>> now without printing any information. In the statedump I can see that
>>>>> the storage nodes hold locks on files and some of those are blocked,
>>>>> i.e. here again it says that ovirt8z2 is holding an active lock even
>>>>> though ovirt8z2 crashed after the lock was granted:
>>>>>
>>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>>> mandatory=0
>>>>> inodelk-count=3
>>>>>
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>>> client=0x7f885845efa0,
>>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>>> granted at 2018-01-20 10:59:52
>>>>>
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>>>>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>>>>> granted at 2018-01-20 08:57:23
>>>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>>> client=0x7f885845efa0,
>>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>>> blocked at 2018-01-20 10:59:52
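>>>>> (For reference, a statedump like the one above can be regenerated with
>>>>> something like:
>>>>>
>>>>>     gluster volume statedump zone2-ssd1-vmstor1
>>>>>
>>>>> the dump files typically land under /var/run/gluster/ on each brick
>>>>> node, unless server.statedump-path points elsewhere.)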
>>>>>
>>>>> I'd also like to add that the volume had an arbiter brick before the
>>>>> crash happened. We decided to remove it because we thought that it was
>>>>> causing issues. However, now I think that this was unnecessary. After
>>>>> the crash the arbiter logs had lots of messages like this:
>>>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
>>>>> permitted) [Operation not permitted]
>>>>>
>>>>> Is there any way to force self-heal to stop? Any help would be very
>>>>> much appreciated :)
>>>>>
>>>>> Exposing .shard to a normal mount is opening a can of worms. You
>>>>> should probably look at mounting the volume with the gfid aux-mount,
>>>>> where you can access a file as <path-to-mount>/.gfid/<gfid-string> to
>>>>> clear locks on it.
>>>>>
>>>>> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test
>>>>> /mnt/testvol
>>>>>
>>>>> A gfid string will have some hyphens like:
>>>>> 11118443-1894-4273-9340-4b212fa1c0e4
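>>>>>
>>>>> As a rough sketch (the shard path below is taken from your statedump;
>>>>> adjust it to your brick), you can read a shard's gfid straight from the
>>>>> brick backend and then address it through the aux-gfid mount:
>>>>>
>>>>>     # on a brick node: trusted.gfid holds the 16-byte gfid
>>>>>     getfattr -n trusted.gfid -e hex \
>>>>>         /ssd1/zone2-vmstor1/export/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>>>
>>>>>     # format the hex value as 8-4-4-4-12 to get the gfid string; the
>>>>>     # shard is then reachable as /mnt/testvol/.gfid/<gfid-string>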
>>>>>
>>>>> That said, the next disconnect on the brick where you successfully did
>>>>> the clear-locks will crash the brick. There was a bug in the 3.8.x
>>>>> series with clear-locks which was fixed in 3.9.0 as part of a feature.
>>>>> The self-heal deadlock that you witnessed is also fixed in the 3.10
>>>>> release.
>>>>>
>>>>> Thank you for the answer. Could you please tell me more about the
>>>>> crash? What will actually happen, or is there a bug report about it?
>>>>> I just want to make sure that we can do everything to secure the data
>>>>> on the bricks. We will look into upgrading, but we have to make sure
>>>>> that the new version works for us and, of course, get self-healing
>>>>> working before doing anything :)
>>>>>
>>>>> The locks xlator/module maintains a list of locks that are granted to
>>>>> a client. Clear-locks had an issue where it forgot to remove the lock
>>>>> from this list, so after a clear-locks the client's list ends up
>>>>> pointing at freed data. When a disconnect happens, all the locks that
>>>>> were granted to that client need to be unlocked, so the process starts
>>>>> traversing this list, and when it tries to access the freed data it
>>>>> crashes. I found it while reviewing a feature patch sent by the
>>>>> Facebook folks to the locks xlator (http://review.gluster.org/14816)
>>>>> for 3.9.0, and they fixed this bug as part of that feature patch.
>>>>>
>>>>> Br,
>>>>> Samuli
>>>>>
>>>>> 3.8.x is EOLed, so I recommend you upgrade to a supported version
>>>>> soon.
>>>>>
>>>>> Best regards,
>>>>> Samuli Heinonen
>>>>>
>>>>> Samuli Heinonen
>>>>> 20 January 2018 at 21.57
>>>>>
>>>>> Hi all!
>>>>>
>>>>> One hypervisor in our virtualization environment crashed and now some
>>>>> of the VM images cannot be accessed. After investigating we found out
>>>>> that there were lots of images that still had an active lock held by
>>>>> the crashed hypervisor. We were able to remove locks from "regular
>>>>> files", but it doesn't seem possible to remove locks from shards.
>>>>>
>>>>> We are running GlusterFS 3.8.15 on all nodes.
>>>>>
>>>>> Here is part of statedump that shows shard having
>>>>> active lock on
>>>>> crashed node:
>>>>>
>>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>>
>>>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>>>> mandatory=0
>>>>> inodelk-count=1
>>>>>
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>>
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>>
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>>>>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>>>>> granted at 2018-01-20 08:57:24
>>>>>
>>>>> If we try to run clear-locks we get the following error message:
>>>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>>>> Volume clear-locks unsuccessful
>>>>> clear-locks getxattr command failed. Reason: Operation not permitted
>>>>>
>>>>> Gluster vol info if needed:
>>>>> Volume Name: zone2-ssd1-vmstor1
>>>>> Type: Replicate
>>>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x 2 = 2
>>>>> Transport-type: rdma
>>>>> Bricks:
>>>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>>>> Options Reconfigured:
>>>>> cluster.shd-wait-qlength: 10000
>>>>> cluster.shd-max-threads: 8
>>>>> cluster.locking-scheme: granular
>>>>> performance.low-prio-threads: 32
>>>>> cluster.data-self-heal-algorithm: full
>>>>> performance.client-io-threads: off
>>>>> storage.linux-aio: off
>>>>> performance.readdir-ahead: on
>>>>> client.event-threads: 16
>>>>> server.event-threads: 16
>>>>> performance.strict-write-ordering: off
>>>>> performance.quick-read: off
>>>>> performance.read-ahead: on
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.eager-lock: enable
>>>>> network.remote-dio: on
>>>>> cluster.quorum-type: none
>>>>> network.ping-timeout: 22
>>>>> performance.write-behind: off
>>>>> nfs.disable: on
>>>>> features.shard: on
>>>>> features.shard-block-size: 512MB
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> performance.io-thread-count: 64
>>>>> performance.cache-size: 2048MB
>>>>> performance.write-behind-window-size: 256MB
>>>>> server.allow-insecure: on
>>>>> cluster.ensure-durability: off
>>>>> config.transport: rdma
>>>>> server.outstanding-rpc-limit: 512
>>>>> diagnostics.brick-log-level: INFO
>>>>>
>>>>> Any recommendations on how to proceed from here?
>>>>>
>>>>> Best regards,
>>>>> Samuli Heinonen
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>>
>>>>> Pranith
>>>>> Samuli Heinonen <samppah at neutraali.net>
>>>>> 21 January 2018 at 21.03
>>>>> Hi again,
>>>>>
>>>>> here is more information regarding the issue described earlier.
>>>>>
>>>>> It looks like self-healing is stuck. According to "heal statistics" the
>>>>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>>>>> around Sun Jan 21 20:30 as I write this). However glustershd.log says
>>>>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>>>>> is 13:00 UTC+2). Also "heal info" has been running for over 16 hours
>>>>> now without printing any information. In the statedump I can see that
>>>>> the storage nodes hold locks on files and some of those are blocked,
>>>>> i.e. here again it says that ovirt8z2 is holding an active lock even
>>>>> though ovirt8z2 crashed after the lock was granted:
>>>>>
>>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>>> mandatory=0
>>>>> inodelk-count=3
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>>> client=0x7f885845efa0,
>>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>>> granted at 2018-01-20 10:59:52
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>>>>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>>>>> granted at 2018-01-20 08:57:23
>>>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>>>> client=0x7f885845efa0,
>>>>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>>>>> blocked at 2018-01-20 10:59:52
>>>>>
>>>>> I'd also like to add that the volume had an arbiter brick before the
>>>>> crash happened. We decided to remove it because we thought that it was
>>>>> causing issues. However, now I think that this was unnecessary. After
>>>>> the crash the arbiter logs had lots of messages like this:
>>>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
>>>>> permitted) [Operation not permitted]
>>>>>
>>>>> Is there any way to force self-heal to stop? Any help would be very
>>>>> much appreciated :)
>>>>>
>>>>> Best regards,
>>>>> Samuli Heinonen
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> Samuli Heinonen <samppah at neutraali.net>
>>>>>
>>>>> 20 January 2018 at 21.57
>>>>> Hi all!
>>>>>
>>>>> One hypervisor in our virtualization environment crashed and now
>>>>> some of the VM images cannot be accessed. After investigating we
>>>>> found out that there were lots of images that still had an active
>>>>> lock held by the crashed hypervisor. We were able to remove locks
>>>>> from "regular files", but it doesn't seem possible to remove locks
>>>>> from shards.
>>>>>
>>>>> We are running GlusterFS 3.8.15 on all nodes.
>>>>>
>>>>> Here is part of statedump that shows shard having active lock on
>>>>> crashed node:
>>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>>>> mandatory=0
>>>>> inodelk-count=1
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>>>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>>>>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>>>>> granted at 2018-01-20 08:57:24
>>>>>
>>>>> If we try to run clear-locks we get the following error message:
>>>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>>>> Volume clear-locks unsuccessful
>>>>> clear-locks getxattr command failed. Reason: Operation not
>>>>> permitted
>>>>>
>>>>> Gluster vol info if needed:
>>>>> Volume Name: zone2-ssd1-vmstor1
>>>>> Type: Replicate
>>>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x 2 = 2
>>>>> Transport-type: rdma
>>>>> Bricks:
>>>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>>>> Options Reconfigured:
>>>>> cluster.shd-wait-qlength: 10000
>>>>> cluster.shd-max-threads: 8
>>>>> cluster.locking-scheme: granular
>>>>> performance.low-prio-threads: 32
>>>>> cluster.data-self-heal-algorithm: full
>>>>> performance.client-io-threads: off
>>>>> storage.linux-aio: off
>>>>> performance.readdir-ahead: on
>>>>> client.event-threads: 16
>>>>> server.event-threads: 16
>>>>> performance.strict-write-ordering: off
>>>>> performance.quick-read: off
>>>>> performance.read-ahead: on
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.eager-lock: enable
>>>>> network.remote-dio: on
>>>>> cluster.quorum-type: none
>>>>> network.ping-timeout: 22
>>>>> performance.write-behind: off
>>>>> nfs.disable: on
>>>>> features.shard: on
>>>>> features.shard-block-size: 512MB
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> performance.io-thread-count: 64
>>>>> performance.cache-size: 2048MB
>>>>> performance.write-behind-window-size: 256MB
>>>>> server.allow-insecure: on
>>>>> cluster.ensure-durability: off
>>>>> config.transport: rdma
>>>>> server.outstanding-rpc-limit: 512
>>>>> diagnostics.brick-log-level: INFO
>>>>>
>>>>> Any recommendations on how to proceed from here?
>>>>>
>>>>> Best regards,
>>>>> Samuli Heinonen
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>> --
>>>
>>> Pranith
>>>
>>
>
>
> --
> Pranith
>