[Gluster-users] Stale locks on shards
Samuli Heinonen
samppah at neutraali.net
Thu Jan 25 08:19:30 UTC 2018
Pranith Kumar Karampuri wrote on 25.01.2018 07:09:
> On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen
> <samppah at neutraali.net> wrote:
>
>> Hi!
>>
>> Thank you very much for your help so far. Could you please give an
>> example command showing how to use aux-gfid-mount to remove locks?
>> "gluster vol clear-locks" seems to mount the volume by itself.
>
> You are correct, sorry, this was implemented around 7 years back and I
> forgot that bit about it :-(. Essentially it becomes a getxattr
> syscall on the file.
> Could you give me the clear-locks command you were trying to execute,
> and I can probably convert it to the corresponding getfattr command?
I have been testing this in a test environment with the following command:

gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
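
For reference, here is a minimal sketch of what the equivalent getfattr call
could look like on an aux-gfid mount. It assumes the clear-locks CLI maps to
the glusterfs.clrlk virtual xattr with a key of the form
glusterfs.clrlk.t<lock-type>.k<kind>; the server name and mount point are
made up for the example, and the exact key name should be treated as an
assumption to be confirmed:

# mount the test volume with gfid access enabled (hypothetical server/mount)
mount -t glusterfs -o aux-gfid-mount server1:/g1 /mnt/g1

# ask the locks xlator to clear all inode locks on the file, via getxattr
getfattr -n glusterfs.clrlk.tinode.kall \
    /mnt/g1/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c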
>
>> Best regards,
>> Samuli Heinonen
>>
>>> Pranith Kumar Karampuri <pkarampu at redhat.com>
>>> 23 January 2018 at 10.30
>>>
>>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
>>> <samppah at neutraali.net> wrote:
>>>
>>> Pranith Kumar Karampuri wrote on 23.01.2018 09:34:
>>>
>>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>>> <samppah at neutraali.net> wrote:
>>>
>>> Hi again,
>>>
>>> here is more information regarding the issue described earlier.
>>>
>>> It looks like self-healing is stuck. According to "heal statistics"
>>> the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
>>> (it's around Sun Jan 21 20:30 as I write this). However,
>>> glustershd.log says that the last heal was completed at "2018-01-20
>>> 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
>>> running for over 16 hours now without giving any information. In the
>>> statedump I can see that the storage nodes hold locks on files and
>>> some of those locks are blocked. I.e., here again it says that
>>> ovirt8z2 holds an active lock even though ovirt8z2 crashed after the
>>> lock was granted:
>>>
>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>> mandatory=0
>>> inodelk-count=3
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
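
(For anyone trying to reproduce this, a statedump like the excerpt above can
be triggered from any node with the commands below; this is only a sketch and
assumes the default dump directory of /var/run/gluster, which the
server.statedump-path option can change.)

# trigger a statedump of every brick process in the volume
gluster volume statedump zone2-ssd1-vmstor1

# on each storage node, the dumps land as one file per brick process;
# the inode lock sections start with lines like the ones quoted above
grep -A20 'xlator.features.locks' /var/run/gluster/*.dump.*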
>>>
>>> I'd also like to add that the volume had an arbiter brick before the
>>> crash happened. We decided to remove it because we thought that it
>>> was causing issues. However, now I think that this was unnecessary.
>>> After the crash the arbiter logs had lots of messages like this:
>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>>> [Operation not permitted]
>>>
>>> Is there any way to force self-heal to stop? Any help would be very
>>> much appreciated :)
>>>
>>> Exposing .shard to a normal mount is opening a can of worms. You
>>> should probably look at mounting the volume with the gfid aux-mount,
>>> where you can access a file as <path-to-mount>/.gfid/<gfid-string>
>>> to clear locks on it.
>>>
>>> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test
>>> /mnt/testvol
>>>
>>> A gfid string will have some hyphens like:
>>> 11118443-1894-4273-9340-4b212fa1c0e4
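
(As a side note, one way to find the gfid string for a given shard is to read
its trusted.gfid xattr straight from a brick backend and add the hyphens back
in. The brick path below is taken from the volume info further down in this
thread and the shard name is only an example, so adjust both as needed.)

# on a storage node, read the raw gfid of the shard from the brick backend
getfattr -n trusted.gfid -e hex \
    /ssd1/zone2-vmstor1/export/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27

# the 32 hex digits that come back, grouped 8-4-4-4-12, form the gfid string
# that can be used under <path-to-mount>/.gfid/ on the aux-gfid mount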
>>>
>>> That said, the next disconnect on the brick where you successfully
>>> did the clear-locks will crash the brick. There was a bug in the
>>> 3.8.x series with clear-locks which was fixed in 3.9.0 as part of a
>>> feature. The self-heal deadlock that you witnessed is also fixed in
>>> the 3.10 release.
>>>
>>> Thank you for the answer. Could you please tell me more about the
>>> crash? What will actually happen, or is there a bug report about it?
>>> We just want to make sure that we can do everything to secure the
>>> data on the bricks. We will look into an upgrade, but we have to
>>> make sure that the new version works for us and, of course, get
>>> self-healing working before doing anything :)
>>>
>>> The locks xlator/module maintains a list of locks that are granted
>>> to a client. Clear-locks had an issue where it forgot to remove the
>>> lock from this list, so the connection's list ends up pointing to
>>> freed data after a clear-lock. When a disconnect happens, all the
>>> locks that are granted to that client need to be unlocked, so the
>>> process starts traversing this list, and when it tries to access the
>>> freed data it crashes. I found it while reviewing a feature patch
>>> sent by the Facebook folks to the locks xlator
>>> (http://review.gluster.org/14816 [2]) for 3.9.0, and they fixed this
>>> bug as well as part of that feature patch.
>>>
>>> Br,
>>> Samuli
>>>
>>> 3.8.x is EOLed, so I recommend you upgrade to a supported version
>>> soon.
>>>
>>> Best regards,
>>> Samuli Heinonen
>>>
>>> Samuli Heinonen
>>> 20 January 2018 at 21.57
>>>
>>> Hi all!
>>>
>>> One hypervisor in our virtualization environment crashed and now
>>> some of the VM images cannot be accessed. After investigation we
>>> found out that there were lots of images that still had an active
>>> lock held by the crashed hypervisor. We were able to remove locks
>>> from "regular files", but it doesn't seem possible to remove locks
>>> from shards.
>>>
>>> We are running GlusterFS 3.8.15 on all nodes.
>>>
>>> Here is the part of the statedump that shows a shard with an active
>>> lock from the crashed node:
>>>
>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>> mandatory=0
>>> inodelk-count=1
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
>>>
>>> If we try to run clear-locks we get the following error message:
>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>> Volume clear-locks unsuccessful
>>> clear-locks getxattr command failed. Reason: Operation not
>>> permitted
>>>
>>> Gluster vol info if needed:
>>> Volume Name: zone2-ssd1-vmstor1
>>> Type: Replicate
>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: rdma
>>> Bricks:
>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>> Options Reconfigured:
>>> cluster.shd-wait-qlength: 10000
>>> cluster.shd-max-threads: 8
>>> cluster.locking-scheme: granular
>>> performance.low-prio-threads: 32
>>> cluster.data-self-heal-algorithm: full
>>> performance.client-io-threads: off
>>> storage.linux-aio: off
>>> performance.readdir-ahead: on
>>> client.event-threads: 16
>>> server.event-threads: 16
>>> performance.strict-write-ordering: off
>>> performance.quick-read: off
>>> performance.read-ahead: on
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> cluster.eager-lock: enable
>>> network.remote-dio: on
>>> cluster.quorum-type: none
>>> network.ping-timeout: 22
>>> performance.write-behind: off
>>> nfs.disable: on
>>> features.shard: on
>>> features.shard-block-size: 512MB
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> performance.io-thread-count: 64
>>> performance.cache-size: 2048MB
>>> performance.write-behind-window-size: 256MB
>>> server.allow-insecure: on
>>> cluster.ensure-durability: off
>>> config.transport: rdma
>>> server.outstanding-rpc-limit: 512
>>> diagnostics.brick-log-level: INFO
>>>
>>> Any recommendations on how to proceed from here?
>>>
>>> Best regards,
>>> Samuli Heinonen
>>>
>>> Samuli Heinonen <samppah at neutraali.net>
>>> 21 January 2018 at 21.03
>>> Hi again,
>>>
>>> here is more information regarding the issue described earlier.
>>>
>>> It looks like self-healing is stuck. According to "heal statistics"
>>> the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
>>> (it's around Sun Jan 21 20:30 as I write this). However,
>>> glustershd.log says that the last heal was completed at "2018-01-20
>>> 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
>>> running for over 16 hours now without giving any information. In the
>>> statedump I can see that the storage nodes hold locks on files and
>>> some of those locks are blocked. I.e., here again it says that
>>> ovirt8z2 holds an active lock even though ovirt8z2 crashed after the
>>> lock was granted:
>>>
>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>> mandatory=0
>>> inodelk-count=3
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
>>>
>>> I'd also like to add that the volume had an arbiter brick before the
>>> crash happened. We decided to remove it because we thought that it
>>> was causing issues. However, now I think that this was unnecessary.
>>> After the crash the arbiter logs had lots of messages like this:
>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>> [server-rpc-fops.c:1640:server_setattr_cbk]
>>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>>> [Operation not permitted]
>>>
>>> Is there any way to force self-heal to stop? Any help would be very
>>> much appreciated :)
>>>
>>> Best regards,
>>> Samuli Heinonen
>>>
>>> Samuli Heinonen <samppah at neutraali.net>
>>>
>>> 20 January 2018 at 21.57
>>> Hi all!
>>>
>>> One hypervisor in our virtualization environment crashed and now
>>> some of the VM images cannot be accessed. After investigation we
>>> found out that there were lots of images that still had an active
>>> lock held by the crashed hypervisor. We were able to remove locks
>>> from "regular files", but it doesn't seem possible to remove locks
>>> from shards.
>>>
>>> We are running GlusterFS 3.8.15 on all nodes.
>>>
>>> Here is the part of the statedump that shows a shard with an active
>>> lock from the crashed node:
>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>> mandatory=0
>>> inodelk-count=1
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
>>>
>>> If we try to run clear-locks we get the following error message:
>>> # gluster volume clear-locks zone2-ssd1-vmstor1
>>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>> Volume clear-locks unsuccessful
>>> clear-locks getxattr command failed. Reason: Operation not
>>> permitted
>>>
>>> Gluster vol info if needed:
>>> Volume Name: zone2-ssd1-vmstor1
>>> Type: Replicate
>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: rdma
>>> Bricks:
>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>> Options Reconfigured:
>>> cluster.shd-wait-qlength: 10000
>>> cluster.shd-max-threads: 8
>>> cluster.locking-scheme: granular
>>> performance.low-prio-threads: 32
>>> cluster.data-self-heal-algorithm: full
>>> performance.client-io-threads: off
>>> storage.linux-aio: off
>>> performance.readdir-ahead: on
>>> client.event-threads: 16
>>> server.event-threads: 16
>>> performance.strict-write-ordering: off
>>> performance.quick-read: off
>>> performance.read-ahead: on
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> cluster.eager-lock: enable
>>> network.remote-dio: on
>>> cluster.quorum-type: none
>>> network.ping-timeout: 22
>>> performance.write-behind: off
>>> nfs.disable: on
>>> features.shard: on
>>> features.shard-block-size: 512MB
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> performance.io-thread-count: 64
>>> performance.cache-size: 2048MB
>>> performance.write-behind-window-size: 256MB
>>> server.allow-insecure: on
>>> cluster.ensure-durability: off
>>> config.transport: rdma
>>> server.outstanding-rpc-limit: 512
>>> diagnostics.brick-log-level: INFO
>>>
>>> Any recommendations on how to proceed from here?
>>>
>>> Best regards,
>>> Samuli Heinonen
>>>
>
> --
>
> Pranith
>
>
> Links:
> ------
> [1] http://ovirt8z2.xxx.com
> [2] http://review.gluster.org/14816
> [3] http://lists.gluster.org/mailman/listinfo/gluster-users