[Gluster-users] Stale locks on shards
Samuli Heinonen
samppah at neutraali.net
Wed Jan 24 20:57:03 UTC 2018
Hi!
Thank you very much for your help so far. Could you please give an
example command of how to use aux-gfid-mount to remove locks? "gluster vol
clear-locks" seems to mount the volume by itself.
Best regards,
Samuli Heinonen
> Pranith Kumar Karampuri <pkarampu at redhat.com>
> 23 January 2018 at 10.30
>
>
> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
> <samppah at neutraali.net> wrote:
>
> Pranith Kumar Karampuri wrote on 23.01.2018 09:34:
>
> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
> <samppah at neutraali.net> wrote:
>
> Hi again,
>
> here is more information regarding the issue described earlier.
>
> It looks like self healing is stuck. According to "heal statistics" the
> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
> around Sun Jan 21 20:30 when writing this). However glustershd.log says
> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
> is 13:00 UTC+2). Also "heal info" has been running now for over 16 hours
> without any information. In the statedump I can see that the storage
> nodes have locks on files and some of those are blocked. I.e. here again
> it says that ovirt8z2 is holding an active lock even though ovirt8z2
> crashed after the lock was granted:
>
> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
> mandatory=0
> inodelk-count=3
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
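>
> (For reference, the figures above come from commands along these lines;
> this is a reconstruction using the volume name from this thread, not the
> exact invocations used:)
>
> # heal crawl statistics and list of entries still pending heal
> gluster volume heal zone2-ssd1-vmstor1 statistics
> gluster volume heal zone2-ssd1-vmstor1 info
> # dump brick state, including the inode locks shown above; the dump files
> # typically land under /var/run/gluster/ on the storage nodes
> gluster volume statedump zone2-ssd1-vmstor1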
>
> I'd also like to add that the volume had an arbiter brick before the
> crash happened. We decided to remove it because we thought that it was
> causing issues. However, now I think that this was unnecessary. After
> the crash the arbiter logs had lots of messages like this:
> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
> [server-rpc-fops.c:1640:server_setattr_cbk]
> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
> [Operation not permitted]
>
> Is there any way to force self heal to stop? Any help would be very
> much appreciated :)
>
>
> Exposing .shard to a normal mount is opening a can of worms. You should
> probably look at mounting the volume with the gfid aux-mount, where you
> can access a file with <path-to-mount>/.gfid/<gfid-string> to clear
> locks on it.
>
> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>
> A gfid string will have some hyphens like:
> 11118443-1894-4273-9340-4b212fa1c0e4
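>
> For example, reaching a locked shard through the gfid aux-mount might
> look roughly like this (an untested sketch using the volume, server and
> brick names from this thread; the shard's own gfid has to be looked up
> first, e.g. from the brick backend):
>
> # mount the volume with the gfid aux-mount (this volume uses rdma transport)
> mount -t glusterfs -o aux-gfid-mount,transport=rdma sto2z2.xxx:/zone2-ssd1-vmstor1 /mnt/vmstor1-gfid
> # read the shard's gfid from the brick backend; the hex value maps to the
> # hyphenated 8-4-4-4-12 gfid string
> getfattr -n trusted.gfid -e hex /ssd1/zone2-vmstor1/export/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
> # the shard is then reachable through the aux mount, and lock clearing can
> # be pointed at this path (exact clear-locks invocation to be confirmed)
> stat /mnt/vmstor1-gfid/.gfid/<gfid-string-of-the-shard>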
>
> That said, the next disconnect on the brick where you successfully did
> the clear-locks will crash the brick. There was a bug in the 3.8.x
> series with clear-locks which was fixed in 3.9.0 as part of a feature.
> The self-heal deadlocks that you witnessed are also fixed in the 3.10
> release.
>
>
>
> Thank you for the answer. Could you please tell me more about the crash?
> What will actually happen, or is there a bug report about it? We just
> want to make sure that we can do everything to secure the data on the
> bricks. We will look into an upgrade, but we have to make sure that the
> new version works for us, and of course get self healing working before
> doing anything :)
>
>
> The locks xlator/module maintains a list of locks that are granted to a
> client. Clear-locks had an issue where it forgot to remove the lock from
> this list, so the connection's lock list ends up pointing to data that
> has already been freed after a clear-lock. When a disconnect happens,
> all the locks that were granted to that client need to be unlocked, so
> the process starts traversing this list, and when it tries to access the
> freed data it crashes. I found this while reviewing a feature patch sent
> by the Facebook folks to the locks xlator
> (http://review.gluster.org/14816) for 3.9.0, and they fixed this bug as
> well as part of that feature patch.
>
>
> Br,
> Samuli
>
>
> 3.8.x is EOL, so I recommend upgrading to a supported version soon.
>
> Best regards,
> Samuli Heinonen
>
> Samuli Heinonen
> 20 January 2018 at 21.57
>
> Hi all!
>
> One hypervisor in our virtualization environment crashed and now
> some of the VM images cannot be accessed. After investigation we
> found out that there were lots of images that still had an active
> lock on the crashed hypervisor. We were able to remove locks from
> "regular files", but it doesn't seem possible to remove locks from
> shards.
>
> We are running GlusterFS 3.8.15 on all nodes.
>
> Here is part of a statedump that shows a shard having an active lock
> on the crashed node:
> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
> mandatory=0
> inodelk-count=1
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
>
> If we try to run clear-locks we get the following error message:
> # gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
> Volume clear-locks unsuccessful
> clear-locks getxattr command failed. Reason: Operation not permitted
>
> Gluster vol info if needed:
> Volume Name: zone2-ssd1-vmstor1
> Type: Replicate
> Volume ID: b6319968-690b-4060-8fff-b212d2295208
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: rdma
> Bricks:
> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
> Options Reconfigured:
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> performance.client-io-threads: off
> storage.linux-aio: off
> performance.readdir-ahead: on
> client.event-threads: 16
> server.event-threads: 16
> performance.strict-write-ordering: off
> performance.quick-read: off
> performance.read-ahead: on
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: on
> cluster.quorum-type: none
> network.ping-timeout: 22
> performance.write-behind: off
> nfs.disable: on
> features.shard: on
> features.shard-block-size: 512MB
> storage.owner-uid: 36
> storage.owner-gid: 36
> performance.io-thread-count: 64
> performance.cache-size: 2048MB
> performance.write-behind-window-size: 256MB
> server.allow-insecure: on
> cluster.ensure-durability: off
> config.transport: rdma
> server.outstanding-rpc-limit: 512
> diagnostics.brick-log-level: INFO
>
> Any recommendations on how to proceed from here?
>
> Best regards,
> Samuli Heinonen
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Pranith