[Gluster-users] Stale locks on shards
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Jan 25 05:09:16 UTC 2018
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen <samppah at neutraali.net>
wrote:
> Hi!
>
> Thank you very much for your help so far. Could you please give an example
> command showing how to use aux-gfid-mount to remove locks? "gluster vol
> clear-locks" seems to mount the volume by itself.
>
You are correct, sorry; this was implemented around 7 years back and I had
forgotten that bit about it :-(. Essentially it becomes a getxattr syscall on
the file. Could you give me the clear-locks command you were trying to
execute, and I can probably convert it to the equivalent getfattr command?
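
In the meantime, here is roughly what that mapping looks like for the
clear-locks command you posted earlier. The virtual xattr name below is from
memory and the mount spec is just an illustration, so please double-check
both against your setup before running anything:

# CLI form you were running:
gluster volume clear-locks zone2-ssd1-vmstor1 \
    /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode

# Rough getfattr equivalent on an aux-gfid mount of the volume
# (add your usual transport options, since the volume uses rdma):
mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/testvol
getfattr -n glusterfs.clrlk.tinode.kall \
    /mnt/testvol/.gfid/<gfid-of-the-shard-file>

# Note: <gfid-of-the-shard-file> is the gfid of the shard itself (readable
# from the brick backend via its trusted.gfid xattr), not the id that
# appears in the shard's file name.
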
>
> Best regards,
> Samuli Heinonen
>
> Pranith Kumar Karampuri <pkarampu at redhat.com>
>> 23 January 2018 at 10.30
>>
>>
>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah at neutraali.net>
>> wrote:
>>
>> Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:
>>
>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>> <samppah at neutraali.net> wrote:
>>
>> Hi again,
>>
>> here is more information regarding the issue described earlier.
>>
>> It looks like self-healing is stuck. According to "heal statistics" the
>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>> around Sun Jan 21 20:30 as I write this). However, glustershd.log says
>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>> is 13:00 UTC+2). Also, "heal info" has now been running for over 16
>> hours without printing any information. In the statedump I can see that
>> the storage nodes hold locks on files and some of those are blocked.
>> I.e. here again it says that ovirt8z2 holds an active lock even though
>> ovirt8z2 crashed after the lock was granted:
>>
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>> mandatory=0
>> inodelk-count=3
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> granted at 2018-01-20 10:59:52
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>> granted at 2018-01-20 08:57:23
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> blocked at 2018-01-20 10:59:52
>>
>> I'd also like to add that the volume had an arbiter brick before the
>> crash happened. We decided to remove it because we thought that it was
>> causing issues, but now I think that this was unnecessary. After the
>> crash the arbiter logs had lots of messages like this:
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>> [server-rpc-fops.c:1640:server_setattr_cbk]
>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>> [Operation not permitted]
>>
>> Is there any way to force self-heal to stop? Any help would be very
>> much appreciated :)
>>
>>
>> Exposing .shard to a normal mount is opening a can of worms. You should
>> probably look at mounting the volume with the gfid aux-mount option,
>> where you can access a file as <path-to-mount>/.gfid/<gfid-string> to
>> clear locks on it.
>>
>> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>>
>> A gfid string will have some hyphens, like:
>> 11118443-1894-4273-9340-4b212fa1c0e4
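>>
>> To find the gfid string of a particular shard you can, for example, read
>> its trusted.gfid xattr straight from the brick backend on one of the
>> storage nodes and add the hyphens by hand. The path below is just taken
>> from your earlier statedump as an illustration:
>>
>> # run on a storage node, against the shard inside the brick directory
>> getfattr -n trusted.gfid -e hex \
>>     /ssd1/zone2-vmstor1/export/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> # prints trusted.gfid=0x<32 hex digits>; split it 8-4-4-4-12 to get the
>> # gfid string to use under <path-to-mount>/.gfid/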
>>
>> That said, the next disconnect on the brick where you successfully did
>> the clear-locks will crash the brick. There was a bug in the 3.8.x
>> series with clear-locks which was fixed in 3.9.0 as part of a feature.
>> The self-heal deadlocks that you witnessed are also fixed in the 3.10
>> release.
>>
>>
>>
>> Thank you for the answer. Could you please tell me more about the
>> crash? What will actually happen, or is there a bug report about it? I
>> just want to make sure that we can do everything to secure the data on
>> the bricks. We will look into upgrading, but we have to make sure that
>> the new version works for us, and of course get self-healing working
>> before doing anything :)
>>
>>
>> The locks xlator/module maintains a list of locks that are granted to a
>> client. Clear-locks had an issue where it forgot to remove the lock from
>> this list, so the connection list ends up pointing to data that has been
>> freed after a clear-lock. When a disconnect happens, all the locks
>> granted to that client need to be unlocked, so the process starts
>> traversing this list, and when it tries to access the freed data it
>> crashes. I found this while reviewing a feature patch sent by the
>> Facebook folks to the locks xlator (http://review.gluster.org/14816) for
>> 3.9.0, and they also fixed this bug as part of that feature patch.
>>
>>
>> Br,
>> Samuli
>>
>>
>> 3.8.x is EOLed, so I recommend that you upgrade to a supported version
>> soon.
>>
>> Best regards,
>> Samuli Heinonen
>>
>> Samuli Heinonen
>> 20 January 2018 at 21.57
>>
>> Hi all!
>>
>> One hypervisor in our virtualization environment crashed and now
>> some of the VM images cannot be accessed. After investigation we
>> found out that there were lots of images that still had an active
>> lock held by the crashed hypervisor. We were able to remove the
>> locks from "regular files", but it doesn't seem possible to remove
>> locks from shards.
>>
>> We are running GlusterFS 3.8.15 on all nodes.
>>
>> Here is the part of the statedump that shows a shard with an active
>> lock held by the crashed node:
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> mandatory=0
>> inodelk-count=1
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>> granted at 2018-01-20 08:57:24
>>
>> If we try to run clear-locks we get the following error message:
>> # gluster volume clear-locks zone2-ssd1-vmstor1
>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>> Volume clear-locks unsuccessful
>> clear-locks getxattr command failed. Reason: Operation not permitted
>>
>> Gluster vol info if needed:
>> Volume Name: zone2-ssd1-vmstor1
>> Type: Replicate
>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: rdma
>> Bricks:
>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>> Options Reconfigured:
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> performance.low-prio-threads: 32
>> cluster.data-self-heal-algorithm: full
>> performance.client-io-threads: off
>> storage.linux-aio: off
>> performance.readdir-ahead: on
>> client.event-threads: 16
>> server.event-threads: 16
>> performance.strict-write-ordering: off
>> performance.quick-read: off
>> performance.read-ahead: on
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: on
>> cluster.quorum-type: none
>> network.ping-timeout: 22
>> performance.write-behind: off
>> nfs.disable: on
>> features.shard: on
>> features.shard-block-size: 512MB
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.io-thread-count: 64
>> performance.cache-size: 2048MB
>> performance.write-behind-window-size: 256MB
>> server.allow-insecure: on
>> cluster.ensure-durability: off
>> config.transport: rdma
>> server.outstanding-rpc-limit: 512
>> diagnostics.brick-log-level: INFO
>>
>> Any recommendations on how to proceed from here?
>>
>> Best regards,
>> Samuli Heinonen
>>
>> --
>> Pranith
>> Samuli Heinonen <samppah at neutraali.net>
>> 21 January 2018 at 21.03
>> Hi again,
>>
>> here is more information regarding the issue described earlier.
>>
>> It looks like self-healing is stuck. According to "heal statistics" the
>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>> around Sun Jan 21 20:30 as I write this). However, glustershd.log says
>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>> is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours
>> without printing any information. In the statedump I can see that the
>> storage nodes hold locks on files and some of those are blocked. I.e.
>> here again it says that ovirt8z2 holds an active lock even though
>> ovirt8z2 crashed after the lock was granted:
>>
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>> mandatory=0
>> inodelk-count=3
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> granted at 2018-01-20 10:59:52
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>> granted at 2018-01-20 08:57:23
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
>> 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> blocked at 2018-01-20 10:59:52
>>
>> I'd also like to add that the volume had an arbiter brick before the
>> crash happened. We decided to remove it because we thought that it was
>> causing issues, but now I think that this was unnecessary. After the
>> crash the arbiter logs had lots of messages like this:
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>> [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server:
>> 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>> [Operation not permitted]
>>
>> Is there any way to force self-heal to stop? Any help would be very much
>> appreciated :)
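>>
>> (For reference, the "heal statistics" and "heal info" outputs mentioned
>> above come from the standard heal commands; the invocations we used were
>> along these lines:)
>>
>> gluster volume heal zone2-ssd1-vmstor1 statistics
>> gluster volume heal zone2-ssd1-vmstor1 info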
>>
>> Best regards,
>> Samuli Heinonen
>>
>>
>>
>>
>>
>>
>> Samuli Heinonen <samppah at neutraali.net>
>>
>> 20 January 2018 at 21.57
>> Hi all!
>>
>> One hypervisor in our virtualization environment crashed and now some of
>> the VM images cannot be accessed. After investigation we found out that
>> there were lots of images that still had an active lock held by the
>> crashed hypervisor. We were able to remove the locks from "regular
>> files", but it doesn't seem possible to remove locks from shards.
>>
>> We are running GlusterFS 3.8.15 on all nodes.
>>
>> Here is the part of the statedump that shows a shard with an active lock
>> held by the crashed node:
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> mandatory=0
>> inodelk-count=1
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>> granted at 2018-01-20 08:57:24
>>
>> If we try to run clear-locks we get the following error message:
>> # gluster volume clear-locks zone2-ssd1-vmstor1
>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>> Volume clear-locks unsuccessful
>> clear-locks getxattr command failed. Reason: Operation not permitted
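>>
>> (For reference, statedumps like the one above can be regenerated at any
>> time to re-check the lock state. From memory the defaults are roughly as
>> follows, so please verify the dump path on your build:)
>>
>> # run on any node; each brick process writes its own dump file
>> gluster volume statedump zone2-ssd1-vmstor1
>> # the dumps usually land under /var/run/gluster/ on each storage node;
>> # grep them for the shard path or its gfid to see any remaining inodelks
>> grep -A 10 '75353c17-d6b8-485d-9baf-fd6c700e39a1.21' /var/run/gluster/*.dump.*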
>>
>> Gluster vol info if needed:
>> Volume Name: zone2-ssd1-vmstor1
>> Type: Replicate
>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: rdma
>> Bricks:
>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>> Options Reconfigured:
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> performance.low-prio-threads: 32
>> cluster.data-self-heal-algorithm: full
>> performance.client-io-threads: off
>> storage.linux-aio: off
>> performance.readdir-ahead: on
>> client.event-threads: 16
>> server.event-threads: 16
>> performance.strict-write-ordering: off
>> performance.quick-read: off
>> performance.read-ahead: on
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: on
>> cluster.quorum-type: none
>> network.ping-timeout: 22
>> performance.write-behind: off
>> nfs.disable: on
>> features.shard: on
>> features.shard-block-size: 512MB
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.io-thread-count: 64
>> performance.cache-size: 2048MB
>> performance.write-behind-window-size: 256MB
>> server.allow-insecure: on
>> cluster.ensure-durability: off
>> config.transport: rdma
>> server.outstanding-rpc-limit: 512
>> diagnostics.brick-log-level: INFO
>>
>> Any recommendations on how to proceed from here?
>>
>> Best regards,
>> Samuli Heinonen
>>
>>
>
>
--
Pranith