[Gluster-users] Stale locks on shards
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Jan 25 05:09:16 UTC 2018
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen <samppah at neutraali.net>
wrote:
> Hi!
>
> Thank you very much for your help so far. Could you please give an example
> command showing how to use aux-gfid-mount to remove locks? "gluster vol
> clear-locks" seems to mount the volume by itself.
>
You are correct, sorry; this was implemented around 7 years back and I had
forgotten that bit about it :-(. Essentially it becomes a getxattr syscall on
the file. Could you give me the clear-locks command you were trying to
execute, and I can probably convert it to the equivalent getfattr command?
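
In the meantime, here is roughly what that mapping looks like for the
clear-locks command you posted earlier. The virtual xattr name below is from
memory and the mount spec is just an illustration, so please double-check
both against your setup before running anything:

# CLI form you were running:
gluster volume clear-locks zone2-ssd1-vmstor1 \
    /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode

# Rough getfattr equivalent on an aux-gfid mount of the volume
# (add your usual transport options, since the volume uses rdma):
mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/testvol
getfattr -n glusterfs.clrlk.tinode.kall \
    /mnt/testvol/.gfid/<gfid-of-the-shard-file>

# Note: <gfid-of-the-shard-file> is the gfid of the shard itself (readable
# from the brick backend via its trusted.gfid xattr), not the id that
# appears in the shard's file name.
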
>
> Best regards,
> Samuli Heinonen
>
> Pranith Kumar Karampuri <pkarampu at redhat.com>
>> 23 January 2018 at 10.30
>>
>>
>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah at neutraali.net>
>> wrote:
>>
>> Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:
>>
>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>> <samppah at neutraali.net> wrote:
>>
>> Hi again,
>>
>> here is more information regarding the issue described earlier.
>>
>> It looks like self-healing is stuck. According to "heal statistics" the
>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>> around Sun Jan 21 20:30 as I write this). However, glustershd.log says
>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>> is 13:00 UTC+2). Also, "heal info" has now been running for over 16
>> hours without printing any information. In the statedump I can see that
>> the storage nodes hold locks on files and some of those are blocked.
>> I.e. here again it says that ovirt8z2 holds an active lock even though
>> ovirt8z2 crashed after the lock was granted:
>>
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>> mandatory=0
>> inodelk-count=3
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> granted at 2018-01-20 10:59:52
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>> granted at 2018-01-20 08:57:23
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> blocked at 2018-01-20 10:59:52
>>
>> I'd also like to add that the volume had an arbiter brick before the
>> crash happened. We decided to remove it because we thought that it was
>> causing issues, but now I think that this was unnecessary. After the
>> crash the arbiter logs had lots of messages like this:
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>> [server-rpc-fops.c:1640:server_setattr_cbk]
>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>> <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>> [Operation not permitted]
>>
>> Is there any way to force self-heal to stop? Any help would be very
>> much appreciated :)
>>
>>
>> Exposing .shard to a normal mount is opening a can of worms. You should
>> probably look at mounting the volume with the gfid aux-mount option,
>> where you can access a file as <path-to-mount>/.gfid/<gfid-string> to
>> clear locks on it.
>>
>> Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>>
>> A gfid string will have some hyphens, like:
>> 11118443-1894-4273-9340-4b212fa1c0e4
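>>
>> To find the gfid string of a particular shard you can, for example, read
>> its trusted.gfid xattr straight from the brick backend on one of the
>> storage nodes and add the hyphens by hand. The path below is just taken
>> from your earlier statedump as an illustration:
>>
>> # run on a storage node, against the shard inside the brick directory
>> getfattr -n trusted.gfid -e hex \
>>     /ssd1/zone2-vmstor1/export/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> # prints trusted.gfid=0x<32 hex digits>; split it 8-4-4-4-12 to get the
>> # gfid string to use under <path-to-mount>/.gfid/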
>>
>> That said, the next disconnect on the brick where you successfully did
>> the clear-locks will crash the brick. There was a bug in the 3.8.x
>> series with clear-locks which was fixed in 3.9.0 as part of a feature.
>> The self-heal deadlocks that you witnessed are also fixed in the 3.10
>> release.
>>
>>
>>
>> Thank you for the answer. Could you please tell me more about the
>> crash? What will actually happen, or is there a bug report about it? I
>> just want to make sure that we can do everything to secure the data on
>> the bricks. We will look into upgrading, but we have to make sure that
>> the new version works for us, and of course get self-healing working
>> before doing anything :)
>>
>>
>> The locks xlator/module maintains a list of locks that are granted to a
>> client. Clear-locks had an issue where it forgot to remove the lock from
>> this list, so the connection list ends up pointing to data that has been
>> freed after a clear-lock. When a disconnect happens, all the locks
>> granted to that client need to be unlocked, so the process starts
>> traversing this list, and when it tries to access the freed data it
>> crashes. I found this while reviewing a feature patch sent by the
>> Facebook folks to the locks xlator (http://review.gluster.org/14816) for
>> 3.9.0, and they also fixed this bug as part of that feature patch.
>>
>>
>> Br,
>> Samuli
>>
>>
>> 3.8.x is EOLed, so I recommend that you upgrade to a supported version
>> soon.
>>
>> Best regards,
>> Samuli Heinonen
>>
>> Samuli Heinonen
>> 20 January 2018 at 21.57
>>
>> Hi all!
>>
>> One hypervisor in our virtualization environment crashed and now
>> some of the VM images cannot be accessed. After investigation we
>> found out that there were lots of images that still had an active
>> lock held by the crashed hypervisor. We were able to remove the
>> locks from "regular files", but it doesn't seem possible to remove
>> locks from shards.
>>
>> We are running GlusterFS 3.8.15 on all nodes.
>>
>> Here is the part of the statedump that shows a shard with an active
>> lock held by the crashed node:
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> mandatory=0
>> inodelk-count=1
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>> granted at 2018-01-20 08:57:24
>>
>> If we try to run clear-locks we get the following error message:
>> # gluster volume clear-locks zone2-ssd1-vmstor1
>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>> Volume clear-locks unsuccessful
>> clear-locks getxattr command failed. Reason: Operation not permitted
>>
>> Gluster vol info if needed:
>> Volume Name: zone2-ssd1-vmstor1
>> Type: Replicate
>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: rdma
>> Bricks:
>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>> Options Reconfigured:
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> performance.low-prio-threads: 32
>> cluster.data-self-heal-algorithm: full
>> performance.client-io-threads: off
>> storage.linux-aio: off
>> performance.readdir-ahead: on
>> client.event-threads: 16
>> server.event-threads: 16
>> performance.strict-write-ordering: off
>> performance.quick-read: off
>> performance.read-ahead: on
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: on
>> cluster.quorum-type: none
>> network.ping-timeout: 22
>> performance.write-behind: off
>> nfs.disable: on
>> features.shard: on
>> features.shard-block-size: 512MB
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.io-thread-count: 64
>> performance.cache-size: 2048MB
>> performance.write-behind-window-size: 256MB
>> server.allow-insecure: on
>> cluster.ensure-durability: off
>> config.transport: rdma
>> server.outstanding-rpc-limit: 512
>> diagnostics.brick-log-level: INFO
>>
>> Any recommendations on how to proceed from here?
>>
>> Best regards,
>> Samuli Heinonen
>>
>> --
>> Pranith
>> Samuli Heinonen <samppah at neutraali.net>
>> 21 January 2018 at 21.03
>> Hi again,
>>
>> here is more information regarding the issue described earlier.
>>
>> It looks like self-healing is stuck. According to "heal statistics" the
>> crawl began at Sat Jan 20 12:56:19 2018 and it's still going on (it's
>> around Sun Jan 21 20:30 as I write this). However, glustershd.log says
>> that the last heal was completed at "2018-01-20 11:00:13.090697" (which
>> is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours
>> without printing any information. In the statedump I can see that the
>> storage nodes hold locks on files and some of those are blocked. I.e.
>> here again it says that ovirt8z2 holds an active lock even though
>> ovirt8z2 crashed after the lock was granted:
>>
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>> mandatory=0
>> inodelk-count=3
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> granted at 2018-01-20 10:59:52
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>> granted at 2018-01-20 08:57:23
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
>> 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> blocked at 2018-01-20 10:59:52
>>
>> I'd also like to add that the volume had an arbiter brick before the
>> crash happened. We decided to remove it because we thought that it was
>> causing issues, but now I think that this was unnecessary. After the
>> crash the arbiter logs had lots of messages like this:
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>> [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server:
>> 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted)
>> [Operation not permitted]
>>
>> Is there any way to force self-heal to stop? Any help would be very much
>> appreciated :)
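>>
>> (For reference, the "heal statistics" and "heal info" outputs mentioned
>> above come from the standard heal commands; the invocations we used were
>> along these lines:)
>>
>> gluster volume heal zone2-ssd1-vmstor1 statistics
>> gluster volume heal zone2-ssd1-vmstor1 info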
>>
>> Best regards,
>> Samuli Heinonen
>>
>>
>>
>>
>>
>>
>> Samuli Heinonen <samppah at neutraali.net>
>>
>> 20 January 2018 at 21.57
>> Hi all!
>>
>> One hypervisor in our virtualization environment crashed and now some of
>> the VM images cannot be accessed. After investigation we found out that
>> there were lots of images that still had an active lock held by the
>> crashed hypervisor. We were able to remove the locks from "regular
>> files", but it doesn't seem possible to remove locks from shards.
>>
>> We are running GlusterFS 3.8.15 on all nodes.
>>
>> Here is the part of the statedump that shows a shard with an active lock
>> held by the crashed node:
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>> mandatory=0
>> inodelk-count=1
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
>> 3568, owner=14ce372c397f0000, client=0x7f3198388770,
>> connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
>> granted at 2018-01-20 08:57:24
>>
>> If we try to run clear-locks we get the following error message:
>> # gluster volume clear-locks zone2-ssd1-vmstor1
>> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>> Volume clear-locks unsuccessful
>> clear-locks getxattr command failed. Reason: Operation not permitted
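>>
>> (For reference, statedumps like the one above can be regenerated at any
>> time to re-check the lock state. From memory the defaults are roughly as
>> follows, so please verify the dump path on your build:)
>>
>> # run on any node; each brick process writes its own dump file
>> gluster volume statedump zone2-ssd1-vmstor1
>> # the dumps usually land under /var/run/gluster/ on each storage node;
>> # grep them for the shard path or its gfid to see any remaining inodelks
>> grep -A 10 '75353c17-d6b8-485d-9baf-fd6c700e39a1.21' /var/run/gluster/*.dump.*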
>>
>> Gluster vol info if needed:
>> Volume Name: zone2-ssd1-vmstor1
>> Type: Replicate
>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: rdma
>> Bricks:
>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>> Options Reconfigured:
>> cluster.shd-wait-qlength: 10000
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> performance.low-prio-threads: 32
>> cluster.data-self-heal-algorithm: full
>> performance.client-io-threads: off
>> storage.linux-aio: off
>> performance.readdir-ahead: on
>> client.event-threads: 16
>> server.event-threads: 16
>> performance.strict-write-ordering: off
>> performance.quick-read: off
>> performance.read-ahead: on
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: on
>> cluster.quorum-type: none
>> network.ping-timeout: 22
>> performance.write-behind: off
>> nfs.disable: on
>> features.shard: on
>> features.shard-block-size: 512MB
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.io-thread-count: 64
>> performance.cache-size: 2048MB
>> performance.write-behind-window-size: 256MB
>> server.allow-insecure: on
>> cluster.ensure-durability: off
>> config.transport: rdma
>> server.outstanding-rpc-limit: 512
>> diagnostics.brick-log-level: INFO
>>
>> Any recommendations on how to proceed from here?
>>
>> Best regards,
>> Samuli Heinonen
>>
>>
>
>
--
Pranith