<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 29, 2018 at 1:26 PM, Samuli Heinonen <span dir="ltr">&lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">Pranith Kumar Karampuri kirjoitti 29.01.2018 07:32:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On 29 Jan 2018 10:50 am, &quot;Samuli Heinonen&quot; &lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt;<br>

wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Hi!<br>

<br>

Yes, thank you for asking. I found this line in the production<br>

environment:<br>

<br>

</blockquote>

lgetxattr(&quot;/tmp/zone2-ssd1-vms<wbr>tor1.s6jvPu//.shard/f349ffbd-<wbr>a423-4fb2-b83c-2d1d5e78e1fb.<wbr>32&quot;,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

&quot;glusterfs.clrlk.tinode.kblock<wbr>ed&quot;, 0x7f2d7c4379f0, 4096) = -1 EPERM<br>

(Operation not permitted)<br>

</blockquote>

<br>

I was expecting .kall instead of .kblocked;<br>

did you change the CLI to kind blocked?<br>

<br>

</blockquote>

<br></span>

Yes, I was testing this with different commands. Basically, it seems that the name of the attribute is glusterfs.clrlk.t{posix,inode,<wbr>entry}.k{all,blocked,granted}, am I correct? </blockquote><div><br></div><div>That is correct. <br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is it necessary to set any value or just request the attribute with getfattr?<br></blockquote><div><br></div><div>No, there is no need to set a value. No I/O is going on on the file, right? In that case, just request the attribute with getfattr. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

And this one in test environment (with posix locks):<br>

lgetxattr(&quot;/tmp/g1.gHj4Bw//fil<wbr>e38&quot;,<br>

&quot;glusterfs.clrlk.tposix.kblock<wbr>ed&quot;, &quot;box1:/gluster/1/export/: posix<br>

blocked locks=1 granted locks=0&quot;, 4096) = 77<br>

<br>

In the test environment I tried running the following command, which seemed<br>

to release the gluster locks:<br>

<br>

getfattr -n glusterfs.clrlk.tposix.kblocke<wbr>d file38<br>

<br>

So I think it would go like this in the production environment for<br>

locks on shards (using the aux-gfid-mount mount option):<br>

getfattr -n glusterfs.clrlk.tinode.kall<br>

.shard/f349ffbd-a423-4fb2-b83c<wbr>-2d1d5e78e1fb.32<br>
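<div>The naming pattern discussed in this thread, glusterfs.clrlk.t{posix,inode,entry}.k{all,blocked,granted}, can be sketched as a small Python helper. This is only an illustration: the function names clrlk_key and getfattr_command are hypothetical and are not part of GlusterFS. Only the attribute pattern and the "getfattr -n" invocation come from the messages above.</div>
<pre>
```python
import shlex

# Values taken from the attribute pattern confirmed in this thread.
LOCK_TYPES = ("posix", "inode", "entry")
LOCK_KINDS = ("all", "blocked", "granted")


def clrlk_key(lock_type, kind):
    """Build the virtual xattr name for a clear-locks request (hypothetical helper)."""
    if lock_type not in LOCK_TYPES:
        raise ValueError("unknown lock type: " + lock_type)
    if kind not in LOCK_KINDS:
        raise ValueError("unknown kind: " + kind)
    return "glusterfs.clrlk.t" + lock_type + ".k" + kind


def getfattr_command(lock_type, kind, path):
    """Return the getfattr command line as a string; nothing is executed here."""
    return "getfattr -n " + clrlk_key(lock_type, kind) + " " + shlex.quote(path)


if __name__ == "__main__":
    # Path is relative to the aux-gfid-mount mount point, as in the message above.
    print(getfattr_command("inode", "all",
                           ".shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32"))
```
</pre>
<div>The helper only formats the command, so it is safe to run anywhere. Whether to actually issue the getfattr call on a 3.8.x brick is a separate decision, given the crash risk described in this thread.</div>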

<br>

I haven&#39;t been able to try this out in the production environment yet.<br>

<br>

Is there perhaps something else to be aware of?<br>

<br>

Could you tell me more about bricks crashing after releasing<br>

locks? Under what circumstances does that happen? Is it only the process<br>

exporting the brick that crashes, or is there a possibility of data<br>

corruption?<br>

</blockquote>

<br>

No data corruption. The brick process where you ran clear-locks may crash.<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Best regards,<br>

Samuli Heinonen<br>

<br>

Pranith Kumar Karampuri wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Hi,<br>

Did you find the command from strace?<br>

<br>

On 25 Jan 2018 1:52 pm, &quot;Pranith Kumar Karampuri&quot;<br>

&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt; wrote:<br>

<br>

On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen<br>

&lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt; wrote:<br>

<br>

Pranith Kumar Karampuri kirjoitti 25.01.2018 07:09:<br>

<br>

On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen<br>

&lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt; wrote:<br>

<br>

Hi!<br>

<br>

Thank you very much for your help so far. Could<br>

you<br>

please give an<br>

example command showing how to use aux-gfid-mount to remove<br>

locks? &quot;gluster<br>

vol clear-locks&quot; seems to mount the volume by itself.<br>

<br>

You are correct, sorry, this was implemented around 7<br>

years<br>

back and I<br>

forgot that bit about it :-(. Essentially it becomes a<br>

getxattr<br>

syscall on the file.<br>

Could you give me the clear-locks command you were<br>

trying to<br>

execute<br>

and I can probably convert it to the getfattr command?<br>

<br>

I have been testing this in test environment and with<br>

command:<br>

gluster vol clear-locks g1<br>

/.gfid/14341ccb-df7b-4f92-90d5<wbr>-7814431c5a1c kind all inode<br>

<br>

Could you do strace of glusterd when this happens? It will<br>

have a<br>

getxattr with &quot;glusterfs.clrlk&quot; in the key. You need to<br>

execute that<br>

on the gfid-aux-mount<br>

<br>

Best regards,<br>

Samuli Heinonen<br>

<br>

Pranith Kumar Karampuri<br>

&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;<br>

23 January 2018 at 10.30<br>

<br>

On Tue, Jan 23, 2018 at 1:38 PM, Samuli<br>

Heinonen<br>

&lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt; wrote:<br>

<br>

Pranith Kumar Karampuri kirjoitti 23.01.2018<br>

09:34:<br>

<br>

On Mon, Jan 22, 2018 at 12:33 AM, Samuli<br>

Heinonen<br>

&lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt; wrote:<br>

<br>

Hi again,<br>

<br>

here is more information regarding issue<br>

described<br>

earlier<br>

<br>

It looks like self healing is stuck. According<br>

to<br>

&quot;heal<br>

statistics&quot;<br>

crawl began at Sat Jan 20 12:56:19 2018 and<br>

it&#39;s still<br>

going on<br>

(It&#39;s around Sun Jan 21 20:30 when writing<br>

this).<br>

However<br>

glustershd.log says that last heal was<br>

completed at<br>

&quot;2018-01-20<br>

11:00:13.090697&quot; (which is 13:00 UTC+2). Also<br>

&quot;heal<br>

info&quot;<br>

has been<br>

running now for over 16 hours without any<br>

information.<br>

In<br>

statedump<br>

I can see that storage nodes have locks on<br>

files and<br>

some<br>

of those<br>

are blocked. For example, here again it says that<br>

ovirt8z2 is<br>

holding an active<br>

lock even though ovirt8z2 crashed after the lock was<br>

granted:<br>

<br>

<br>

[xlator.features.locks.zone2-s<wbr>sd1-vmstor1-locks.inode]<br>

<br>

path=/.shard/3d55f8cc-cda9-489<wbr>a-b0a3-fd0f43d67876.27<br>

mandatory=0<br>

inodelk-count=3<br>

<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:self-<wbr>heal<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0,<br>

len=0, pid<br>

= 18446744073709551610,<br>

owner=d0c6d857a87f0000,<br>

client=0x7f885845efa0,<br>

<br>

<br>

<br>

</blockquote>

<br>

</blockquote>

connection-id=sto2z2.xxx-10975<wbr>-2018/01/20-10:56:14:649541-<wbr>zone2-ssd1-vmstor1-client-0-0-<wbr>0,<br>

</div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

granted at 2018-01-20 10:59:52<br>

<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:metad<wbr>ata<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0,<br>

len=0, pid<br>

= 3420, owner=d8b9372c397f0000,<br>

client=0x7f8858410be0,<br>

<br></div></div>

connection-id=ovirt8z2.xxx.com<br>

<br>

<br>

<br>

</blockquote>

<br>

</blockquote><div><div class="gmail-h5">

-5652<wbr>-2017/12/27-09:49:02:946825-<wbr>zone2-ssd1-vmstor1-client-0-7-<wbr>0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

granted at 2018-01-20 08:57:23<br>

inodelk.inodelk[1](BLOCKED)=ty<wbr>pe=WRITE,<br>

whence=0,<br>

start=0,<br>

len=0,<br>

pid = 18446744073709551610,<br>

owner=d0c6d857a87f0000,<br>

client=0x7f885845efa0,<br>

<br>

<br>

<br>

</blockquote>

<br>

</blockquote>

connection-id=sto2z2.xxx-10975<wbr>-2018/01/20-10:56:14:649541-<wbr>zone2-ssd1-vmstor1-client-0-0-<wbr>0,<br>

</div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

blocked at 2018-01-20 10:59:52<br>
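<div>To make the statedump excerpt above easier to read, here is a minimal Python sketch that lists the inode locks with their state, lock domain and connection id. The sample text is the excerpt above with the mail-wrapped lines rejoined. The function names and the one-lock-per-line layout are assumptions for illustration, not a documented statedump parser. The large pid value, 18446744073709551610, equals -6 when read as a signed 64-bit integer. Reading it as an internal client such as the self-heal daemon, rather than a real process, is an assumption suggested by the self-heal lock domain and the storage-node connection id.</div>
<pre>
```python
import re

# Excerpt from this thread, with wrapped lines rejoined (one lock per line).
SAMPLE = """\
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
"""

LOCK_RE = re.compile(r"inodelk\.inodelk\[(\d+)\]\((\w+)\)=(.*)")


def as_signed64(value):
    """Reinterpret an unsigned 64-bit number as signed (two's complement)."""
    return int.from_bytes(value.to_bytes(8, "little"), "little", signed=True)


def parse_inodelks(text):
    """Collect inode locks together with the path and lock domain they belong to."""
    path = None
    domain = None
    locks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("path="):
            path = line[len("path="):]
        elif line.startswith("lock-dump.domain.domain="):
            domain = line[len("lock-dump.domain.domain="):]
        else:
            match = LOCK_RE.match(line)
            if match:
                fields = match.group(3)
                pid = int(re.search(r"pid = (\d+)", fields).group(1))
                conn = re.search(r"connection-id=([^,]+)", fields).group(1)
                locks.append({
                    "path": path,
                    "domain": domain,
                    "state": match.group(2),
                    "pid": as_signed64(pid),
                    "connection": conn,
                })
    return locks


locks = parse_inodelks(SAMPLE)
for lock in locks:
    print(lock["state"], lock["domain"], lock["pid"], lock["connection"])
```
</pre>
<div>Grouping by connection id makes it easier to spot ACTIVE locks whose connection belongs to the crashed hypervisor (ovirt8z2 here) and the BLOCKED requests queued behind them.</div>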

<br>

I&#39;d also like to add that volume had arbiter<br>

brick<br>

before<br>

crash<br>

happened. We decided to remove it because we<br>

thought<br>

that<br>

it was<br>

causing issues. However now I think that this<br>

was<br>

unnecessary. After<br>

the crash arbiter logs had lots of messages<br>

like this:<br>

[2018-01-20 10:19:36.515717] I [MSGID: 115072]<br>

[server-rpc-fops.c:1640:server<wbr>_setattr_cbk]<br>

0-zone2-ssd1-vmstor1-server: 37374187: SETATTR<br>

&lt;gfid:a52055bd-e2e9-42dd-92a3-<wbr>e96b693bcafe&gt;<br>

(a52055bd-e2e9-42dd-92a3-e96b6<wbr>93bcafe) ==&gt;<br>

(Operation<br>

not<br>

permitted)<br>

[Operation not permitted]<br>

<br>

Is there any way to force self-heal to stop?<br>

Any help<br>

would be very<br>

much appreciated :)<br>

<br>

Exposing .shard to a normal mount is opening a<br>

can of<br>

worms. You<br>

should probably look at mounting the volume<br>

with gfid<br>

aux-mount where<br>

you can access a file with<br>

&lt;path-to-mount&gt;/.gfid/&lt;gfid-st<wbr>ring&gt; to clear<br>

locks on it.<br>

<br>

Mount command:  mount -t glusterfs -o<br>

aux-gfid-mount<br>

vm1:test<br>

/mnt/testvol<br>

<br>

A gfid string will have some hyphens like:<br>

11118443-1894-4273-9340-4b212f<wbr>a1c0e4<br>

<br>

That said, the next disconnect on the brick where you<br>

successfully ran clear-locks will crash the brick. There was a<br>

bug in the 3.8.x series with clear-locks which was fixed in 3.9.0<br>

as part of a feature. The self-heal deadlocks that you witnessed<br>

are also fixed in the 3.10 release.<br>

<br>

Thank you for the answer. Could you please tell<br>

me more<br>

about the crash?<br>

What<br>

will actually happen or is there a bug report<br>

about<br>

it? Just<br>

want<br>

to make sure that we can do everything to<br>

secure data on<br>

bricks.<br>

We will look into upgrade but we have to make<br>

sure<br>

that new<br>

version works for us and of course get self<br>

healing<br>

working<br>

before<br>

doing anything :)<br>

<br>

The locks xlator/module maintains a list of locks that are granted to<br>

a client. Clear-locks had an issue where it forgot to remove the<br>

cleared lock from this list, so the connection&#39;s list ends up pointing to<br>

data that clear-locks has already freed. When a disconnect happens, all the locks<br>

that are granted to the client need to be unlocked, so the process starts<br>

traversing this list, and when it tries to access the freed data it<br>

crashes. I found it while reviewing a feature patch sent by<br>

facebook folks to locks xlator<br></div></div>

(<a href="http://review.gluster.org/14816" rel="noreferrer" target="_blank">http://review.gluster.org/14816</a><div><div class="gmail-h5"><br>

[2]) for 3.9.0 and they also fixed this bug as<br>

well<br>

as part of<br>

<br>

that feature patch.<br>
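<div>The failure mode described above can be illustrated with a toy model in Python. This is not GlusterFS code: the class and method names are hypothetical, and a "freed" flag stands in for memory that the C implementation would have released. It only shows why clearing a lock from one list, while leaving it on the per-connection list, breaks the traversal done at disconnect.</div>
<pre>
```python
class Lock:
    def __init__(self, name):
        self.name = name
        self.freed = False  # stands in for memory that C code would have freed


class ToyLocks:
    """Toy model: every granted lock is tracked in two lists."""

    def __init__(self):
        self.domain_locks = []  # locks held on the inode
        self.client_locks = []  # per-connection list walked at disconnect

    def grant(self, name):
        lock = Lock(name)
        self.domain_locks.append(lock)
        self.client_locks.append(lock)
        return lock

    def clear_locks_buggy(self):
        # Models the described bug: entries are freed and dropped from the
        # inode list, but the per-connection list still references them.
        for lock in self.domain_locks:
            lock.freed = True
        self.domain_locks.clear()

    def clear_locks_fixed(self):
        # Models the fix: also unlink each entry from the per-connection list.
        for lock in self.domain_locks:
            lock.freed = True
            self.client_locks.remove(lock)
        self.domain_locks.clear()

    def disconnect(self):
        # On disconnect, every lock granted to the client is released.
        for lock in self.client_locks:
            if lock.freed:
                raise RuntimeError("use of freed lock: " + lock.name)
        self.client_locks.clear()


if __name__ == "__main__":
    toy = ToyLocks()
    toy.grant("shard-27")
    toy.clear_locks_buggy()
    try:
        toy.disconnect()
    except RuntimeError as err:
        print("brick would crash here:", err)
```
</pre>
<div>In the real brick process the stale entry is a dangling pointer, so the result is a crash at the next disconnect and not a clean exception. That matches the behaviour described for 3.8.x in this thread.</div>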

<br>

Br,<br>

Samuli<br>

<br>

3.8.x is EOLed, so I recommend you to upgrade<br>

to a<br>

supported<br>

version<br>

soon.<br>

<br>

Best regards,<br>

Samuli Heinonen<br>

<br>

Samuli Heinonen<br>

20 January 2018 at 21.57<br>

<br>

Hi all!<br>

<br>

One hypervisor on our virtualization<br>

environment<br>

crashed and now<br>

some of the VM images cannot be accessed.<br>

After<br>

investigation we<br>

found out that there were lots of images that<br>

still<br>

had an<br>

active lock<br>

held by the crashed hypervisor. We were able to remove<br>

locks<br>

from &quot;regular<br>

files&quot;, but it doesn&#39;t seem possible to remove<br>

locks<br>

from shards.<br>

<br>

We are running GlusterFS 3.8.15 on all nodes.<br>

<br>

Here is part of statedump that shows shard<br>

having<br>

active lock on<br>

crashed node:<br>

<br>

<br>

[xlator.features.locks.zone2-s<wbr>sd1-vmstor1-locks.inode]<br>

<br>

<br>

path=/.shard/75353c17-d6b8-485<wbr>d-9baf-fd6c700e39a1.21<br>

mandatory=0<br>

inodelk-count=1<br>

<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:metad<wbr>ata<br>

<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:self-<wbr>heal<br>

<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0, len=0,<br>

pid = 3568, owner=14ce372c397f0000,<br>

client=0x7f3198388770,<br>

connection-id<br>

<br>

<br>

<br>

</div></div></blockquote>

<br>

</blockquote>

ovirt8z2.xxx-5652-2017/12/27-0<wbr>9:49:02:946825-zone2-ssd1-vmst<wbr>or1-client-1-7-0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

granted at 2018-01-20 08:57:24<br>

<br>

If we try to run clear-locks we get the following<br>

error<br>

message:<br>

# gluster volume clear-locks<br>

zone2-ssd1-vmstor1<br>

<br>

/.shard/75353c17-d6b8-485d-9ba<wbr>f-fd6c700e39a1.21<br>

kind<br>

all inode<br>

Volume clear-locks unsuccessful<br>

clear-locks getxattr command failed. Reason:<br>

Operation not<br>

permitted<br>

<br>

Gluster vol info if needed:<br>

Volume Name: zone2-ssd1-vmstor1<br>

Type: Replicate<br>

Volume ID:<br>

b6319968-690b-4060-8fff-b212d2<wbr>295208<br>

Status: Started<br>

Snapshot Count: 0<br>

Number of Bricks: 1 x 2 = 2<br>

Transport-type: rdma<br>

Bricks:<br>

Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1<wbr>/export<br>

Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1<wbr>/export<br>

Options Reconfigured:<br>

cluster.shd-wait-qlength: 10000<br>

cluster.shd-max-threads: 8<br>

cluster.locking-scheme: granular<br>

performance.low-prio-threads: 32<br>

cluster.data-self-heal-algorit<wbr>hm: full<br>

performance.client-io-threads: off<br>

storage.linux-aio: off<br>

performance.readdir-ahead: on<br>

client.event-threads: 16<br>

server.event-threads: 16<br>

performance.strict-write-order<wbr>ing: off<br>

performance.quick-read: off<br>

performance.read-ahead: on<br>

performance.io-cache: off<br>

performance.stat-prefetch: off<br>

cluster.eager-lock: enable<br>

network.remote-dio: on<br>

cluster.quorum-type: none<br>

network.ping-timeout: 22<br>

performance.write-behind: off<br>

nfs.disable: on<br>

features.shard: on<br>

features.shard-block-size: 512MB<br>

storage.owner-uid: 36<br>

storage.owner-gid: 36<br>

performance.io-thread-count: 64<br>

performance.cache-size: 2048MB<br>

performance.write-behind-windo<wbr>w-size: 256MB<br>

server.allow-insecure: on<br>

cluster.ensure-durability: off<br>

config.transport: rdma<br>

server.outstanding-rpc-limit: 512<br>

diagnostics.brick-log-level: INFO<br>

<br>

Any recommendations how to advance from here?<br>

<br>

Best regards,<br>

Samuli Heinonen<br>

<br>

<br>

______________________________<wbr>_________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>


<br>

<br>

</div></div><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>


&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<span class="gmail-"><br>

[3]&gt;<br>

[1]<br>

<br>

<br>


</span><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>


&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<span class="gmail-"><br>

[3]&gt; [1]<br>

<br>

--<br>

<br>

Pranith<br>

<br>

Links:<br>

------<br>

[1]<br>

<br>

</span><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>

<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<br>

[3]<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]<br>

<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<div><div class="gmail-h5"><br>

[3]&gt;<br>

<br>

--<br>

Pranith<br>

Samuli Heinonen &lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt;<br>

21 January 2018 at 21.03<br>

Hi again,<br>

<br>

here is more information regarding issue<br>

described<br>

earlier<br>

<br>

It looks like self healing is stuck. According<br>

to &quot;heal<br>

statistics&quot; crawl began at Sat Jan 20 12:56:19<br>

2018<br>

and it&#39;s still<br>

going on (It&#39;s around Sun Jan 21 20:30 when<br>

writing<br>

this). However<br>

glustershd.log says that last heal was<br>

completed at<br>

&quot;2018-01-20<br>

11:00:13.090697&quot; (which is 13:00 UTC+2). Also<br>

&quot;heal<br>

info&quot; has been<br>

running now for over 16 hours without any<br>

information. In<br>

statedump I can see that storage nodes have<br>

locks on<br>

files and<br>

some of those are blocked. For example, here again it<br>

says<br>

that ovirt8z2 is<br>

holding an active lock even though ovirt8z2 crashed after<br>

the<br>

lock was<br>

granted:<br>

<br>

<br>

[xlator.features.locks.zone2-s<wbr>sd1-vmstor1-locks.inode]<br>

<br>

path=/.shard/3d55f8cc-cda9-489<wbr>a-b0a3-fd0f43d67876.27<br>

mandatory=0<br>

inodelk-count=3<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:self-<wbr>heal<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0, len=0,<br>

pid = 18446744073709551610,<br>

owner=d0c6d857a87f0000,<br>

client=0x7f885845efa0,<br>

<br>

<br>

<br>

</div></div></blockquote>

<br>

</blockquote>

connection-id=sto2z2.xxx-10975<wbr>-2018/01/20-10:56:14:649541-<wbr>zone2-ssd1-vmstor1-client-0-0-<wbr>0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

granted at 2018-01-20 10:59:52<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:metad<wbr>ata<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0, len=0,<br>

pid = 3420, owner=d8b9372c397f0000,<br>

client=0x7f8858410be0,<br></div></div>

connection-id=ovirt8z2.xxx.com<br>

<br>

<br>

<br>

</blockquote><div><div class="gmail-h5">

-5652-2017/12/27-09:49:02:9<wbr>46825-zone2-ssd1-vmstor1-clien<wbr>t-0-7-0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

granted at 2018-01-20 08:57:23<br>

inodelk.inodelk[1](BLOCKED)=ty<wbr>pe=WRITE,<br>

whence=0,<br>

start=0, len=0,<br>

pid = 18446744073709551610,<br>

owner=d0c6d857a87f0000,<br>

client=0x7f885845efa0,<br>

<br>

<br>

<br>

</blockquote>

<br>

</div></div></blockquote>

connection-id=sto2z2.xxx-10975<wbr>-2018/01/20-10:56:14:649541-<wbr>zone2-ssd1-vmstor1-client-0-0-<wbr>0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

blocked at 2018-01-20 10:59:52<br>

<br>

I&#39;d also like to add that volume had arbiter<br>

brick<br>

before crash<br>

happened. We decided to remove it because we<br>

thought<br>

that it was<br>

causing issues. However now I think that this<br>

was<br>

unnecessary.<br>

After the crash arbiter logs had lots of<br>

messages<br>

like this:<br>

[2018-01-20 10:19:36.515717] I [MSGID: 115072]<br>

[server-rpc-fops.c:1640:server<wbr>_setattr_cbk]<br>

0-zone2-ssd1-vmstor1-server: 37374187: SETATTR<br>

&lt;gfid:a52055bd-e2e9-42dd-92a3-<wbr>e96b693bcafe&gt;<br>

(a52055bd-e2e9-42dd-92a3-e96b6<wbr>93bcafe) ==&gt;<br>

(Operation not<br>

permitted) [Operation not permitted]<br>

<br>

Is there any way to force self-heal to stop?<br>

Any<br>

help would be<br>

very much appreciated :)<br>

<br>

Best regards,<br>

Samuli Heinonen<br>

<br>

<br>

______________________________<wbr>_________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

&lt;mailto:<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.<wbr>org</a>&gt;<br>

<br>

</div></div><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>

<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<div><div class="gmail-h5"><br>

[3]<br>

<br>

Samuli Heinonen &lt;<a href="mailto:samppah@neutraali.net" target="_blank">samppah@neutraali.net</a>&gt;<br>

<br>

20 January 2018 at 21.57<br>

Hi all!<br>

<br>

One hypervisor on our virtualization<br>

environment<br>

crashed and now<br>

some of the VM images cannot be accessed.<br>

After<br>

investigation we<br>

found out that there were lots of images that<br>

still<br>

had an active lock<br>

held by the crashed hypervisor. We were able to remove<br>

locks<br>

from &quot;regular<br>

files&quot;, but it doesn&#39;t seem possible to remove<br>

locks<br>

from shards.<br>

<br>

We are running GlusterFS 3.8.15 on all nodes.<br>

<br>

Here is part of statedump that shows shard<br>

having<br>

active lock on<br>

crashed node:<br>

<br>

[xlator.features.locks.zone2-s<wbr>sd1-vmstor1-locks.inode]<br>

<br>

path=/.shard/75353c17-d6b8-485<wbr>d-9baf-fd6c700e39a1.21<br>

mandatory=0<br>

inodelk-count=1<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:metad<wbr>ata<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0:self-<wbr>heal<br>

<br>

lock-dump.domain.domain=zone2-<wbr>ssd1-vmstor1-replicate-0<br>

inodelk.inodelk[0](ACTIVE)=typ<wbr>e=WRITE,<br>

whence=0,<br>

start=0, len=0,<br>

pid = 3568, owner=14ce372c397f0000,<br>

client=0x7f3198388770,<br>

connection-id<br>

<br>

<br>

<br>

</div></div></blockquote>

<br>

</blockquote>

ovirt8z2.xxx-5652-2017/12/27-0<wbr>9:49:02:946825-zone2-ssd1-vmst<wbr>or1-client-1-7-0,<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">

<br>

granted at 2018-01-20 08:57:24<br>

<br>

If we try to run clear-locks we get the following<br>

error<br>

message:<br>

# gluster volume clear-locks<br>

zone2-ssd1-vmstor1<br>

<br>

/.shard/75353c17-d6b8-485d-9ba<wbr>f-fd6c700e39a1.21 kind<br>

all inode<br>

Volume clear-locks unsuccessful<br>

clear-locks getxattr command failed. Reason:<br>

Operation not<br>

permitted<br>

<br>

Gluster vol info if needed:<br>

Volume Name: zone2-ssd1-vmstor1<br>

Type: Replicate<br>

Volume ID:<br>

b6319968-690b-4060-8fff-b212d2<wbr>295208<br>

Status: Started<br>

Snapshot Count: 0<br>

Number of Bricks: 1 x 2 = 2<br>

Transport-type: rdma<br>

Bricks:<br>

Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1<wbr>/export<br>

Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1<wbr>/export<br>

Options Reconfigured:<br>

cluster.shd-wait-qlength: 10000<br>

cluster.shd-max-threads: 8<br>

cluster.locking-scheme: granular<br>

performance.low-prio-threads: 32<br>

cluster.data-self-heal-algorit<wbr>hm: full<br>

performance.client-io-threads: off<br>

storage.linux-aio: off<br>

performance.readdir-ahead: on<br>

client.event-threads: 16<br>

server.event-threads: 16<br>

performance.strict-write-order<wbr>ing: off<br>

performance.quick-read: off<br>

performance.read-ahead: on<br>

performance.io-cache: off<br>

performance.stat-prefetch: off<br>

cluster.eager-lock: enable<br>

network.remote-dio: on<br>

cluster.quorum-type: none<br>

network.ping-timeout: 22<br>

performance.write-behind: off<br>

nfs.disable: on<br>

features.shard: on<br>

features.shard-block-size: 512MB<br>

storage.owner-uid: 36<br>

storage.owner-gid: 36<br>

performance.io-thread-count: 64<br>

performance.cache-size: 2048MB<br>

performance.write-behind-windo<wbr>w-size: 256MB<br>

server.allow-insecure: on<br>

cluster.ensure-durability: off<br>

config.transport: rdma<br>

server.outstanding-rpc-limit: 512<br>

diagnostics.brick-log-level: INFO<br>

<br>

Any recommendations how to advance from here?<br>

<br>

Best regards,<br>

Samuli Heinonen<br>

<br>

<br>

______________________________<wbr>_________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

&lt;mailto:<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.<wbr>org</a>&gt;<br>

<br>

</div></div><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>

<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<span class="gmail-"><br>

[3]<br>

<br>

--<br>

<br>

Pranith<br>

<br>

Links:<br>

------<br>

[1] <a href="http://ovirt8z2.xxx.com" rel="noreferrer" target="_blank">http://ovirt8z2.xxx.com</a><br></span>

[2] <a href="http://review.gluster.org/14816" rel="noreferrer" target="_blank">http://review.gluster.org/1481<wbr>6</a> [2]<br>

&lt;<a href="http://review.gluster.org/14816" rel="noreferrer" target="_blank">http://review.gluster.org/148<wbr>16</a> [2]&gt;<br>

[3]<br>

<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a> [3]<br>

<br>

&lt;<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mail<wbr>man/listinfo/gluster-users</a> [3]&gt;<br>

<br>

--<br>

Pranith<br>

</blockquote></blockquote><span class="gmail-">

<br>

<br>

<br>

Links:<br>

------<br>

[1] <a href="http://ovirt8z2.xxx.com" rel="noreferrer" target="_blank">http://ovirt8z2.xxx.com</a><br>

[2] <a href="http://review.gluster.org/14816" rel="noreferrer" target="_blank">http://review.gluster.org/1481<wbr>6</a><br></span>

[3] <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a><br>

</blockquote>

</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr">Pranith<br></div></div>

</div></div>