[Gluster-users] Blocked inodes locking on some files

Florian Philippon florian.philippon at gmail.com
Thu Oct 27 09:29:56 UTC 2016


Hello community,

I'm still facing this issue using GlusterFS-3.8.5-1 and I would like to
know whether it could be related to a potential bug in disperse mode or
to the way Spark writes job results to the gluster mountpoint.

I found the following locks using a statedump:

[xlator.features.locks.exp-locks.inode]
path=/tests/output/_temporary/0/_temporary
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=exp-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
10978, owner=60af7c2a0c7f0000, client=0x7f4fd4009900,
connection-id=server1-5422-2016/10/26-15:03:03:550732-exp-client-0-0-0,
blocked at 2016-10-26 15:53:13, granted at 2016-10-26 15:53:14
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
19137, owner=18590414617f0000, client=0x7f4fdc0e1950,
connection-id=server2-1720-2016/10/26-15:05:59:328008-exp-client-0-0-0,
blocked at 2016-10-26 15:53:14
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
9798, owner=303b34a4e8720000, client=0x7f4fdc0deac0,
connection-id=server1-3955-2016/10/26-15:04:00:874305-exp-client-0-0-0,
blocked at 2016-10-26 15:53:18
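
For reference, the dump above came from the CLI statedump facility; a
minimal sketch of how to reproduce it and filter out the blocked locks
(assuming the volume is named "exp", as the xlator names suggest, and the
default server.statedump-path):

    # trigger a statedump of the brick processes for the volume
    gluster volume statedump exp

    # dump files land under /var/run/gluster by default;
    # show each locked path together with its blocked lock requests
    grep -E 'path=|BLOCKED' /var/run/gluster/*.dump.*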

Many thanks for your advice and help on this topic.


2016-06-20 10:43 GMT+02:00 Florian Philippon <florian.philippon at gmail.com>:

> Hello guys,
>
> I would like to get some advice on some problems we have with our
> 3-host gluster setup.
>
> Here is the setup used:
>
>    1. GlusterFS 3.8.0-1 (we did an upgrade from 3.7.11 last week)
>    2. Type: Disperse
>    3. Number of Bricks: 1 x (2 + 1) = 3
>    4. Transport-type: tcp
>    5. Options Reconfigured: transport.address-family: inet
>
> Please note that we also have the ACL option enabled on the volume mount.
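>
> For completeness, a 2+1 disperse volume like this would typically be
> created and mounted along these lines (a rough sketch, not our exact
> commands; the volume name "exp", hostnames and brick paths are
> illustrative placeholders, only the geometry matches our setup):
>
>     gluster volume create exp disperse 3 redundancy 1 \
>         server1:/bricks/exp server2:/bricks/exp server3:/bricks/exp
>     gluster volume start exp
>     # ACL support is enabled at mount time with -o acl
>     mount -t glusterfs -o acl server1:/exp /mnt/exp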
>
> Use case:
>
> A user submits jobs/tasks to a Spark cluster which has the glusterfs
> volume mounted on each host.
> 13 tasks completed successfully in ~30 min each (they convert some logs
> to JSON format and write the output to the gluster fs), but one had been
> blocked for more than 12 hours by the time we checked what was going
> wrong.
>
> We found some log entries related to inode locking in the brick log on
> one host:
>
> [2016-06-19 03:15:08.563397] E [inodelk.c:304:__inode_unlock_lock]
> 0-exp-locks:  Matching lock not found for unlock 0-9223372036854775807,
> by 10613ebc6c6a0000 on 0x6cee5c0f4730
> [2016-06-19 03:15:08.563684] E [MSGID: 115053]
> [server-rpc-fops.c:273:server_inodelk_cbk] 0-exp-server: 5375861: INODELK
> /spark/user/20160328/_temporary/0/_temporary
> (015bde3a-09d6-41a2-8e9f-7e7c5295d596) ==> (Invalid argument) [Invalid
> argument]
>
> Errors in the data log:
>
> [2016-06-19 03:13:29.198676] I [MSGID: 109036]
> [dht-common.c:8824:dht_log_new_layout_for_dir_selfheal] 0-exp-dht:
> Setting layout of
> /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000004_26
> with [Subvol_name: exp-disperse-0, Err: -1 , Start: 0 ,
> Stop: 4294967295 , Hash: 1 ],
> [2016-06-19 03:14:59.349357] I [MSGID: 109066]
> [dht-rename.c:1562:dht_rename] 0-exp-dht: renaming
> /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000001_23
> (hash=exp-disperse-0/cache=exp-disperse-0) =>
> /spark/user/20160328/_temporary/0/task_201606190511_0004_m_000001
> (hash=exp-disperse-0/cache=<nul>)
>
> And these entries also spam the data log whenever an action is
> performed on the fs:
>
> [2016-06-19 13:58:22.817308] I [dict.c:462:dict_get]
> (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628)
> [0x6f0655cd1628]
> -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9ccb)
> [0x6f0655ab5ccb] -->/lib64/libglusterfs.so.0(dict_get+0xec)
> [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_access
> [Invalid argument]
> [2016-06-19 13:58:22.817364] I [dict.c:462:dict_get]
> (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628)
> [0x6f0655cd1628]
> -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9d21)
> [0x6f0655ab5d21] -->/lib64/libglusterfs.so.0(dict_get+0xec)
> [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_default
> [Invalid argument]
>
> We took a statedump and got confirmation that some processes were in a
> blocked state.
>
> We cleared the lock on the blocked inode and the Spark job finally
> finished (with errors).
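>
> For reference, clearing a blocked inode lock is done with the
> clear-locks CLI; a sketch matching the locked path from the logs above
> (the volume name "exp" is again inferred from the xlator names):
>
>     gluster volume clear-locks exp \
>         /spark/user/20160328/_temporary/0/_temporary \
>         kind blocked inode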
>
> What could be the root cause of these blocked locks?
>
> Thanks for your help!
>
> Florian
>
>
>


-- 
Florian Philippon