[Gluster-users] NFS timeouts?

Yannick Perret yannick.perret at liris.cnrs.fr
Thu Dec 1 12:34:43 UTC 2016


Le 01/12/2016 à 13:12, Yannick Perret a écrit :
> Hello,
> I have a client machine that mounts as NFS a replicate x2 volume. 
> Practically this is configured with automount such as:
> DIR-NAME -rw,soft,intr server1,server2:/VOLUME
>
> Gluster servers are using 3.6.7.
> Sometimes the NFS blocks on client with
> server server2 not responding, timed out  (here it was connected on 
> server2)
> but network communication is fine between the two machines (they are 
> connected to the same switch, I can ssh on each, they ping each other…).
>
> I can also see few "xs_tcp_setup_socket: connect returned unhandled 
> error -107" on the client.
> On 'server2' side I can see in the gluster nfs logs:
>
> [2016-12-01 10:50:15.887927] W [rpcsvc.c:261:rpcsvc_program_actor] 
> 0-rpc-service: RPC program version not available (req 100003 2)
> [2016-12-01 10:50:15.887965] E 
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed 
> to complete successfully
> [2016-12-01 10:50:15.901880] W [rpcsvc.c:261:rpcsvc_program_actor] 
> 0-rpc-service: RPC program version not available (req 100003 4)
> [2016-12-01 10:50:15.901900] E 
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed 
> to complete successfully
> [2016-12-01 10:51:03.777145] W [rpcsvc.c:261:rpcsvc_program_actor] 
> 0-rpc-service: RPC program version not available (req 100003 2)
> [2016-12-01 10:51:03.777191] E 
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed 
> to complete successfully
> [2016-12-01 10:51:03.790561] W [rpcsvc.c:261:rpcsvc_program_actor] 
> 0-rpc-service: RPC program version not available (req 100003 4)
> [2016-12-01 10:51:03.790580] E 
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed 
> to complete successfully
>
It looks like these correspond to the NFS re-connections: the requests are 
for RPC program 100003 (NFS) versions 2 and 4, i.e. the client probing 
NFSv2 and NFSv4, I think, which the Gluster NFS server (NFSv3 only) rejects.
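Since the Gluster NFS server only speaks NFSv3, pinning the mount to v3 in
the automount map should avoid these version probes on reconnect. A sketch,
using the map entry from above (exact option names may vary with the autofs
and kernel NFS client versions in use):

```
DIR-NAME -rw,soft,intr,vers=3,proto=tcp server1,server2:/VOLUME
```

With vers=3 the client goes straight to NFSv3 instead of negotiating down
from v4, so the "RPC program version not available" warnings should stop
appearing on reconnect.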

Just before that here are the logs:
l_layout_new_directory] 0-HOME-LIRIS-dht: assigning range size 
0xffe76e40 to HOME-LIRIS-replicate-0
[2016-12-01 10:48:36.990028] W 
[client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-HOME-LIRIS-client-1: 
remote operation failed: Opération non permise [Operation not permitted]
[2016-12-01 10:48:36.990303] W 
[client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-HOME-LIRIS-client-0: 
remote operation failed: Opération non permise [Operation not permitted]
The message "I [MSGID: 109036] 
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 
0-HOME-LIRIS-dht: Setting layout of 
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/_indexer.lock with 
[Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 
4294967295 ], " repeated 2 times between [2016-12-01 10:48:36.404738] 
and [2016-12-01 10:48:36.949907]
[2016-12-01 10:48:36.990728] I [MSGID: 109036] 
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 
0-HOME-LIRIS-dht: Setting layout of 
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/39132555496bb098708af2d5e7b56d67 
with [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 
4294967295 ],
[2016-12-01 10:50:10.360020] I [dht-rename.c:1344:dht_rename] 
0-HOME-LIRIS-dht: renaming 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_km1NUe 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) => 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)
[2016-12-01 10:50:10.423561] I [dht-rename.c:1344:dht_rename] 
0-HOME-LIRIS-dht: renaming 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_2pOZ5T 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) => 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/1.php 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)
[2016-12-01 10:50:10.485882] I [dht-rename.c:1344:dht_rename] 
0-HOME-LIRIS-dht: renaming 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_86Lmpz 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) => 
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php 
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)


I also tried setting "nfs.mount-rmtab /dev/shm/glusterfs.rmtab", as 
suggested in an old thread. I will check whether it changes anything.
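For reference, that option is set per volume with the Gluster CLI; a sketch,
where VOLNAME stands in for the actual volume name:

```
# Keep the NFS rmtab on tmpfs so rmtab updates don't touch slow storage
# (VOLNAME is a placeholder; substitute the real volume name)
gluster volume set VOLNAME nfs.mount-rmtab /dev/shm/glusterfs.rmtab
```

The idea behind the workaround is that rmtab writes on every mount/unmount
can stall the NFS server if the file lives on slow storage; /dev/shm avoids
that, at the cost of losing the rmtab contents on reboot.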

Regards,
--
Y.

> at a time that corresponds to the NFS timeouts.
>
> This problem occurs "often" (at least once every day or two), and 
> neither the client nor the servers are under heavy load (memory and CPU 
> are far from full).
>
> Any idea what the reason might be, and how to prevent it from occurring?
> I reduced the autofs timeout to limit the impact, but that is not a 
> very nice solution… Note: I can't use the glusterfs client instead of 
> NFS because of the memory leaks that still exist in it.
>
> Thanks.
>
> Regards,
> -- 
> Y.
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users



