[Gluster-users] NFS Locking (Very complicated)

Service Mail ciclopeblu at gmail.com
Thu Jan 19 21:23:41 UTC 2017


Hello,

I have a cluster with 2 nodes that I want to access using NFS, there is no
firewall between the server.
I'm using gluster 3.9.0

The volumes mount correctly on a client machine,  even if I get a strange
"portmap query retrying: RPC: Program not registered"

[root at nf1 ~]# mount -vvv -t nfs -o nfsvers=3,rw
gluster-nfs-a:/replicated-data /export/replicated-data
mount: fstab path: "/etc/fstab"
.........
mount: external mount: argv[5] = "rw,nfsvers=3"
mount.nfs: timeout set for Thu Jan 19 21:51:05 2017
mount.nfs: trying text-based options 'nfsvers=3,addr=10.1.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.4.0.28 prog 100003 vers 3 prot TCP port 2049
.......
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: portmap query retrying: RPC: Program not registered
.......
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.4.0.28 prog 100005 vers 3 prot TCP port 38465
gluster-nfs-a:/replicated-data on /export/replicated-data type nfs
(rw,nfsvers=3)




When I try to lock a file on the client with something like:

flock -x -w 10 100 file (...)

the lock doesn't work and I get those line on the server nfs.log:

[2017-01-19 20:03:03.621556] I [rpc-clnt.c:1946:rpc_clnt_reconfig]
0-replicated-data-client-2: changing port to 49159 (from 0)
[2017-01-19 20:03:03.626031] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs]
0-replicated-data-client-0: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2017-01-19 20:03:03.626661] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-0:
Connected to replicated-data-client-0, attached to remote volume
'/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:03:03.626693] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:03:03.626803] I [MSGID: 108005]
[afr-common.c:4429:afr_notify] 0-replicated-data-replicate-0: Subvolume
'replicated-data-client-0' came back up; going online.
[2017-01-19 20:03:03.626998] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk]
0-replicated-data-client-0: Server lk version = 1
[2017-01-19 20:03:03.630498] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs]
0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2017-01-19 20:03:03.631152] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2:
Connected to replicated-data-client-2, attached to remote volume
'/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:03:03.631180] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2:
Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:03:03.631651] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk]
0-replicated-data-client-2: Server lk version = 1
[2017-01-19 20:03:03.632563] I [MSGID: 108031]
[afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0:
selecting local read_child replicated-data-client-0
[2017-01-19 20:03:29.463006] W [MSGID: 112165]
[nlm4.c:1484:nlm4svc_lock_common] 0-nfs-NLM: NLM in grace period
[2017-01-19 20:03:32.742975] W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock]
0-nfs-NLM: NLM in grace period
The message "W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock] 0-nfs-NLM: NLM
in grace period" repeated 4 times between [2017-01-19 20:03:32.742975] and
[2017-01-19 20:03:52.742771]
[2017-01-19 20:03:57.743581] W [MSGID: 112057]
[nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[No locks available]
[2017-01-19 20:03:57.743652] W [MSGID: 112122]
[nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[Operation not permitted]
[2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor]
0-nfs-NLM: clnt_call(): RPC: Success
[2017-01-19 20:06:10.566790] C
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the
last 42 seconds, disconnecting.
[2017-01-19 20:06:13.380796] I [socket.c:3396:socket_submit_request]
0-replicated-data-client-2: not connected (priv->connected = -1)
[2017-01-19 20:06:13.380830] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x29
Program: GlusterFS 3.3, ProgVers: 330, Proc: 26) to rpc-transport
(replicated-data-client-2)
[2017-01-19 20:06:13.380846] W [MSGID: 114031]
[client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2:
remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:06:45.739806] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2a
Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport
(replicated-data-client-2)
[2017-01-19 20:07:45.651815] W [MSGID: 114031]
[client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2:
remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:07:47.535963] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2d
Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport
(replicated-data-client-2)
[2017-01-19 20:08:45.764782] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2e
Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport
(replicated-data-client-2)
[2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f
Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport
(replicated-data-client-2)
The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk]
0-replicated-data-client-2: remote operation failed [Transport endpoint is
not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and
[2017-01-19 20:08:47.553057]
[2017-01-19 20:09:25.952509] E [MSGID: 112167]
[nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of
the client. Is the firewall running on client? OR Are RPC services running
(rpcinfo -p)?
[2017-01-19 20:09:25.953149] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3)
op(LK(26)) called at 2017-01-19 20:05:13.206864 (xid=0x23)
[2017-01-19 20:09:25.953181] W [MSGID: 114031]
[client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2:
remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:09:25.953845] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3)
op(LK(26)) called at 2017-01-19 20:05:23.235063 (xid=0x24)
[2017-01-19 20:09:25.954461] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3)
op(LK(26)) called at 2017-01-19 20:05:43.299167 (xid=0x26)
[2017-01-19 20:09:25.955045] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GF-DUMP)
op(NULL(2)) called at 2017-01-19 20:05:28.562661 (xid=0x25)
[2017-01-19 20:09:25.955077] W [rpc-clnt-ping.c:202:rpc_clnt_ping_cbk]
0-replicated-data-client-2: socket disconnected
[2017-01-19 20:09:25.955567] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3)
op(STATFS(14)) called at 2017-01-19 20:05:45.774191 (xid=0x27)
[2017-01-19 20:09:25.955612] W [MSGID: 114031]
[client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2:
remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:09:25.956246] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] )))))
0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3)
op(STATFS(14)) called at 2017-01-19 20:05:47.549472 (xid=0x28)
[2017-01-19 20:09:25.956373] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-replicated-data-client-2: disconnected
from replicated-data-client-2. Client process will keep trying to connect
to glusterd until brick's port is available
[2017-01-19 20:09:36.592738] I [rpc-clnt.c:1946:rpc_clnt_reconfig]
0-replicated-data-client-2: changing port to 49159 (from 0)
[2017-01-19 20:09:36.597437] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs]
0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2017-01-19 20:09:36.598036] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2:
Connected to replicated-data-client-2, attached to remote volume
'/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:09:36.598059] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2:
Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:09:36.598072] I [MSGID: 114042]
[client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2:
1 fds open - Delaying child_up until they are re-opened
[2017-01-19 20:09:36.598343] I [MSGID: 114041]
[client-handshake.c:675:client_child_up_reopen_done]
0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying
CHILD-UP
[2017-01-19 20:09:36.598036] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2:
Connected to replicated-data-client-2, attached to remote volume
'/gluster-nfs-zpool/replicated-data'.
[2017-01-19 20:09:36.598059] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2:
Server and Client lk-version numbers are not same, reopening the fds
[2017-01-19 20:09:36.598072] I [MSGID: 114042]
[client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2:
1 fds open - Delaying child_up until they are re-opened
[2017-01-19 20:09:36.598343] I [MSGID: 114041]
[client-handshake.c:675:client_child_up_reopen_done]
0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying
CHILD-UP
[2017-01-19 20:09:36.598506] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk]
0-replicated-data-client-2: Server lk version = 1
[2017-01-19 20:10:47.538183] I [MSGID: 108031]
[afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0:
selecting local read_child replicated-data-client-0
The message "W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk]
0-replicated-data-client-2: remote operation failed [Transport endpoint is
not connected]" repeated 2 times between [2017-01-19 20:09:25.953181] and
[2017-01-19 20:09:25.954495]
[2017-01-19 20:09:25.956278] W [MSGID: 114031]
[client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2:
remote operation failed [Transport endpoint is not connected]
[2017-01-19 20:14:18.649616] W [socket.c:588:__socket_rwv] 0-NLM-client:
readv on 10.4.0.13:32803 failed (No data available)
[2017-01-19 20:26:18.648786] I [MSGID: 101053]
[mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=588 max=1 total=3
[2017-01-19 20:26:18.648831] I [MSGID: 101053]
[mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=3


As I said there is no firewall and all the ports listed here are open and
reachable from the client on both replica servers:


   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38466  mountd
    100003    3   tcp   2049  nfs
    100024    1   udp  53548  status
    100024    1   tcp  35281  status
    100021    4   tcp  38468  nlockmgr
    100021    1   udp    914  nlockmgr
    100227    3   tcp   2049  nfs_acl
    100021    1   tcp    917  nlockmgr


Also I don't understand this:

[2017-01-19 20:03:57.743581] W [MSGID: 112057]
[nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[No locks available]
[2017-01-19 20:03:57.743652] W [MSGID: 112122]
[nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[Operation not permitted]
[2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor]
0-nfs-NLM: clnt_call(): RPC: Success
[2017-01-19 20:06:10.566790] C
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the
last 42 seconds, disconnecting.


Why the server I'm connected is trying to connected to the replica:
0-replicated-data-client-2: server 10.4.0.33:49159 ?


And this:

[2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit]
0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f
Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport
(replicated-data-client-2)
The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk]
0-replicated-data-client-2: remote operation failed [Transport endpoint is
not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and
[2017-01-19 20:08:47.553057]
[2017-01-19 20:09:25.952509] E [MSGID: 112167]
[nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of
the client. Is the firewall running on client? OR Are RPC services running
(rpcinfo -p)?

Is here talking about the client trying to lock a file in the volume,  or
actually the other replica server on 10.1.0.33 ?

Any idea?

Thanks,

C
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170119/f1849a91/attachment.html>


More information about the Gluster-users mailing list