<div dir="ltr">We use Mellanox InfiniBand cards to build an IB cluster with several storage nodes and more than 20 clients. The GlusterFS version is 3.11. The storage nodes run CentOS 6.5 and the clients run CentOS 7.3. Previously we used IP over IB and everything worked fine. After switching to RDMA we get higher bandwidth, but we frequently see brick disconnect messages in the client logs, while the brick logs show nothing abnormal at the same moments. Although all bricks eventually reconnect, the disconnects cause serious problems. For example, a simple "ls" or "df" command can take several minutes to complete.<div><br></div><div>Here is an example of a brick disconnect as logged on one client:</div><div><div>[2017-12-21 10:45:47.476597] C [rpc-clnt-ping.c:186:rpc_clnt_ping_timer_expired] 0-data-client-129: server 10.0.0.35:49204 has not responded in the last 60 seconds, disconnecting.(trans1:0,trans2:0)</div><div>[2017-12-21 10:45:47.478820] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-data-client-129: disconnected from data-client-129. 
Client process will keep trying to connect to glusterd until brick's port is available</div><div>[2017-12-21 10:45:47.479267] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f565546230b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f56552279fe] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5655227b0e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f5655229280] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f5655229d30] ))))) 0-data-client-129: forced unwinding frame type(GlusterFS 3.3) op(ENTRYLK(31)) called at 2017-12-23 10:43:52.887616 (xid=0x9da3f5)</div><div>[2017-12-21 10:45:47.479317] E [MSGID: 114031] [client-rpc-fops.c:1646:client3_3_entrylk_cbk] 0-data-client-129: remote operation failed [Transport endpoint is not connected]</div><div>[2017-12-21 10:45:47.479352] E [MSGID: 108007] [afr-lk-common.c:825:afr_unlock_entrylk_cbk] 0-data-replicate-64: /data/a3581.data: unlock failed on data-client-129 [Transport endpoint is not connected]</div><div>[2017-12-21 10:45:47.479718] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f565546230b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f56552279fe] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5655227b0e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f5655229280] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f5655229d30] ))))) 0-data-client-129: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2017-12-23 10:43:53.249305 (xid=0x9da3f6)</div><div>[2017-12-21 10:45:47.479771] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data-client-129: remote operation failed. 
Path: /data/b07869.data (fe89d36e-16b8-4b06-bd36-69023217db9f) [Transport endpoint is not connected]</div><div>[2017-12-21 10:45:47.480644] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f565546230b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f56552279fe] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f5655227b0e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f5655229280] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f5655229d30] ))))) 0-data-client-129: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2017-12-23 10:44:47.468222 (xid=0x9da3f7)</div><div>[2017-12-21 10:45:47.480682] W [rpc-clnt-ping.c:243:rpc_clnt_ping_cbk] 0-data-client-129: socket disconnected</div><div>[2017-12-21 10:45:47.481046] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data-client-129: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]</div><div>[2017-12-21 10:45:58.497609] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-data-client-129: changing port to 49204 (from 0)</div><div>[2017-12-21 10:45:58.512289] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-data-client-129: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div><div>[2017-12-21 10:45:58.517383] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-data-client-129: Connected to data-client-129, attached to remote volume '/disks/xnuyUF3N/brick'..<br></div></div><div><br></div><div><br></div><div><div>We found that the hardware counter rq_num_rnr of the IB card is very large on some clients:</div><div># cat /sys/class/infiniband/mlx4_0/ports/1/hw_counters/rq_num_rnr</div><div>943004905</div><div><br></div><div>The corresponding value on a storage node is also large:</div><div># cat /sys/class/infiniband/mlx4_0/ports/1/hw_counters/rq_num_rnr</div><div>23193068</div></div><div><br></div><div>If we use IP over IB, 
the counter stays at 0. On some clients that do not show the brick disconnects, rq_num_rnr is also zero.</div><div><br></div><div>We suspect an RDMA flow-control problem: one side sends data faster than the other side can receive it, the receiver runs out of posted receive buffers, and rq_num_rnr (RNR = Receiver Not Ready) increases.</div><div><br></div><div>Does the GlusterFS RDMA transport currently implement flow control?</div><div><br></div><div>Also, can we tune the following macros defined in rdma.h to avoid this problem?</div><div><pre class="gmail-code gmail-highlight" style="box-sizing:border-box;overflow-x:auto;overflow-y:hidden;font-family:Menlo,&quot;Liberation Mono&quot;,Consolas,&quot;DejaVu Sans Mono&quot;,&quot;Ubuntu Mono&quot;,&quot;Courier New&quot;,&quot;andale mono&quot;,&quot;lucida console&quot;,monospace;padding:10px;margin-top:0px;margin-bottom:0px;word-break:break-all;word-wrap:normal;color:rgb(51,51,51);border-top:none;border-right:none;border-bottom:none;border-left:1px solid rgb(187,187,187);border-radius:0px;font-size:13px;line-height:1.5"><code style="box-sizing:border-box;font-family:Menlo,&quot;Liberation Mono&quot;,Consolas,&quot;DejaVu Sans Mono&quot;,&quot;Ubuntu Mono&quot;,&quot;Courier New&quot;,&quot;andale mono&quot;,&quot;lucida console&quot;,monospace;font-size:inherit;padding:0px;color:inherit;background-color:transparent;border-radius:0px;word-wrap:normal"><span id="gmail-LC41" class="gmail-line" style="box-sizing:border-box;display:inline"><span class="gmail-cm" style="box-sizing:border-box;color:rgb(153,153,136);font-style:italic">/* Additional attributes */</span></span>
<span id="gmail-LC42" class="gmail-line" style="box-sizing:border-box;display:inline"><span class="gmail-cp" style="box-sizing:border-box;color:rgb(153,153,153);font-weight:bold">#define GF_RDMA_TIMEOUT 14</span>
<span id="gmail-LC43" class="gmail-line" style="box-sizing:border-box;display:inline">#define GF_RDMA_RETRY_CNT 7</span>
<span id="gmail-LC44" class="gmail-line" style="box-sizing:border-box;display:inline">#define GF_RDMA_RNR_RETRY 7</span></span></code></pre></div><div><br></div><div><br></div></div>