[Bugs] [Bug 1370683] New: "rpc_clnt_ping_timer_expired" when copying big file ( reproducible)

bugzilla at redhat.com bugzilla at redhat.com
Sat Aug 27 01:04:00 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1370683

            Bug ID: 1370683
           Summary: "rpc_clnt_ping_timer_expired" when copying big file
                    (reproducible)
           Product: GlusterFS
           Version: 3.7.11
         Component: fuse
          Assignee: bugs at gluster.org
          Reporter: kripper at imatronix.com
                CC: bugs at gluster.org



Description of problem:

I'm copying a 3.9 G file to a gluster mount (served on a remote host) and the
transfer is *always* interrupted after transfering 958 MB with a
"rpc_clnt_ping_timer_expired" error.

I thought it was a network problem, but after retrying 5 times with same
result, it seems to be a bug in GlusterFS.

Network admins confirmed they are not restricting the transfer. Besides,
transferring other big files doesn't trigger this problem.

Version-Release number of selected component (if applicable):

glusterfs-3.7.11-1.el7.x86_64
glusterfs-fuse-3.7.11-1.el7.x86_64
glusterfs-libs-3.7.11-1.el7.x86_64
glusterfs-client-xlators-3.7.11-1.el7.x86_64
glusterfs-cli-3.7.11-1.el7.x86_64
glusterfs-api-3.7.11-1.el7.x86_64
glusterfs-server-3.7.11-1.el7.x86_64

Linux h8.imatronix.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

CentOS Linux release 7.2.1511 (Core)

How reproducible:

cp -a big.qcow2 /mnt/backups/destfile'

Actual results:

Transfer is interrupted after aprox. 958 MB:

Test 1: transferred 980864 (x 1024 bytes)
Test 2: transferred 980992 (x 1024 bytes)
Other tests are very similar

Expected results:

Complete 'cp' process.

Additional info:

Using rsync instead of cp works, but dmesg reports:
kernel: INFO: task rsync:12452 blocked for more than 120 seconds.

I have been copying big files for half a year without this issue.
This is a production environment, so packages have not been updated all this
time.

I'm reporting because this seems to be the kind of bug that turns chronic
(stays present for many releases) since it looks more like a network problem
and because of the hassle to reproduce...but I'm able to reproduce and will
keep the offending file for some time in case it is needed.

Logs:

[2016-08-27 00:42:01.722293] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-backups-h9-client-0: server
<host-9>:49152 has not responded in the last 42 seconds, disconnecting.
[2016-08-27 00:42:01.724838] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.354197 (xid=0x5dfbd8)
[2016-08-27 00:42:01.724894] W [MSGID: 114031]
[client-rpc-fops.c:907:client3_3_writev_cbk] 0-backups-h9-client-0: remote
operation failed [Transport endpoint is not connected]
[2016-08-27 00:42:01.725230] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.453085 (xid=0x5dfbd9)
[2016-08-27 00:42:01.725524] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.454572 (xid=0x5dfbda)
[2016-08-27 00:42:01.725803] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.491924 (xid=0x5dfbdb)
[2016-08-27 00:42:01.726074] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.615236 (xid=0x5dfbdc)
[2016-08-27 00:42:01.726345] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.616659 (xid=0x5dfbdd)
[2016-08-27 00:42:01.726625] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.722719 (xid=0x5dfbde)
[2016-08-27 00:42:01.726897] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.724086 (xid=0x5dfbdf)
[2016-08-27 00:42:01.727159] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13))
called at 2016-08-27 00:41:13.761585 (xid=0x5dfbe0)
[2016-08-27 00:42:01.737321] I [socket.c:3304:socket_submit_request]
0-backups-h9-client-0: not connected (priv->connected = 0)
[2016-08-27 00:42:01.737364] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe2 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.737675] W [fuse-bridge.c:2302:fuse_writev_cbk]
0-glusterfs-fuse: 9885708: WRITE => -1 (Transport endpoint is not connected)
[2016-08-27 00:42:01.737758] W [MSGID: 114029]
[client-rpc-fops.c:4415:client3_3_writev] 0-backups-h9-client-0: failed to send
the fop
[2016-08-27 00:42:01.738026] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f63bfca0ae2] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f63bfa6b90e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f63bfa6ba1e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f63bfa6d40a] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f63bfa6dc38] )))))
0-backups-h9-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called
at 2016-08-27 00:41:19.712717 (xid=0x5dfbe1)
[2016-08-27 00:42:01.738056] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk]
0-backups-h9-client-0: socket disconnected
[2016-08-27 00:42:01.738082] I [MSGID: 114018]
[client.c:2030:client_rpc_notify] 0-backups-h9-client-0: disconnected from
backups-h9-client-0. Client process will keep trying to connect to glusterd
until brick's port is available
[2016-08-27 00:42:01.738331] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe3 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.738420] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe4 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.738492] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe5 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.738568] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe6 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.738980] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe7 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.739572] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe8 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.739639] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbe9 Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.739680] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbea Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.739718] W [rpc-clnt.c:1606:rpc_clnt_submit]
0-backups-h9-client-0: failed to submit rpc-request (XID: 0x5dfbeb Program:
GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (backups-h9-client-0)
[2016-08-27 00:42:01.739760] W [fuse-bridge.c:1287:fuse_err_cbk]
0-glusterfs-fuse: 9885709: FLUSH() ERR => -1 (Transport endpoint is not
connected)
[2016-08-27 00:42:01.747159] I [rpc-clnt.c:1868:rpc_clnt_reconfig]
0-backups-h9-client-0: changing port to 49152 (from 0)
[2016-08-27 00:42:01.761133] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-backups-h9-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2016-08-27 00:42:01.763595] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-backups-h9-client-0: Connected
to backups-h9-client-0, attached to remote volume
'/home/datacenter/gluster-bricks/backups-h9'.
[2016-08-27 00:42:01.763639] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-backups-h9-client-0: Server
and Client lk-version numbers are not same, reopening the fds
[2016-08-27 00:42:01.763663] I [MSGID: 114042]
[client-handshake.c:1056:client_post_handshake] 0-backups-h9-client-0: 3 fds
open - Delaying child_up until they are re-opened
[2016-08-27 00:42:01.771345] I [MSGID: 114041]
[client-handshake.c:678:client_child_up_reopen_done] 0-backups-h9-client-0:
last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2016-08-27 00:42:01.773563] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-backups-h9-client-0:
Server lk version = 1
The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk]
0-backups-h9-client-0: remote operation failed [Transport endpoint is not
connected]" repeated 18 times between [2016-08-27 00:42:01.724894] and
[2016-08-27 00:42:01.739734]
The message "W [MSGID: 114029] [client-rpc-fops.c:4415:client3_3_writev]
0-backups-h9-client-0: failed to send the fop" repeated 9 times between
[2016-08-27 00:42:01.737758] and [2016-08-27 00:42:01.739809]

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list