[Gluster-users] RSYNC files renaming issue and timeout errors

Mauro Tridici mauro.tridici@cmcc.it
Mon Feb 4 10:25:46 UTC 2019


Hi All,

our users are experiencing the following two problems with our Gluster storage, which is based on a 12 x (4+2) distributed-dispersed volume:

1) sometimes, while running rsync copies, they notice this error message:

rsync: rename "/tier2/CSP/sp1/ftp/lweprsn/.cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256.rD38kX" -> "cmcc_CMCC-CM2-v20160423_hindcast_S2005040100_atmos_day_surface_lweprsn_n1-n40.sha256": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]

The files seem to be copied correctly, but rsync returns this kind of error and stops the sync process.
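
(As a quick sanity check, the same create-then-rename sequence that rsync uses for its temporary files can be reproduced by hand on the FUSE mount; this is only a sketch, assuming the mount point and directory from the error above and a made-up test file name.)

cd /tier2/CSP/sp1/ftp/lweprsn
touch .rename_test.tmp            # rsync-style hidden temporary file (hypothetical name)
mv .rename_test.tmp rename_test   # the rename step that fails with "Permission denied (13)" in the rsync run
rm -f rename_test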

2) when the total number of rsync sessions against the gluster volume grows, the volume seems to start suffering under that workload: directory tree navigation becomes very slow and a lot of timeout errors related to the 192.168.0.54 gluster server (named “s04”) appear.

I checked the switch, cables, ports and so on, but everything seems to be OK.
I also ran a lot of checks using the iperf tool and the network seems to be fine.
I don't understand why I see timeout errors related only to the s04 host (192.168.0.54); it is up and running…
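
In addition to iperf, a minimal TCP reachability check of the s04 brick ports can be run from the client (only a sketch, assuming bash and the ports 49158/49159 reported in the client log below):

for port in 49158 49159; do
    timeout 5 bash -c "cat < /dev/null > /dev/tcp/192.168.0.54/$port" \
        && echo "port $port reachable" \
        || echo "port $port NOT reachable"
done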

It also seems that the rsync copy processes started by the gluster volume users generate some noise in the gluster client log file during temporary file creation.

[2019-01-27 02:36:53.488942] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Connection timed out); disconnecting socket
[2019-01-27 02:53:48.607832] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-57: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-27 02:53:10.568717 (xid=0x4aeefc8)
[2019-01-27 02:55:01.631912] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Connection timed out); disconnecting socket
[2019-01-27 02:56:05.643880] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Connection timed out); disconnecting socket
[2019-01-27 02:57:09.653961] E [socket.c:2376:socket_connect_finish] 0-tier2-client-54: connection to 192.168.0.54:49158 failed (Connection timed out); disconnecting socket

[2019-01-27 03:50:56.951088] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2019-01-27 03:39:48.938568 (xid=0x49e54c1)
[2019-01-27 03:50:56.951111] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-54: remote operation failed [Transport endpoint is not connected]
[2019-01-27 03:50:56.951471] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-01-27 03:35:31.328387 (xid=0x49e54bf)
[2019-01-27 03:50:56.958357] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-54: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2019-01-27 03:37:49.652707 (xid=0x49e54c0)
[2019-01-27 03:50:56.966845] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hourly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [File exists]
[2019-01-27 03:50:56.967086] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.967224] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.992129] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hourly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [File exists]
[2019-01-27 03:50:56.995954] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hourly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [File exists]
[2019-01-27 03:50:56.996126] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:56.996751] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 03:50:57.001301] E [MSGID: 114031] [client-rpc-fops.c:435:client3_3_open_cbk] 0-tier2-client-57: remote operation failed. Path: /CSP/sp1/SPS3/sps_201205/.rh2m_6hourly_sps_201205_017.2012-06.nc.gz.DHNucn (5d0575fb-21c3-45b0-97bf-46d13f3c9262) [File exists]
[2019-01-27 03:50:57.006375] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
[2019-01-27 04:02:09.269929] E [socket.c:2376:socket_connect_finish] 0-tier2-client-57: connection to 192.168.0.54:49159 failed (Connection timed out); disconnecting socket
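
Since the client gives up on s04 with connection timeouts, I also want to double-check the timeout-related volume options; this is only a sketch, assuming the volume is named tier2 and a gluster release that supports "volume get":

gluster volume get tier2 network.ping-timeout
gluster volume get tier2 all | grep -i timeout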

I focused my attention on the errors that occurred on the gluster client on the 25th of January 2019.
So, I collected the following files:

- tier2.old.log.gz (the client log file covering that period)
- tier2.today.log.gz (the updated client log file, reporting similar errors related to the “renaming” and “timeout” issues)
- memory_usage.pdf (weekly memory usage statistics for host 192.168.0.54, from NAGIOS)
- network_usage.pdf (weekly network usage statistics for host 192.168.0.54, from NAGIOS)
- cpu_usage.pdf (weekly CPU usage statistics for host 192.168.0.54, from NAGIOS)
- gluster-mnt6-brick.log-20190127 (the log file of brick 6, on host 192.168.0.54)
- gluster-mnt7-brick.log-20190127 (the log file of brick 7, on host 192.168.0.54)

I identified the involved bricks by grepping the “gluster volume status” command output:

[root@s04 bricks]# gluster volume status|grep 49158
Brick s02-stg:/gluster/mnt6/brick           49158     0          Y       4147 
Brick s03-stg:/gluster/mnt6/brick           49158     0          Y       4272 
Brick s01-stg:/gluster/mnt7/brick           49158     0          Y       3324 
Brick s04-stg:/gluster/mnt7/brick           49158     0          Y       3787 
Brick s05-stg:/gluster/mnt7/brick           49158     0          Y       3131 
Brick s06-stg:/gluster/mnt7/brick           49158     0          Y       3254 
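
As a cross-check (only a sketch, relying on the fact that the N in the 0-tier2-client-N translator names follows the 0-based brick order of the volume), the bricks behind client-54 and client-57 can also be listed with:

gluster volume info tier2 | grep -E '^Brick[0-9]+:' | sed -n '55p;58p'   # 0-based client-54/57 -> 1-based lines 55/58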

On the gluster server s04 (192.168.0.54), in the /var/log/messages log file, I can see a lot of errors like the following ones and nothing more:

Jan 25 14:16:22 s04 kernel: traps: check_vol_utili[142140] general protection ip:7f8fcec7866d sp:7ffee3a86a30 error:0 in libglusterfs.so.0.0.1[7f8fcec25000+f7000]
Jan 25 14:16:22 s04 abrt-hook-ccpp: Process 142140 (python2.7) of user 0 killed by SIGSEGV - dumping core
Jan 25 14:16:22 s04 abrt-server: Generating core_backtrace
Jan 25 14:16:22 s04 abrt-server: Error: Unable to open './coredump': No such file or directory
Jan 25 14:17:00 s04 abrt-server: Duplicate: UUID
Jan 25 14:17:00 s04 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471
Jan 25 14:17:00 s04 abrt-server: Deleting problem directory ccpp-2019-01-25-14:16:22-142140 (dup of ccpp-2018-09-25-11:18:20-4471)
Jan 25 14:17:00 s04 dbus[1877]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jan 25 14:17:00 s04 dbus[1877]: [system] Successfully activated service 'org.freedesktop.problems'
Jan 25 14:17:00 s04 abrt-server: Generating core_backtrace
Jan 25 14:17:00 s04 abrt-server: Error: Unable to open './coredump': No such file or directory
Jan 25 14:17:00 s04 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471' via uReport: Event 'report_uReport' exited with 1
Jan 25 14:17:01 s04 systemd: Created slice User Slice of root.
Jan 25 14:17:01 s04 systemd: Starting User Slice of root.
Jan 25 14:17:01 s04 systemd: Started Session 140059 of user root.
Jan 25 14:17:01 s04 systemd: Starting Session 140059 of user root.
Jan 25 14:17:01 s04 systemd: Removed slice User Slice of root.
Jan 25 14:17:01 s04 systemd: Stopping User Slice of root.
Jan 25 14:18:01 s04 systemd: Created slice User Slice of root.
Jan 25 14:18:01 s04 systemd: Starting User Slice of root.
Jan 25 14:18:01 s04 systemd: Started Session 140060 of user root.
Jan 25 14:18:01 s04 systemd: Starting Session 140060 of user root.
Jan 25 14:18:01 s04 systemd: Removed slice User Slice of root.
Jan 25 14:18:01 s04 systemd: Stopping User Slice of root.
Jan 25 14:18:35 s04 kernel: traps: check_vol_utili[142501] general protection ip:7f119fd7066d sp:7ffc399aa6f0 error:0 in libglusterfs.so.0.0.1[7f119fd1d000+f7000]
Jan 25 14:18:35 s04 abrt-hook-ccpp: Process 142501 (python2.7) of user 0 killed by SIGSEGV - dumping core
Jan 25 14:18:35 s04 abrt-server: Generating core_backtrace
Jan 25 14:18:35 s04 abrt-server: Error: Unable to open './coredump': No such file or directory
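
For completeness, the recorded check_vol_utili crash can be inspected through the standard abrt tooling (just a sketch, assuming abrt-cli is installed on this CentOS/RHEL node):

abrt-cli list
abrt-cli info /var/tmp/abrt/ccpp-2018-09-25-11:18:20-4471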

There are no related errors in the /var/log/messages log file on the client machine.

Could anyone help me solve this issue?
Thank you very much in advance,
Mauro

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tier2.old.log.gz
Type: application/x-gzip
Size: 636711 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0004.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tier2.today.log.gz
Type: application/x-gzip
Size: 325016 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0005.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cpu_usage.pdf
Type: application/pdf
Size: 1011067 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0003.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memory_usage.pdf
Type: application/pdf
Size: 65668 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: network_usage.pdf
Type: application/pdf
Size: 67425 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0005.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster-mnt6-brick.log-20190127.gz
Type: application/x-gzip
Size: 497687 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0006.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster-mnt7-brick.log-20190127.gz
Type: application/x-gzip
Size: 526789 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190204/056cf268/attachment-0007.gz>

