[Gluster-users] Node down and volumes unreachable

Vijay Bellur vbellur at redhat.com
Tue Feb 18 06:56:12 UTC 2014


On 02/17/2014 11:19 PM, Marco Zanger wrote:
> Read/write operations hang for a long period of time (too long). I've seen them stay in that waiting state for something like 5 minutes, which makes every application that tries to read or write fail. These are the errors I found in the logs on server A, which is still accessible (B was down):
>
> etc-glusterfs-glusterd.vol.log
>
> ...
> [2014-01-31 07:56:49.780247] W [socket.c:1512:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24007)
> [2014-01-31 07:58:25.965783] E [socket.c:1715:socket_connect_finish] 0-management: connection to <SERVER_B_IP>:24007 failed (No route to host)
> [2014-01-31 08:59:33.923250] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
> [2014-01-31 08:59:33.923289] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
> ...
>
>
> glustershd.log
>
> [2014-01-27 12:07:03.644849] W [socket.c:1512:__socket_proto_state_machine] 0-teoswitch_custom_music-client-1: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24010)
> [2014-01-27 12:07:03.644888] I [client.c:2090:client_rpc_notify] 0-teoswitch_custom_music-client-1: disconnected
> [2014-01-27 12:09:35.553628] E [socket.c:1715:socket_connect_finish] 0-teoswitch_greetings-client-1: connection to <SERVER_B_IP>:24011 failed (Connection timed out)
> [2014-01-27 12:10:13.588148] E [socket.c:1715:socket_connect_finish] 0-license_path-client-1: connection to <SERVER_B_IP>:24013 failed (Connection timed out)
> [2014-01-27 12:10:15.593699] E [socket.c:1715:socket_connect_finish] 0-upload_path-client-1: connection to <SERVER_B_IP>:24009 failed (Connection timed out)
> [2014-01-27 12:10:21.601670] E [socket.c:1715:socket_connect_finish] 0-teoswitch_ivr_greetings-client-1: connection to <SERVER_B_IP>:24012 failed (Connection timed out)
> [2014-01-27 12:10:23.607312] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
> [2014-01-27 12:11:21.866604] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_ivr_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.867874] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.868134] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_custom_music-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869417] E [afr-self-heald.c:418:_crawl_proceed] 0-license_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869659] E [afr-self-heald.c:418:_crawl_proceed] 0-upload_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:12:53.948154] I [client-handshake.c:1636:select_server_supported_programs] 0-teoswitch_greetings-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
> [2014-01-27 12:12:53.952894] I [client-handshake.c:1433:client_setvolume_cbk] 0-teoswitch_greetings-client-1: Connected to <SERVER_B_IP>:24011, attached to remote volume
>
> nfs.log: there are lots of errors, but the one that recurs most is this:
>
> [2014-01-27 12:12:27.136033] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
>
> Any ideas? The logs only confirm that A cannot reach B, which makes sense since B is down. But A is not, and its volume should still be accessible. Right?

Nothing very obvious from these logs.

Can you share the relevant portions of the client log file? The name of
the mount point is usually part of the client log file name.
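
As a rough sketch of where to look: assuming the default log directory
/var/log/glusterfs and the usual convention that the client log is named
after the mount point with the slashes replaced by dashes, the path can be
worked out as below. The mount point /mnt/teoswitch is only a hypothetical
example, and the helper name is made up for illustration; if the volume was
mounted with a custom log file option, the log will be wherever that option
points instead.

```python
import posixpath


def client_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Guess the client log path for a given mount point.

    Assumption: default log directory and default naming, i.e. the mount
    point path with its slashes turned into dashes, plus ".log".
    """
    # Normalize so that "/mnt/foo/" and "/mnt/foo" give the same result.
    normalized = posixpath.normpath(mount_point)
    # Drop the leading slash, then turn the remaining slashes into dashes.
    name = normalized.strip("/").replace("/", "-")
    return posixpath.join(log_dir, name + ".log")


# Hypothetical mount point, not taken from the original report.
print(client_log_path("/mnt/teoswitch"))
```

The entries from around the time the reads and writes hung are the
interesting ones.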

-Vijay



