[Gluster-users] Mount Fails when 1 of 2 Replicas is Down (GlusterFS 3.7.2)

Thu Jul 2 06:20:06 UTC 2015

10.1.0.100 is the IP of the replica server that is down. However, this log
is from the replica server that is up. There are only 2 servers, and they are
both replicas for the volume. The errors show up when attempting to mount the
volume from a client. It seems the server that's up is trying to contact
the server that's down, and things are failing as a result?

I also noticed the following continuous errors in the glusterd log when the
other node is down. Is this normal?

[2015-07-02 06:16:18.028223] W
[glusterd-locks.c:653:glusterd_mgmt_v3_unlock] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f1d94a9bd59] (-->
/usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x47a)[0x7f1d8fa30efa]
(-->
/usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a2)[0x7f1d8f9abda2]
(-->
/usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f1d8f9a3700]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a8)[0x7f1d9486c458] )))))
0-management: Lock for vol test not held

On Wed, Jul 1, 2015 at 5:03 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> On Tuesday 30 June 2015 10:56 PM, Gabriel Kuri wrote:
>
>> I am able to reproduce a problem, which I think may be a bug, where if 1
>> of the 2 replica servers for a volume is down, clients are unable to
>> mount the volume. I notice that if the replica that is down is on the
>> same subnet as the client, the client fails to mount the volume, but if
>> the replica that is down is on a different subnet, the client fails over
>> properly and mounts the volume.
>>
>> Here are the errors from the server that is still up when the client is
>> unable to mount the volume when the replica on the same subnet as the
>> client is down. Ideas? Should I open a bug?
>>
>> [2015-07-01 05:43:08.428657] W [socket.c:923:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 21, Invalid
>> argument
>> [2015-07-01 05:43:08.428710] E [socket.c:3015:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2015-07-01 05:43:08.429260] E [socket.c:3071:socket_connect]
>> 0-management: connection attempt on 10.1.0.100:24007 failed,
>> (Connection refused)
>>
>
>
> This points to the client not being able to talk to glusterd on
> 10.1.0.100. Is glusterd running on this node and if yes, can port 24007 be
> reached from the client machine?
>
> Regards,
> Vijay
>
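
For reference, the reachability question above (can port 24007 be reached
from the client machine?) can be answered with a plain TCP connect test.
The following is only a rough sketch: the function name check_port and the
3-second timeout are illustrative choices, not part of GlusterFS, and the
only facts taken from this thread are the address 10.1.0.100 and the
glusterd port 24007.

```python
import socket


def check_port(host, port=24007, timeout=3.0):
    """Try a TCP connection to host:port.

    Returns (True, None) if the connection succeeds, otherwise
    (False, exc) where exc is the OSError that was raised
    (for example ConnectionRefusedError or a timeout).
    check_port is a hypothetical helper, not a GlusterFS API.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, exc
```

Running check_port("10.1.0.100") from the client while that node is down
would be expected to return False together with the underlying error. A
"Connection refused" there would match the socket_connect error in the log
above, whereas a timeout would point more toward routing or a firewall.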