[Gluster-users] Replication not working on server hang

David Saez Padros david at ols.es
Tue Sep 1 09:38:11 UTC 2009


Hi

>> c) doesn't glusterfs ping the servers periodically to see if they
>> are available or not? If so, why doesn't it detect that situation?
> 
> It does, but in this case the server is up and running and replying
> with pongs. The current ping-pong only checks for network reachability
> to the server process.

I'm not sure that the server is replying to pings in this situation ...

Anyway, I wanted to check how glusterfs behaves when no server is
available, so I set up a replicated volume identical to the one I'm
using, but with every remote-host option pointing to an IP address that
is not in use on our network. I mounted it and ran ls on the mount
point. The client hung the same way (forever); I killed the glusterfs
process 25 minutes later, well past all configurable timeouts.
Although this can be useful in some situations (e.g. when both servers
and clients are rebooting, so clients wait until some server becomes
available), it can also be bad, since applications will never notice
that something is going wrong.
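
Until the client itself gives up, an application could guard against
this by probing the mount point with its own timeout. The following is
only a minimal sketch of that idea, not anything glusterfs provides: the
function name mount_responds, the probe parameter and the mount point
/mnt/data are all made up for illustration, and it assumes Python 3 and
that the blocked ls process can be killed.

```python
import subprocess


def mount_responds(path, timeout_s=10.0, probe=("ls",)):
    """Return True if `probe path` exits successfully within timeout_s seconds.

    Hypothetical helper: the probe runs in a child process, so a hung
    mount blocks the child and not the calling application.
    """
    proc = subprocess.Popen(
        [*probe, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort: ask the kernel to kill the probe, but do not wait
        # for it to be reaped, since it may still be blocked on the mount.
        proc.kill()
        return False
    return proc.returncode == 0


if __name__ == "__main__":
    # /mnt/data is an assumed mount point, not taken from the setup above.
    if not mount_responds("/mnt/data", timeout_s=10.0):
        print("mount point not responding, raising an alert")
```

A child process is used on purpose: a timeout around a direct os.listdir()
call in the same process would not help if that call itself never returns.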

Given volfile:
+------------------------------------------------------------------------------+
volume data1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.1.99
  option remote-subvolume export
  option ping-timeout 5
end-volume

volume data2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.2.99
  option remote-subvolume export
  option ping-timeout 5
end-volume

volume data
  type cluster/replicate
  subvolumes data1 data2
end-volume

+------------------------------------------------------------------------------+
[2009-09-01 11:05:44] N [glusterfsd.c:1152:main] glusterfs: Successfully started
[2009-09-01 11:05:47] E [socket.c:744:socket_connect_finish] data1: connection to  failed (No route to host)
[2009-09-01 11:05:47] E [socket.c:744:socket_connect_finish] data1: connection to  failed (No route to host)
[2009-09-01 11:05:47] E [socket.c:744:socket_connect_finish] data2: connection to  failed (No route to host)
[2009-09-01 11:05:47] E [socket.c:744:socket_connect_finish] data2: connection to  failed (No route to host)
[2009-09-01 11:31:30] W [glusterfsd.c:827:cleanup_and_exit] glusterfs: shutting down

Please also note the "connection to  failed" message, which a) is
duplicated and b) does not say where it tried to connect.

-- 
Best regards ...

----------------------------------------------------------------
    David Saez Padros                http://www.ols.es
    On-Line Services 2000 S.L.       telf    +34 902 50 29 75
----------------------------------------------------------------




