[Gluster-users] Help, peer probe seems to get stuck on large cluster.

Atin Mukherjee amukherj at redhat.com
Mon Aug 31 08:54:11 UTC 2015



On 08/31/2015 01:10 PM, Yiping Peng wrote:
> Hi guys,
> 
> 
> I've been running GlusterFS for a couple of days and it's been nice and
> steady, except for one minor problem: peer probing on my relatively large
> cluster seems to have been stuck for a long time.
> 
> 
> Last time, atinm told me on IRC (I was barius.2333 on IRC) that peer
> probing on a cluster as large as 50+ nodes might take a long time
> (O(n^2)), and my cluster has now grown to 90+ nodes.
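For a rough sense of scale, here is a small sketch that assumes the probing
cost grows with the number of distinct peer pairs, since every glusterd
talks to every other peer. The helper name peer_pairs is hypothetical and
the pair count is only a proxy, not a measurement of glusterd itself.

```python
def peer_pairs(n: int) -> int:
    """Distinct node pairs in a full mesh of n peers (assumed cost proxy)."""
    return n * (n - 1) // 2


# Compare the cluster size when probing started with its current size.
for nodes in (50, 90):
    print(nodes, peer_pairs(nodes))
```

Under that assumption, going from 50 to 90 nodes raises the pair count from
1225 to 4005, i.e. a little more than three times as much work.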
> 
> 
> The peer probing process was started 4 days ago, when my cluster had ~50
> nodes. I probed ~40 nodes at once using subprocesses in bash, and all of
> the commands returned successfully almost immediately (no time-outs).
> 
> 
> However, glusterd has kept writing to /var/lib/glusterd/peers/ for the
> last 4 days, and all commands involving the newly added nodes (e.g.
> add-brick, mount) time out and fail. Also, running “gluster peer status”
> on my nodes shows “Disconnected” nodes, and which nodes they are varies
> over time.
Peer status should not show any node as disconnected even if the peer
handshake takes a long time; if it does, something is wrong. Could you
check which node is disconnected and what the glusterd log file on that
node shows?
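To find the disconnected nodes quickly, a minimal sketch along these lines
could filter the output of `gluster peer status`. It assumes the
"Hostname: / Uuid: / State: ... (Disconnected)" block layout that the CLI
prints; the function name disconnected_peers is hypothetical, and the
layout may differ between versions.

```python
import subprocess


def disconnected_peers(status_text: str) -> list[str]:
    """Return hostnames whose 'State:' line reports '(Disconnected)'.

    Assumes each peer block starts with 'Hostname:' and contains a
    'State:' line; other lines (Uuid, Other names) are ignored.
    """
    peers = []
    hostname = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            if "(Disconnected)" in line:
                peers.append(hostname)
            hostname = None  # this peer's block is finished
    return peers


def main() -> None:
    # Runs the real CLI on the local node; glusterd must be running here.
    result = subprocess.run(
        ["gluster", "peer", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    for host in disconnected_peers(result.stdout):
        print(host)


if __name__ == "__main__":
    main()
```

Running it on a few nodes and comparing the lists should show whether the
same peers keep dropping out; those are the nodes whose glusterd log is
worth reading first.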
> 
> 
> What should I do in this situation? Do I need to wait for the whole peer
> probing process to complete, or can I simply kill glusterd and restart
> it?
> 
> 
> Regards,
> 
> Yiping Peng
> 

