[Gluster-users] Help, peer probe seems to get stuck on large cluster.

Yiping Peng barius.cn at gmail.com
Mon Aug 31 07:40:07 UTC 2015


Hi guys,


I've been running GlusterFS for a couple of days and it's been nice and
steady, except for one minor problem: peer probing on my relatively large
cluster seems to have been stuck for a long time.


Last time, atinm told me on IRC (I was barius.2333 there) that peer probing
on a cluster as large as 50+ nodes might take a long time (O(n^2) time), and
my cluster has now expanded to 90+ nodes.
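
To put the quadratic growth in numbers, here is a back-of-the-envelope
sketch. It assumes that "O(n^2)" refers to the number of distinct peer pairs,
n*(n-1)/2; that reading is an assumption, not something atinm stated. The
function name peer_pairs is made up for illustration.

```shell
# Number of distinct peer pairs in an n-node cluster: n * (n - 1) / 2.
# Assumption: the quadratic cost mentioned above tracks pairwise relationships.
peer_pairs() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

peer_pairs 50   # prints 1225
peer_pairs 90   # prints 4005
```

Under that assumption, going from 50 to 90 nodes roughly triples the number
of pairs.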


The peer probing process was started 4 days ago, when my cluster had ~50
nodes. I probed ~40 nodes at once using background subprocesses in bash, and
the commands all returned successfully almost immediately (no timeouts).
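
The parallel probing described above can be sketched roughly as follows. This
is a reconstruction for illustration, not the exact commands that were run:
the function name probe_in_parallel and the hosts file (one hostname per
line) are hypothetical. Only `gluster peer probe <host>` is the real CLI
command.

```shell
# Sketch: launch one background `gluster peer probe` per host, then wait.
# Assumption: the file passed as $1 (hypothetical) lists one hostname per line.
probe_in_parallel() {
  hosts_file=$1
  while IFS= read -r host; do
    [ -n "$host" ] || continue                  # skip blank lines
    gluster peer probe "$host" < /dev/null &    # run each probe in the background
  done < "$hosts_file"
  wait                                          # block until every probe has returned
}

# Usage (hypothetical file name):
# probe_in_parallel hosts.txt
```

Note that the probe commands returning only means the requests were accepted;
it says nothing about whether glusterd has finished processing them.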


However, glusterd has kept writing to /var/lib/glusterd/peers/ for the
last 4 days, and all commands related to newly added nodes, e.g. add-brick
and mount, time out and fail. Also, running “gluster peer status” on my
nodes shows a set of “Disconnected” nodes that varies over time.
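
The two symptoms above can be tracked with something like the sketch below.
It assumes that peer files sit directly under /var/lib/glusterd/peers/ and
that `gluster peer status` reports unreachable peers with the word
"Disconnected" in the State line. The function names are made up for
illustration.

```shell
# Sketch: summarize peer state on one node.
# Assumption: one entry per known peer under the peers directory.
count_peer_files() {
  ls "${1:-/var/lib/glusterd/peers}" | wc -l
}

# Assumption: unreachable peers show "Disconnected" in `gluster peer status`.
count_disconnected() {
  gluster peer status | grep -c 'Disconnected'
}

# Usage: run both periodically and compare the numbers over time.
# count_peer_files
# count_disconnected
```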


What should I do in this situation? Do I need to wait for the whole peer
probing process to complete, or can I simply kill glusterd and restart
it?


Regards,

Yiping Peng