[Gluster-devel] Slow volume, gluster volume status bug
amukherj at redhat.com
Tue Nov 14 13:08:44 UTC 2017
On Tue, Nov 14, 2017 at 2:47 PM, Emmanuel Dreyfus <manu at netbsd.org> wrote:
> On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:
> > > gluster volume status also exhibits trouble: each server lists only
> > > its own bricks, not the other's. I suspect it could just be a
> > > timeout caused by a slow answer from the peer.
> > Have you checked the output of gluster peer status? Also, does the
> > glusterd log file give any hint of timeouts, RPC failures,
> > disconnections, etc.?
> gluster peer status says "State: Sent and Received peer request"
> on both sides.
So this is why the peers don't recognize that they are connected: the
friend handshake got stuck midway and never recovered.
Restarting the glusterd services should ideally fix the state; if not,
you will have to manually edit the /var/lib/glusterd/peers/UUID files to
set state=3 and then restart the glusterd service.
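
The manual edit could be scripted roughly as follows. This is only a sketch under assumptions: it assumes each peer file is plain key=value text containing exactly one state= line, the function names and the backup directory are made up for illustration, and the backup is deliberately written outside the peers directory so that glusterd does not find stray files there.

```python
import re
import shutil
from pathlib import Path

# Path taken from the advice above; everything else here is illustrative.
PEERS_DIR = Path("/var/lib/glusterd/peers")


def set_peer_state(text, new_state=3):
    """Return peer-file text with its single state= line set to new_state.

    Assumes a key=value file with exactly one "state=<number>" line.
    """
    updated, count = re.subn(
        r"(?m)^state=\d+[ \t]*$", "state={}".format(new_state), text
    )
    if count != 1:
        raise ValueError("expected exactly one state= line, found %d" % count)
    return updated


def fix_peer_files(peers_dir, backup_dir, new_state=3):
    """Rewrite state= in every file under peers_dir.

    Each file that needs a change is first copied to backup_dir.
    Returns the names of the files that were modified.
    """
    peers_dir = Path(peers_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for path in sorted(peers_dir.iterdir()):
        if not path.is_file():
            continue
        original = path.read_text()
        updated = set_peer_state(original, new_state)
        if updated != original:
            shutil.copy2(path, backup_dir / path.name)  # keep the old copy
            path.write_text(updated)
            changed.append(path.name)
    return changed
```

It would be called on each server as, for example, fix_peer_files(PEERS_DIR, "/root/glusterd-peers-backup"), run as root, followed by the glusterd restart mentioned above.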
> I have this in glusterd.log:
> [2017-11-14 08:49:47.289423] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind]
> 0-pmap: adding brick /export/wd3e on port 49155
> [2017-11-14 08:49:52.289926] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind]
> 0-pmap: adding brick /export/wd0e on port 49152
> [2017-11-14 08:49:52.295394] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind]
> 0-pmap: adding brick /export/wd1e on port 49153
> [2017-11-14 08:49:52.302973] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind]
> 0-pmap: adding brick /export/wd2e on port 49154
> [2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management:
> readv on 192.0.2.109:24007 failed (Connection reset by peer)
> [2017-11-14 08:54:32.567745] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify]
> 0-management: Peer <bidon> (<2d7719d9-0466-434c-a881-4081156fac47>), in
> state <Probe Sent to Peer>, has disconnected from glusterd.
> An odd thing: the registration messages suggest the local bricks should
> show as online in gluster volume status output. They are displayed as
> offline until I kill the glusterfsd processes and issue a
> gluster volume start gfs force.
> Along with symmetrical stuff, the peer has this:
> [2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_finish]
> 0-management: connection to 192.0.2.110:24007 failed (Connection timed
> out); disconnecting socket
> In the meantime I tracked the performance problem to extended attribute
> system calls. The root of the problem is outside of glusterfs, but fixing
> the consequences would be nice.
> Emmanuel Dreyfus
> manu at netbsd.org