[Gluster-users] Self healing of 3.3.0 cause our 2 bricks replicated cluster freeze (client read/write timeout)

Jeff Darcy jdarcy at redhat.com
Thu Nov 29 10:58:01 UTC 2012

On 11/26/12 4:46 AM, ZHANG Cheng wrote:
> Early this morning our 2-brick replicated cluster had an outage. The
> disk space on one of the brick servers (brick02) was used up. By the
> time we responded to the disk-full alert, the issue had already lasted
> for a few hours. We reclaimed some disk space and rebooted the brick02
> server, expecting that once it came back it would start self-healing.
> It did start self-healing, but after just a couple of minutes, access
> to the gluster filesystem froze. Tons of "nfs: server brick not
> responding, still trying" messages popped up in dmesg. The load average
> on the app server went up to 200-something from the usual 0.10. We had
> to shut down the brick02 server, or stop the gluster server process on
> it, to get the gluster cluster working again.

Have you checked the glustershd logs (should be in /var/log/glusterfs)
on the bricks?  If there's nothing useful there, a statedump would also
be useful.  See the "gluster volume statedump" instructions in your
friendly local admin guide (section 10.4 for GlusterFS 3.3).  Most
helpful of all would be a bug report with any of this information plus a
description of your configuration.  You can either create a new one or
attach the info to an existing bug if one seems to fit.  The following
seems like it might be related, even though it's for virtual machines.
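
For the log check and statedump suggested earlier in this reply, a
minimal shell sketch of gathering that information on a brick server
might look like the block below.  The volume name "myvol", the log file
name glustershd.log, and the helper names run and collect_diag are
assumptions for illustration, not taken from the original report.  The
grep pattern assumes the usual GlusterFS log layout, where a one-letter
severity (E or W) follows the timestamp.  Where the dump files end up
depends on your version and settings, so check the admin guide section
mentioned above.

```shell
#!/bin/sh
# Hypothetical volume name and assumed log file; substitute your own.
VOLNAME="${VOLNAME:-myvol}"
LOG="${LOG:-/var/log/glusterfs/glustershd.log}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_diag() {
    # Error (E) and warning (W) lines from the self-heal daemon log.
    run grep -E '\] [EW] \[' "$LOG"
    # Volume layout, useful for the configuration part of a bug report.
    run gluster volume info "$VOLNAME"
    # Files the self-heal daemon still considers pending.
    run gluster volume heal "$VOLNAME" info
    # Ask the brick processes to write a statedump.
    run gluster volume statedump "$VOLNAME"
}
```

Running "DRY_RUN=1 collect_diag" first only prints the commands, which
is a cheap way to review them before touching a cluster that is already
struggling.  Without DRY_RUN set, collect_diag executes them and needs
to run as root on one of the brick servers.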

