[Gluster-users] Self-healing on 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)

Brian Foster bfoster at redhat.com
Thu Nov 29 13:09:21 UTC 2012


On 11/29/2012 05:58 AM, Jeff Darcy wrote:
> On 11/26/12 4:46 AM, ZHANG Cheng wrote:
>> Early this morning our 2-brick replicated cluster had an outage. The
>> disk space on one of the brick servers (brick02) was used up. By the
>> time we responded to the disk-full alert, the issue had already lasted
>> a few hours. We reclaimed some disk space and rebooted the brick02
>> server, expecting it to start self-healing once it came back.
>>
>> It did start self-healing, but after just a couple of minutes access
>> to the gluster filesystem froze. Tons of "nfs: server brick not
>> responding, still trying" messages popped up in dmesg. The load
>> average on the app server went up to around 200 from its usual 0.10.
>> We had to shut down the brick02 server, or stop the gluster server
>> process on it, to get the gluster cluster working again.
> 
> Have you checked the glustershd logs (should be in /var/log/glusterfs)
> on the bricks?  If there's nothing useful there, a statedump would also
> be useful.  See the "gluster volume statedump" instructions in your
> friendly local admin guide (section 10.4 for GlusterFS 3.3).  Most
> helpful of all would be a bug report with any of this information plus a
> description of your configuration.  You can either create a new one or
> attach the info to an existing bug if one seems to fit.  The following
> seems like it might be related, even though it's for virtual machines.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=881685
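
In case it's useful, the log check and statedump Jeff describes look
roughly like this on a 3.3 install (the volume name is a placeholder;
the dump directory can vary depending on the build):

  # on each brick server, check the self-heal daemon log
  less /var/log/glusterfs/glustershd.log

  # trigger a statedump of the volume's brick processes
  gluster volume statedump <VOLNAME>

  # the dump files are written to the server's statedump directory,
  # which can be changed with the server.statedump-path volume option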

FWIW, one likely way to determine whether you are affected by this bug
is to limit self-heal to glustershd ('gluster volume set <vol>
cluster.data-self-heal off'). It seems unlikely, but couldn't hurt to
try, I suppose.
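
A minimal sketch of that workaround, assuming a volume named "myvol":

  # disable client-side data self-heal, so only glustershd performs data heals
  gluster volume set myvol cluster.data-self-heal off

  # ...try to reproduce the hang, watch the glustershd and client logs...

  # re-enable it once you're done testing (on is the default)
  gluster volume set myvol cluster.data-self-heal on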

Either way, unless it's very obvious that this is the same issue, I think
it would be preferable to file a separate bug. It's easier to mark it as a
duplicate later than to make sense of disjoint threads diagnosing separate
issues or configurations in a single bug. :)

Brian
