[Gluster-users] single problematic node (brick)

Wed May 21 03:37:46 UTC 2014

Are you running out of memory? How much memory are the gluster daemons
using?
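
If it helps, here is a rough sketch of one way to check, assuming a Linux node with /proc mounted and assuming the daemons show up under process names starting with "gluster" (glusterd, glusterfsd, glusterfs). The function names are made up for illustration and are not part of any Gluster tooling. It reads the resident set size (VmRSS) of each matching process:

```python
import os


def parse_rss_kb(status_text):
    """Return (name, rss_kb) parsed from the text of a /proc/<pid>/status file.

    rss_kb is 0 when the file has no VmRSS line (e.g. kernel threads).
    """
    name, rss_kb = None, 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    204800 kB"
            rss_kb = int(line.split()[1])
    return name, rss_kb


def gluster_memory(proc_root="/proc"):
    """List (pid, name, rss_kb) for processes whose name starts with 'gluster'.

    Assumption: matching on the name prefix is enough to find the daemons.
    """
    usage = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return usage  # no /proc here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_rss_kb(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        if name and name.startswith("gluster"):
            usage.append((int(entry), name, rss_kb))
    return usage


if __name__ == "__main__":
    # Largest consumers first
    for pid, name, rss_kb in sorted(gluster_memory(), key=lambda r: -r[2]):
        print(f"{pid:>7} {name:<12} {rss_kb / 1024:8.1f} MiB")
```

Running it periodically (from cron, say) on the problematic node and on a healthy one, and comparing the numbers over the week before a failure, should show whether memory is growing on that node only.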

On Tue, 2014-05-20 at 11:16 -0700, Doug Schouten wrote: 
> Hello,
> 
> I have a rather simple Gluster configuration that consists of 85TB 
> distributed across six nodes. One particular node seems to fail on a 
> roughly weekly basis, and I can't figure out why.
> 
> I have attached my Gluster configuration and a recent log file from the 
> problematic node. When the failure occurs, the symptom for a user is 
> that any attempt to access the Gluster volume from the problematic node 
> fails with a "transport endpoint not connected" error.
> 
> Restarting the Gluster daemons and remounting the volume on the failed 
> node always fixes the problem. But by that point some number of jobs in 
> our batch queue have usually already failed because of this issue, and 
> it's becoming a headache.
> 
> It could be a fuse issue, since I see many related error messages in the 
> Gluster log, but I can't disentangle the various errors. The relevant 
> line in my /etc/fstab file is
> 
> server:global /global glusterfs defaults,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log 0 0
> 
> Any ideas on the source of the problem? Could it be a hardware (network) 
> glitch? The fact that it happens on only one node, which is configured 
> identically to the other nodes (with the same hardware), points to 
> something like that.
> 
> thanks! Doug