[Gluster-users] single problematic node (brick)

Wed May 21 15:35:29 UTC 2014

Hi,

glusterfs is using ~5% of memory (24GB total) and glusterfsd is using < 1%.

The I/O cache size (performance.cache-size) is 1GB.
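
For anyone who wants to track these numbers over time on the problematic node, here is a minimal sketch of how the per-daemon memory share could be collected. It assumes a Linux host with procps `ps` and `/proc/meminfo`; the helper names `summarize_rss` and `read_total_kib` are made up for illustration and are not part of Gluster.

```python
import subprocess
from collections import defaultdict


def summarize_rss(ps_output, total_kib):
    """Return {command: percent of total memory}, summing RSS over processes.

    ps_output is expected to hold one "<command> <rss_kib>" pair per line.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        comm, rss_kib = fields
        totals[comm] += int(rss_kib)
    return {comm: 100.0 * rss / total_kib for comm, rss in totals.items()}


def read_total_kib(meminfo_path="/proc/meminfo"):
    """Read MemTotal (in KiB) from /proc/meminfo."""
    with open(meminfo_path) as fh:
        for line in fh:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise RuntimeError("MemTotal not found in " + meminfo_path)


if __name__ == "__main__":
    # Select the client (glusterfs) and brick (glusterfsd) processes by name
    # and print only the command name and resident set size, without headers.
    result = subprocess.run(
        ["ps", "-C", "glusterfs,glusterfsd", "-o", "comm=", "-o", "rss="],
        capture_output=True,
        text=True,
    )
    shares = summarize_rss(result.stdout, read_total_kib())
    for comm, percent in sorted(shares.items()):
        print(f"{comm}: {percent:.1f}% of memory")
```

Run from cron every few minutes with the output appended to a file, this would show whether memory use climbs in the days before a failure.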

cheers, Doug

On 20/05/14 08:37 PM, Franco Broi wrote:
> Are you running out of memory? How much memory are the gluster daemons
> using?
>
> On Tue, 2014-05-20 at 11:16 -0700, Doug Schouten wrote:
>> Hello,
>>
>> I have a rather simple Gluster configuration that consists of 85TB
>> distributed across six nodes. There is one particular node that seems to
>> fail on a ~ weekly basis, and I can't figure out why.
>>
>> I have attached my Gluster configuration and a recent log file from the
>> problematic node. For a user, when the failure occurs, the symptom is
>> that any attempt to access the Gluster volume from the problematic node
>> fails with a "transport endpoint not connected" error.
>>
>> Restarting the Gluster daemons and remounting the volume on the failed
>> node always fixes the problem. But usually by that point some number of
>> jobs in our batch queue have failed b/c of this issue already, and it's
>> becoming a headache.
>>
>> It could be a fuse issue, since I see many related error messages in the
>> Gluster log, but I can't disentangle the various errors. The relevant
>> line in my /etc/fstab file is
>>
>> server:global /global glusterfs defaults,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log 0 0
>>
>> Any ideas on the source of the problem? Could it be a hardware (network)
>> glitch? The fact that it only happens on one node that is identically
>> configured (with same hardware) as other nodes points to something like
>> that.
>>
>> thanks! Doug