[Gluster-users] False notifications

Milos Kozak milos.kozak at lejmr.com
Thu May 22 18:17:44 UTC 2014


Hi,


On 5/14/2014 1:45 AM, Joe Julian wrote:
>
> On 5/13/2014 10:43 PM, Sahina Bose wrote:
>>
>> On 05/14/2014 07:42 AM, Miloš Kozák wrote:
>>> Hi,
>>> I am running a field trial of Gluster 3.5 on two servers. Each of
>>> these servers uses one 10k HDD with XFS as a brick. On top of these
>>> bricks I have one replica 2 volume:
>>>
>>> [root at nodef01i ~]# gluster volume info ph-fs-0
>>>
>>> Volume Name: ph-fs-0
>>> Type: Replicate
>>> Volume ID: 5085e018-7c47-4d4f-8dcb-cd89ec240393
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.11.100.1:/gfs/s3-sata-10k/brick
>>> Brick2: 10.11.100.2:/gfs/s3-sata-10k/brick
>>> Options Reconfigured:
>>> performance.io-thread-count: 12
>>> network.ping-timeout: 2
>>> performance.cache-max-file-size: 0
>>> performance.flush-behind: on
>>>
>>> Additionally, I am running Nagios to monitor everything, using
>>> http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details.
>>> I improved it slightly so that it also monitors the number of
>>> split-brain files. All this information goes into the performance
>>> data, so I can draw graphs from it (the graphs are attached).
>>>
>>> My problem is that I receive quite a lot of false warnings from
>>> Nagios during the day because there are some unsynced files (gluster
>>> volume heal XXX info). I don't know whether it is a bug or caused by
>>> my configuration. Either way it is quite disturbing, and I am afraid
>>> that after receiving a lot of false warnings I could overlook an
>>> important one.
>>
>>
>> I think the issue is that "gluster volume heal info" also reports
>> files undergoing I/O in addition to files that need self-heal. See
>> http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040239.html
>> for more information on this. Pranith, please correct me if I am wrong.
>>
>
> That's what I've seen as well.
>
>> On another note, we are also developing Nagios plugins that can be
>> used to monitor the various entities and services in the gluster
>> cluster. The repositories are here -
>>
>> gluster-nagios-addons -
>> http://review.gluster.org/#/admin/projects/gluster-nagios-addons
>> nagios-server-addons -
>> http://review.gluster.org/#/admin/projects/nagios-server-addons
>>
>> We will be putting together a short doc on these soon, meanwhile,
>> please feel free to check it out and give us your valuable feedback.
>>

I went through your source code and, as far as I can tell, it is not a 
real GlusterFS add-on but a third-party monitoring "daemon", or rather a 
collection of scripts that monitor the cluster and report to Nagios. But 
you have the same problem with self-healing.

Basically, this can be resolved only once Pranith fixes the output. In 
the meantime I am planning to write a log parser, even though it is not 
the greatest solution, because I need it.
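
A different stopgap than the log parser, sketched here only as an idea: since heal info also lists files that are merely undergoing I/O, a check could warn only about entries that show up in several consecutive polls. The function names, the default of 3 polls and the assumed layout of the heal info output (a "Brick ..." header, one entry per line, a "Number of entries: N" line) are all my assumptions and would need checking against the real output of your Gluster version.

```python
# Hypothetical helper names; nothing here is part of Gluster or of the
# existing Nagios plugin.

def parse_heal_info(text):
    """Collect entry paths from heal-info style output.

    Assumed layout (not verified for every release): header lines start
    with 'Brick ', 'Gathering ', 'Status:' or 'Number of entries';
    every other non-empty line is treated as one unsynced entry.
    """
    skip = ("Brick ", "Number of entries", "Gathering ", "Status:")
    entries = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith(skip):
            entries.add(line)
    return entries


def persistent_entries(samples, min_polls=3):
    """Return entries present in each of the last min_polls samples.

    samples is a list of sets, oldest first, one set per poll.
    """
    if min_polls < 1:
        raise ValueError("min_polls must be at least 1")
    if len(samples) < min_polls:
        return set()  # not enough history yet, so nothing counts as stuck
    return set.intersection(*samples[-min_polls:])


def nagios_check(samples, min_polls=3):
    """Map the persistent entries to a Nagios (exit code, message) pair."""
    stuck = persistent_entries(samples, min_polls)
    if stuck:
        msg = "WARNING: %d file(s) unsynced for %d polls | unsynced=%d" % (
            len(stuck), min_polls, len(stuck))
        return 1, msg  # 1 is the Nagios WARNING exit code
    return 0, "OK: no persistently unsynced files | unsynced=0"
```

The plugin would have to keep the last few samples somewhere between runs (a small state file, for instance). The price is that a genuine problem is reported min_polls check intervals late.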


>>
>>
>>>
>>> network.ping-timeout is set to 2 because I cannot allow the VM
>>> servers to hang for 2x42sec when the other node is rebooted (we have
>>> some kind of reboot policy).
>>>
>>> Thanks for help,
>>> Milos
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users


