[Gluster-users] Gluster 3.4.2 on Redhat 6.5

Mon Mar 24 22:26:52 UTC 2014

Hi Steve,

you scared me there a bit because I've just put RHEL 6.5 +
GlusterFS 3.4.2 into production.

However, I cannot see any such problem. I have no zombie processes and
executing the command in question, or any other, does not create zombies
or other problems.
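
For anyone who wants to compare, here is a minimal sketch of one way to check for zombies and see which parent they belong to. The function names (list_zombies, zombies_by_parent) are made up for illustration, and it assumes the procps "ps" shipped with RHEL/CentOS 6:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of gluster or nagios.

# Read "pid ppid stat comm" lines on stdin and print "pid ppid comm"
# for every process whose state starts with Z (zombie / defunct).
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Count zombies per parent PID and print "count ppid", busiest parent first.
zombies_by_parent() {
    list_zombies | awk '{ n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# On a live system (assumed invocation):
#   ps -eo pid=,ppid=,stat=,comm= | zombies_by_parent
```

If the parent PID that shows up is the glusterd PID, the zombies are children that glusterd has not reaped yet, rather than leftovers of the monitoring script itself.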

Unfortunately, I'm not sure what could be causing this.
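
One way to narrow it down might be to measure the zombie count before and after a single command. This is only a rough sketch: count_zombies and zombie_delta are hypothetical names, the one-second pause is an arbitrary guess at how long reaping might take, and unrelated processes on a busy box can skew the number:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reproducing the problem by hand.

# Print the number of processes currently in state Z.
count_zombies() {
    ps -eo stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Run the given command, then print how much the zombie count changed.
zombie_delta() {
    local before after
    before=$(count_zombies)
    "$@" >/dev/null 2>&1 || true   # ignore the command's own exit status
    sleep 1                        # assumed settle time before counting again
    after=$(count_zombies)
    echo $(( after - before ))
}

# Example (assumed invocation, run as root or via sudo):
#   zombie_delta gluster volume status audio detail
```

A result that is consistently 1 for "volume status ... detail" and 0 for "volume heal ... info" would match what Steve describes below.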

v

On Mon 24 Mar 2014 12:13:03, Steve Thomas wrote:
> Some further information:
> 
> When I run the command
> "gluster volume status audio detail"
> I get the Zombie process created.... So it's not the HERE document as I previously thought... it's the command itself.
> 
> Does this happen with anyone else?
> 
> Thanks,
> Steve
> 
> 
> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Steve Thomas
> Sent: 24 March 2014 11:55
> To: Carlos Capriotti
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
> 
> Hi Carlos,
> 
> Thanks for coming back to me... in response to your queries:
> 
> PID is low, 1153 for glusterd with glusterfsd 1168 and 2 x glusterfs with 1318 and 1319 so I'd agree... it doesn't seem that glusterd is crashing and being restarted.
> 
> As of today, Monday morning top is reporting 1398 glusterd zombie processes.
> 
> I have this problem on all 4 of my gluster nodes and all four are being monitored by the attached nagios plugin.
> 
> In terms of testing, I've prevented nagios from running the attached check script and restarted the glusterd process using
> "service glusterd restart". I've let it run for a few hours and haven't yet seen any zombie processes created. This I think is good as, for whatever reason, it appears to point at the nagios check script being the problem.
> 
> My next check was to run the nagios check once to see if it created a Zombie process... it did.... So I started looking at the script. I forced the script to exit after the first command "gluster volume heal audio info" and no Zombie process was created. This pointed me to the second, which takes this form.... I'm no expert on HERE documents in shell but I think that it may be causing the issue:
> while read -r line; do
>      field=($(echo $line))
>      case ${field[0]} in
>      Brick)
>            brick=${field[@]:2}
>            ;;
>      Disk)
>            key=${field[@]:0:3}
>            if [ "${key}" = "Disk Space Free" ]; then
>                 freeunit=${field[@]:4}
>                 unit=${freeunit: -2}
>                 free=${freeunit%$unit}
>                 if [ "$unit" != "GB" ]; then
>                      Exit UNKNOWN "Unknown disk space size $freeunit\n"
>                 fi
>                 if (( $(bc <<< "${free} < ${freegb}") == 1 )); then
>                      freegb=$free
>                 fi
>            fi
>            ;;
>      Online)
>            online=${field[@]:2}
>            if [ "${online}" = "Y" ]; then
>                 let $((bricksfound++))
>            else
>                 errors=("${errors[@]}" "$brick offline")
>            fi
>            ;;
>      esac
> done < <( sudo gluster volume status ${VOLUME} detail)
> 
> 
> Anyone spot why this would be an issue?
> 
> Thanks,
> Steve
> 
> 
> From: Carlos Capriotti [mailto:capriotti.carlos at gmail.com]
> Sent: 22 March 2014 11:51
> To: Steve Thomas
> Cc: gluster-users at gluster.org<mailto:gluster-users at gluster.org>
> Subject: Re: [Gluster-users] Gluster 3.4.2 on Redhat 6.5
> 
> ok, let's see if we can gather more info.
> 
> I am not a specialist, but you know... another pair of eyes.
> 
> My system has a single glusterd process and it has a pretty low PID, meaning it has not crashed.
> 
> What is your PID for your glusterd ? how many zombie processes are there reported by top ?
> 
> I've been running my preliminary tests with gluster for a little over a month now and have never seen this. My platform is CentOS 6.5, so, I'd say it is pretty similar.
> 
> From my perspective, even making gluster sweat, running some intense rsync jobs in parallel, and seeing glusterd AND glusterfs take 120% of processing time on top (each on one core), they never crashed.
> 
> My zombie count, from top,  is zero.
> 
> On the other hand, I had one of my nodes, the other day, crashing a process every time I started a highly demanding task. It turns out I had (and still have) a hardware problem on one of the processors (or the main board; still undiagnosed).
> 
> Do you have this problem on one node only ?
> 
> Any chance you have something special compiled on your kernel ?
> 
> Any particularly memory-hungry tweak on your sysctl ?
> 
> Sounds like the system, not gluster.
> 
> KR,
> 
> Carlos
> 
> 
> 
> On Fri, Mar 21, 2014 at 10:29 PM, Steve Thomas <sthomas at rpstechnologysolutions.co.uk<mailto:sthomas at rpstechnologysolutions.co.uk>> wrote:
> Hi all...
> 
> Further investigation shows in excess of 500 glusterd zombie processes and continuing to climb on the box ...
> 
> Any suggestions? Am happy to provide logs etc to get to the bottom of this....
> 
> _____________________________________________
> From: Steve Thomas
> Sent: 21 March 2014 13:21
> To: 'gluster-users at gluster.org<mailto:gluster-users at gluster.org>'
> Subject: Gluster 3.4.2 on Redhat 6.5
> 
> 
> Hi,
> 
> I'm running Gluster 3.4.2 on Redhat 6.5 with 4 servers with a brick on each. This brick is mounted locally and used by apache to serve audio files for an IVR system. Each of these audio files is typically around 80-100Kb.
> 
> System appears to be working ok in terms of health and status via gluster CLI.
> 
> The system is monitored by nagios and there's a check for zombie processes and the gluster status. It appears that over a 24 hour period the number of Zombie processes on the box has increased and is continually increasing. On investigation, these are "glusterd" processes.
> 
> I'm making an assumption but I'd suspect that the regular nagios checks are resulting in the increase in zombie processes as they are querying the glusterd process. The command that the nagios plugin is running is:
> 
> #Check heal status
> gluster volume heal audio info
> 
> #Check volume status
> gluster volume status audio detail
> 
> Does anyone have any suggestions as to why glusterd is resulting in these zombie processes?
> 
> Thanks for help in advance,
> 
> Steve
> 
> 
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org<mailto:Gluster-users at gluster.org>
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 


-- 
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265