[Gluster-users] glusterfsd process spinning

Susant Palai spalai at redhat.com
Wed Jun 4 05:51:39 UTC 2014


From the logs it seems the files are present on data(21,22,23,24), which are on nas6, while missing on data(17,18,19,20), which are on nas5 (interesting). There is an existing issue where directories do not show up on the mount point if they are not present on the first_up_subvol (the longest-living brick), and the current issue looks similar. We will look at the client logs for more information.
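
In the meantime, a quick check you can do yourself: when all bricks are up, first_up_subvol is normally just the first brick listed in the volume, which in your layout is data17 on nas5. So something like this (using the dir1226/dir25 path from your earlier mail; substitute whichever directory is currently invisible) would tell us whether that brick is the odd one out:

[root at nas5 ~]# ls -ld /data17/gvol/franco/dir1226/dir25

If the directory is missing there but present on the nas6 bricks, that matches the behaviour described above.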

Susant.

----- Original Message -----
From: "Franco Broi" <franco.broi at iongeo.com>
To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
Cc: "Susant Palai" <spalai at redhat.com>, gluster-users at gluster.org, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, kdhananj at redhat.com, vsomyaju at redhat.com, nbalacha at redhat.com
Sent: Wednesday, 4 June, 2014 10:32:37 AM
Subject: Re: [Gluster-users] glusterfsd process spinning

On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote: 
> On 06/04/2014 08:07 AM, Susant Palai wrote:
> > Pranith can you send the client and bricks logs.
> I have the logs. But for this issue of the directory not listing
> entries, I believe it would help more if we had the contents of that
> directory on all the bricks, plus their hash values in the xattrs.

The strange thing is that all the invisible files are on the one server
(nas6); the other seems ok. I did rm -Rf of /data2/franco/dir* and was
left with this one directory - there were many hundreds which were removed
successfully.

I've attached listings and xattr dumps.
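
(If anything is missing from the attachments, the same xattr information
can be regenerated with something like

getfattr -d -m . -e hex /data*/gvol/franco/dir1226/dir25

run as root on each server, which dumps all the extended attributes
including the trusted.glusterfs.dht layout values.)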

Cheers,

Volume Name: data2
Type: Distribute
Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
Status: Started
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: nas5-10g:/data17/gvol
Brick2: nas5-10g:/data18/gvol
Brick3: nas5-10g:/data19/gvol
Brick4: nas5-10g:/data20/gvol
Brick5: nas6-10g:/data21/gvol
Brick6: nas6-10g:/data22/gvol
Brick7: nas6-10g:/data23/gvol
Brick8: nas6-10g:/data24/gvol
Options Reconfigured:
nfs.drc: on
cluster.min-free-disk: 5%
network.frame-timeout: 10800
nfs.export-volumes: on
nfs.disable: on
cluster.readdir-optimize: on

Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick nas5-10g:/data17/gvol				49152	Y	6553
Brick nas5-10g:/data18/gvol				49153	Y	6564
Brick nas5-10g:/data19/gvol				49154	Y	6575
Brick nas5-10g:/data20/gvol				49155	Y	6586
Brick nas6-10g:/data21/gvol				49160	Y	20608
Brick nas6-10g:/data22/gvol				49161	Y	20613
Brick nas6-10g:/data23/gvol				49162	Y	20614
Brick nas6-10g:/data24/gvol				49163	Y	20621
 
Task Status of Volume data2
------------------------------------------------------------------------------
There are no active volume tasks



> 
> Pranith
> >
> > Thanks,
> > Susant~
> >
> > ----- Original Message -----
> > From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > To: "Franco Broi" <franco.broi at iongeo.com>
> > Cc: gluster-users at gluster.org, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, spalai at redhat.com, kdhananj at redhat.com, vsomyaju at redhat.com, nbalacha at redhat.com
> > Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > Subject: Re: [Gluster-users] glusterfsd process spinning
> >
> > hi Franco,
> >        CC Devs who work on DHT to comment.
> >
> > Pranith
> >
> > On 06/04/2014 07:39 AM, Franco Broi wrote:
> >> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> >>> Franco,
> >>>          Thanks for providing the logs. I just copied over the logs to my
> >>> machine. Most of the logs I see are related to "No such File or
> >>> Directory" I wonder what lead to this. Do you have any idea?
> >> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> >> that looks empty but can't be deleted. When I look at the directories on
> >> the servers, there are definitely files in there.
> >>
> >> [franco at charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> >> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> >> [franco at charlie1 franco]$ ls -la  /data2/franco/dir1226/dir25
> >> total 8
> >> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> >> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> >>
> >> [root at nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> >> /data21/gvol/franco/dir1226/dir25:
> >> total 2081
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> >> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> >>
> >> /data22/gvol/franco/dir1226/dir25:
> >> total 2084
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> >> .....
> >>
> >> Maybe Gluster is losing track of the files??
> >>
> >>> Pranith
> >>>
> >>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> >>>> Hi Pranith
> >>>>
> >>>> Here's a listing of the brick logs; it looks very odd, especially the
> >>>> size of the log for data10.
> >>>>
> >>>> [root at nas3 bricks]# ls -ltrh
> >>>> total 2.6G
> >>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> >>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> >>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> >>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> >>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> >>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> >>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> >>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> >>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> >>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> >>>>
> >>>> Too big to post everything.
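> >>>>
> >>>> If it helps I could pull out just the warning/error lines from the big
> >>>> ones with something like
> >>>>
> >>>> egrep ' [EW] \[' data10-gvol.log-20140518 | gzip > data10-errors.log.gz
> >>>>
> >>>> (assuming the usual log format with the severity letter after the
> >>>> timestamp) - just say the word.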
> >>>>
> >>>> Cheers,
> >>>>
> >>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> >>>>> ----- Original Message -----
> >>>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>>>> To: "Franco Broi" <franco.broi at iongeo.com>
> >>>>>> Cc: gluster-users at gluster.org
> >>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> >>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Franco Broi" <franco.broi at iongeo.com>
> >>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>>>>> Cc: gluster-users at gluster.org
> >>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> >>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>>
> >>>>>>>
> >>>>>>> The volume is almost completely idle now and the CPU for the brick
> >>>>>>> process has returned to normal. I've included the profile and I think it
> >>>>>>> shows the latency for the bad brick (data12) is unusually high, probably
> >>>>>>> indicating the filesystem is at fault after all??
> >>>>>> I am not sure if we can believe the outputs now that you say the brick
> >>>>>> returned to normal. Next time it is acting up, do the same procedure and
> >>>>>> post the result.
> >>>>> On second thought, maybe it's not a bad idea to inspect the log files of the bricks on nas3. Could you post them?
> >>>>>
> >>>>> Pranith
> >>>>>
> >>>>>> Pranith
> >>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> >>>>>>>> Franco,
> >>>>>>>>        Could you do the following to get more information:
> >>>>>>>>
> >>>>>>>> "gluster volume profile <volname> start"
> >>>>>>>>
> >>>>>>>> Wait for some time; this will start gathering what operations are
> >>>>>>>> coming to all the bricks.
> >>>>>>>> Now execute "gluster volume profile <volname> info" >
> >>>>>>>> /file/you/should/reply/to/this/mail/with
> >>>>>>>>
> >>>>>>>> Then execute:
> >>>>>>>> gluster volume profile <volname> stop
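> >>>>>>>>
> >>>>>>>> Putting that together with your volume name, it would look roughly
> >>>>>>>> like this (/tmp/data2-profile.txt is just an example output path):
> >>>>>>>>
> >>>>>>>> gluster volume profile data2 start
> >>>>>>>> # wait a while, ideally while the slowness is visible
> >>>>>>>> gluster volume profile data2 info > /tmp/data2-profile.txt
> >>>>>>>> gluster volume profile data2 stop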
> >>>>>>>>
> >>>>>>>> Let's see if this throws any light on the problem at hand.
> >>>>>>>>
> >>>>>>>> Pranith
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> From: "Franco Broi" <franco.broi at iongeo.com>
> >>>>>>>>> To: gluster-users at gluster.org
> >>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> >>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> >>>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I've been suffering from continual problems with my gluster filesystem
> >>>>>>>>> slowing down, due to what I thought was congestion on a single brick
> >>>>>>>>> caused by the underlying filesystem running slow, but I've just noticed
> >>>>>>>>> that the glusterfsd process for that particular brick is running at
> >>>>>>>>> 100%+, even when the filesystem is almost idle.
> >>>>>>>>>
> >>>>>>>>> I've done a couple of straces, one of that brick and one of another on
> >>>>>>>>> the same server. Does the high number of futex errors give any clues as
> >>>>>>>>> to what might be wrong?
> >>>>>>>>>
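> >>>>>>>>> (For reference, these are strace -c summaries - attaching with something
> >>>>>>>>> roughly like "strace -c -f -p <brick pid>" for a short while should
> >>>>>>>>> reproduce them; the absolute numbers will vary with load.)
> >>>>>>>>>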
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> >>>>>>>>>  28.26    0.017084           0    137133           readv
> >>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> >>>>>>>>>   0.13    0.000077           3        23           writev
> >>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.060458                395081     20772 total
> >>>>>>>>>
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> >>>>>>>>>   0.40    0.001347           0      4090        26 futex
> >>>>>>>>>   0.35    0.001192           0      5064           readv
> >>>>>>>>>   0.00    0.000000           0        20           writev
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.336559                 11690        26 total
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Gluster-users mailing list
> >>>>>>>>> Gluster-users at gluster.org
> >>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>>>>>>
> 



