[Gluster-users] volume not working after yum update - gluster 3.6.3
Kingsley
gluster at gluster.dogwind.com
Tue Aug 11 08:08:35 UTC 2015
On Tue, 2015-08-11 at 11:14 +0530, Atin Mukherjee wrote:
>
> On 08/11/2015 10:44 AM, Kingsley wrote:
> > On Tue, 2015-08-11 at 07:48 +0530, Atin Mukherjee wrote:
> >
> >> -Atin
> >> Sent from one plus one
> >> On Aug 10, 2015 11:58 PM, "Kingsley" <gluster at gluster.dogwind.com>
> >> wrote:
> >>>
> >>>
> >>> On Mon, 2015-08-10 at 22:53 +0530, Atin Mukherjee wrote:
> >>> [snip]
> >>>>
> >>>>> stat("/sys/fs/selinux", {st_mode=S_IFDIR|0755, st_size=0, ...}) =
> >> 0
> >>>>
> >>>>> brk(0) = 0x8db000
> >>>>> brk(0x8fc000) = 0x8fc000
> >>>>> mkdir("test", 0777
> >>>> Can you also collect the statedump of all the brick processes when
> >> the command is hung?
> >>>>
> >>>> + Ravi, could you check this?
> >>>
> >>>
> >>> I ran the command but I could not find where it put the output:
> >
> > [snip]
> >
> >>> Where should I find the output of the statedump command?
> >> It should be there in the /var/run/gluster folder
> >
> >
> > Thanks - replied offlist.
> Could you forward the statedump details to Ravi as well? (In cc)
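(For anyone following along, a rough sketch of how the brick statedumps
can be taken; this assumes the volume name callrec and a stock build
where the dumps land in /var/run/gluster:

gluster volume statedump callrec
ls -lt /var/run/gluster/

One dump file per brick process should appear there shortly after the
command is run.)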
Hi,
It appears that the volume may have repaired itself, which is a pleasing
outcome.
The "strace mkdir test" command in the broken directory finally came
back (the output previously ended at 'mkdir("test", 0777' [without the
single quotes]), but I've now seen that it has completed (see below).
I've no idea what time it actually finished, but I suspect it was hours
later; the output finally ended:
mkdir("test", 0777) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
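If this ever happens again, running the command more like this (just a
sketch, not what I originally ran; the output file name is arbitrary)
would record when each call returned:

strace -tt -T -o /tmp/mkdir-test.strace mkdir test

-tt prefixes every line with a wall-clock timestamp and -T shows how
long each syscall took, so it would be obvious exactly when the hung
mkdir() finally came back.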
I just tested "mkdir test2" in the same directory and it worked
perfectly. What's more, the directories both exist as they should:
[root@voicemail1b-1 14391.broken]# ls -ld test*
drwxr-xr-x. 2 root root 10 Aug 11 05:46 test
drwxr-xr-x. 2 root root 10 Aug 11 09:03 test2
[root@voicemail1b-1 14391.broken]#
Volume heal info no longer reports any entries needing healing:
[root@gluster1b-1 14391]# gluster volume heal callrec info
Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
[root@gluster1b-1 14391]#
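For completeness, a couple of follow-up checks along these lines (a
sketch; I believe both sub-commands exist in 3.6, but I haven't verified
the statistics one on this exact version):

gluster volume heal callrec info split-brain
gluster volume heal callrec statistics heal-count

The first confirms nothing is left in split-brain, and the second shows
how many entries each brick still thinks need healing.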
Because of the job backlog from yesterday, the system was heavily disk
I/O bound, which was slowing everything right down. Obviously that
wouldn't have helped the self-heal, though I've no idea how long one
would normally take.
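(As a rough check of that sort of I/O load, something like the
following works if the sysstat package is installed:

iostat -x 5

The %util and await columns show whether the brick disks are
saturated.)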
Cheers,
Kingsley.