[Gluster-users] volume not working after yum update - gluster 3.6.3
Kingsley
gluster at gluster.dogwind.com
Tue Aug 11 08:08:35 UTC 2015
On Tue, 2015-08-11 at 11:14 +0530, Atin Mukherjee wrote:
>
> On 08/11/2015 10:44 AM, Kingsley wrote:
> > On Tue, 2015-08-11 at 07:48 +0530, Atin Mukherjee wrote:
> >
> >> -Atin
> >> Sent from one plus one
> >> On Aug 10, 2015 11:58 PM, "Kingsley" <gluster at gluster.dogwind.com>
> >> wrote:
> >>>
> >>>
> >>> On Mon, 2015-08-10 at 22:53 +0530, Atin Mukherjee wrote:
> >>> [snip]
> >>>>
> >>>>> stat("/sys/fs/selinux", {st_mode=S_IFDIR|0755, st_size=0, ...}) =
> >> 0
> >>>>
> >>>>> brk(0) = 0x8db000
> >>>>> brk(0x8fc000) = 0x8fc000
> >>>>> mkdir("test", 0777
> >>>> Can you also collect the statedump of all the brick processes when
> >> the command is hung?
> >>>>
> >>>> + Ravi, could you check this?
> >>>
> >>>
> >>> I ran the command but I could not find where it put the output:
> >
> > [snip]
> >
> >>> Where should I find the output of the statedump command?
> >> It should be there in the /var/run/gluster folder
> >
> >
> > Thanks - replied offlist.
> Could you forward the statedump details to Ravi as well? (In cc)
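(For anyone following along, a rough sketch of how the brick statedumps
can be taken; this assumes the volume name callrec and a stock build
where the dumps land in /var/run/gluster:

gluster volume statedump callrec
ls -lt /var/run/gluster/

One dump file per brick process should appear there shortly after the
command is run.)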
Hi,
It appears that the volume may have repaired itself, which is a pleasing
outcome.
The "strace mkdir test" command in the broken directory finally came
back (the output previously ended at 'mkdir("test", 0777' [without the
single quotes]), but I've now seen that it has completed (see below).
I've no idea what time it actually finished, but I suspect it was hours
later; the output finally ended:
mkdir("test", 0777) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
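If this ever happens again, running the command more like this (just a
sketch, not what I originally ran; the output file name is arbitrary)
would record when each call returned:

strace -tt -T -o /tmp/mkdir-test.strace mkdir test

-tt prefixes every line with a wall-clock timestamp and -T shows how
long each syscall took, so it would be obvious exactly when the hung
mkdir() finally came back.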
I just tested "mkdir test2" in the same directory and it worked
perfectly. What's more, the directories both exist as they should:
[root@voicemail1b-1 14391.broken]# ls -ld test*
drwxr-xr-x. 2 root root 10 Aug 11 05:46 test
drwxr-xr-x. 2 root root 10 Aug 11 09:03 test2
[root@voicemail1b-1 14391.broken]#
Volume heal info no longer reports any entries needing healing:
[root@gluster1b-1 14391]# gluster volume heal callrec info
Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
[root@gluster1b-1 14391]#
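For completeness, a couple of follow-up checks along these lines (a
sketch; I believe both sub-commands exist in 3.6, but I haven't verified
the statistics one on this exact version):

gluster volume heal callrec info split-brain
gluster volume heal callrec statistics heal-count

The first confirms nothing is left in split-brain, and the second shows
how many entries each brick still thinks need healing.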
Because of the job backlog from yesterday, the system was heavily disk
I/O bound, which was slowing everything right down. Obviously that
wouldn't have helped the self-heal, though I've no idea how long one
would normally take.
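(As a rough check of that sort of I/O load, something like the
following works if the sysstat package is installed:

iostat -x 5

The %util and await columns show whether the brick disks are
saturated.)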
Cheers,
Kingsley.