[Gluster-users] volume not working after yum update - gluster 3.6.3

Mon Aug 10 13:49:21 UTC 2015

Further to this, the volume doesn't seem overly healthy. Any idea how I
can get it back into a working state?

Trying to access one particular directory on the clients just hangs. If
I query heal info, that directory appears in the output as possibly
undergoing heal (actual directory name changed as it's private info):

[root@gluster1b-1 ~]# gluster volume heal callrec info
Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
<gfid:164f888f-2049-49e6-ad26-c758ee091863>
/recordings/834723/14391 - Possibly undergoing heal

<gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
<gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
<gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
<gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
<gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
Number of entries: 7

Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0

Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
<gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
<gfid:164f888f-2049-49e6-ad26-c758ee091863>
<gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
<gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
/recordings/834723/14391 - Possibly undergoing heal

<gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
<gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
Number of entries: 7

Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0
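
To make output like the above easier to compare between runs, here is a rough sketch that boils it down to one line per brick. The function name summarise_heal_info is made up for illustration; it only reads the "Brick" and "Number of entries:" lines exactly as they appear in the output pasted above, and assumes that layout does not change.

```shell
#!/bin/sh
# Hypothetical helper: reduce "gluster volume heal <vol> info" output
# (read from stdin) to "<brick> <entry count>" lines.
# Assumes the layout shown above: a "Brick <host>:<path>" line, then
# zero or more entries, then a "Number of entries: N" line.
summarise_heal_info() {
    awk '
        /^Brick /             { brick = $2 }        # remember current brick
        /^Number of entries:/ { print brick, $NF }  # count reported by gluster
    '
}
```

It would be used as "gluster volume heal callrec info | summarise_heal_info". Fed the output above, it would print 7 for gluster1a-1 and gluster2a-1 and 0 for the other two bricks.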

If I count the files and directories under that directory on each brick
directly, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the
other two (gluster1b-1 and gluster2b-1), using this command:

# find /data/brick/callrec/recordings/834723/14391 -print | wc -l
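
For anyone wanting to repeat the comparison in one go, a minimal sketch follows. Both function names are invented for illustration, and compare_bricks assumes ssh access to each brick server, which is not something described in this thread. The count includes the directory itself, as the command above does, and would be inflated by file names containing newlines.

```shell
#!/bin/sh
# Count every file and directory under a path, including the path
# itself (the same find | wc -l pipeline as above).
count_entries() {
    find "$1" -print | wc -l | tr -d ' '
}

# Hypothetical wrapper: run the same count on each host over ssh and
# print "<host> <count>" so bricks that differ stand out.
# Assumes ssh logins to every host work; the directory path must not
# contain single quotes.
compare_bricks() {
    dir=$1
    shift
    for host in "$@"; do
        count=$(ssh "$host" "find '$dir' -print | wc -l" | tr -d ' ')
        printf '%s %s\n' "$host" "$count"
    done
}
```

For example: compare_bricks /data/brick/callrec/recordings/834723/14391 gluster1a-1 gluster1b-1 gluster2a-1 gluster2b-1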

Cheers,
Kingsley.

On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:
> Sorry for the blind panic - restarting the volume seems to have fixed
> it.
> 
> But then my next question - why is this necessary? Surely it undermines
> the whole point of a high availability system?
> 
> Cheers,
> Kingsley.
> 
> On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
> > Hi,
> > 
> > We have a 4-way replicated volume using gluster 3.6.3 on CentOS 7.
> > 
> > Over the weekend I did a yum update on each of the bricks in turn, but
> > now when clients (using fuse mounts) try to access the volume, it hangs.
> > Gluster itself wasn't updated (we've disabled that repo so that we keep
> > to 3.6.3 for now).
> > 
> > This was what I did:
> > 
> >       * on first brick, "yum update"
> >       * reboot brick
> >       * watch "gluster volume status" on another brick and wait for it
> >         to say all 4 bricks are online before proceeding to update the
> >         next brick
> > 
> > I was expecting the clients might pause 30 seconds while they notice a
> > brick is offline, but then recover.
> > 
> > I've tried re-mounting clients, but that hasn't helped.
> > 
> > I can't see much data in any of the log files.
> > 
> > I've tried "gluster volume heal callrec" but it doesn't seem to have
> > helped.
> > 
> > What shall I do next?
> > 
> > I've pasted some stuff below in case any of it helps.
> > 
> > Cheers,
> > Kingsley.
> > 
> > [root@gluster1b-1 ~]# gluster volume info callrec
> > 
> > Volume Name: callrec
> > Type: Replicate
> > Volume ID: a39830b7-eddb-4061-b381-39411274131a
> > Status: Started
> > Number of Bricks: 1 x 4 = 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster1a-1:/data/brick/callrec
> > Brick2: gluster1b-1:/data/brick/callrec
> > Brick3: gluster2a-1:/data/brick/callrec
> > Brick4: gluster2b-1:/data/brick/callrec
> > Options Reconfigured:
> > performance.flush-behind: off
> > [root@gluster1b-1 ~]#
> > 
> > 
> > [root@gluster1b-1 ~]# gluster volume status callrec
> > Status of volume: callrec
> > Gluster process                                         Port    Online  Pid
> > ------------------------------------------------------------------------------
> > Brick gluster1a-1:/data/brick/callrec                   49153   Y       6803
> > Brick gluster1b-1:/data/brick/callrec                   49153   Y       2614
> > Brick gluster2a-1:/data/brick/callrec                   49153   Y       2645
> > Brick gluster2b-1:/data/brick/callrec                   49153   Y       4325
> > NFS Server on localhost                                 2049    Y       2769
> > Self-heal Daemon on localhost                           N/A     Y       2789
> > NFS Server on gluster2a-1                               2049    Y       2857
> > Self-heal Daemon on gluster2a-1                         N/A     Y       2814
> > NFS Server on 88.151.41.100                             2049    Y       6833
> > Self-heal Daemon on 88.151.41.100                       N/A     Y       6824
> > NFS Server on gluster2b-1                               2049    Y       4428
> > Self-heal Daemon on gluster2b-1                         N/A     Y       4387
> > 
> > Task Status of Volume callrec
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> > 
> > [root@gluster1b-1 ~]#
> > 
> > 
> > [root@gluster1b-1 ~]# gluster volume heal callrec info
> > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> > /to_process - Possibly undergoing heal
> > 
> > Number of entries: 1
> > 
> > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> > 
> > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> > /to_process - Possibly undergoing heal
> > 
> > Number of entries: 1
> > 
> > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> > 
> > [root@gluster1b-1 ~]#
> > 
> > 
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> > 
> 