[Gluster-users] afr-self-heald.c:479:afr_shd_index_sweep
Paolo Margara
paolo.margara at polito.it
Thu Jun 29 07:38:03 UTC 2017
Hi all,
for the upgrade I followed this procedure:
* put node in maintenance mode (ensure no clients are active)
* yum versionlock delete glusterfs*
* service glusterd stop
* yum update
* systemctl daemon-reload
* service glusterd start
* yum versionlock add glusterfs*
* gluster volume heal vm-images-repo full
* gluster volume heal vm-images-repo info
On each server, after the update, I ran 'gluster --version' to confirm the
new version; at the end I ran 'gluster volume set all cluster.op-version 30800'.
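To double-check the resulting op-version on each node, something like the
following should work (assuming the default glusterd working directory):
  grep operating-version /var/lib/glusterd/glusterd.info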
Today I tried to manually kill a brick process on a non-critical
volume; after that I see the following in the log:
[2017-06-29 07:03:50.074388] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
3.8.12 (args: /usr/sbin/glusterfsd -s virtnode-0-1-gluster --volfile-id
iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
-p
/var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
-S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket --brick-name
/data/glusterfs/brick1b/iso-images-repo -l
/var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
--xlator-option
*-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73 --brick-port
49163 --xlator-option iso-images-repo-server.listen-port=49163)
I've checked after the restart and indeed the 'entry-changes' directory
is now created, but why did stopping the glusterd service not also stop
the brick processes?
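For reference, the bricks run as separate glusterfsd daemons, so they can
still be listed after 'service glusterd stop' with something like:
  pgrep -af glusterfsd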
Now, how can I recover from this issue? Is restarting all brick
processes enough?
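For what it's worth, the per-node sequence I have in mind is only a sketch,
using vm-images-repo as the example volume:
* gluster volume status vm-images-repo   (note the PID of this node's brick)
* kill <brick PID>
* gluster volume start vm-images-repo force   (respawns the killed brick)
* gluster volume heal vm-images-repo info   (wait until no entries are
  pending before moving on to the next node)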
Greetings,
Paolo Margara
On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:
>
>
> On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N
> <ravishankar at redhat.com> wrote:
>
> On 06/28/2017 06:52 PM, Paolo Margara wrote:
>
> Hi list,
>
> yesterday I noticed the following lines in the glustershd.log file:
>
> [2017-06-28 11:53:05.000890] W [MSGID: 108034]
> [afr-self-heald.c:479:afr_shd_index_sweep]
> 0-iso-images-repo-replicate-0: unable to get index-dir on
> iso-images-repo-client-0
> [2017-06-28 11:53:05.001146] W [MSGID: 108034]
> [afr-self-heald.c:479:afr_shd_index_sweep]
> 0-vm-images-repo-replicate-0:
> unable to get index-dir on vm-images-repo-client-0
> [2017-06-28 11:53:06.001141] W [MSGID: 108034]
> [afr-self-heald.c:479:afr_shd_index_sweep]
> 0-hosted-engine-replicate-0:
> unable to get index-dir on hosted-engine-client-0
> [2017-06-28 11:53:08.001094] W [MSGID: 108034]
> [afr-self-heald.c:479:afr_shd_index_sweep]
> 0-vm-images-repo-replicate-2:
> unable to get index-dir on vm-images-repo-client-6
> [2017-06-28 11:53:08.001170] W [MSGID: 108034]
> [afr-self-heald.c:479:afr_shd_index_sweep]
> 0-vm-images-repo-replicate-1:
> unable to get index-dir on vm-images-repo-client-3
>
> Digging into the mailing list archive I've found another user with a
> similar issue (the thread was '[Gluster-users] glustershd: unable to
> get index-dir on myvolume-client-0'); the suggested solution was to
> verify whether the /<path-to-backend-brick>/.glusterfs/indices
> directory contains all of these sub-directories: 'dirty',
> 'entry-changes' and 'xattrop', and if some of them do not exist,
> simply create them with mkdir.
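> For example, for the brick shown below, presumably something like the
> following would be enough (path taken from that brick, only a sketch):
>   mkdir /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/entry-changes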
>
> In my case the 'entry-changes' directory is missing from all the
> bricks on all the servers:
>
> /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
> total 0
> drw------- 2 root root 55 Jun 28 15:02 dirty
> drw------- 2 root root 57 Jun 28 15:02 xattrop
>
> /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
> total 0
> drw------- 2 root root 55 May 29 14:04 dirty
> drw------- 2 root root 57 May 29 14:04 xattrop
>
> /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
> total 0
> drw------- 2 root root 112 Jun 28 15:02 dirty
> drw------- 2 root root 66 Jun 28 15:02 xattrop
>
> /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
> total 0
> drw------- 2 root root 64 Jun 28 15:02 dirty
> drw------- 2 root root 66 Jun 28 15:02 xattrop
>
> /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
> total 0
> drw------- 2 root root 112 Jun 28 15:02 dirty
> drw------- 2 root root 66 Jun 28 15:02 xattrop
>
> I've recently upgraded gluster from 3.7.16 to 3.8.12 with the rolling
> upgrade procedure and I hadn't noticed this issue prior to the update;
> on another system upgraded with the same procedure I haven't
> encountered this problem.
>
> Currently all VM images appear to be OK, but prior to creating the
> 'entry-changes' directories I would like to ask if this is still the
> correct procedure to fix this issue
>
>
> Did you restart the bricks after the upgrade? That should have
> created the entry-changes directory. Can you kill the brick and
> restart it and see if the dir is created? Double-check from the
> brick logs that you're indeed running 3.8.12: "Started running
> /usr/local/sbin/glusterfsd version 3.8.12" should appear when the
> brick starts.
>
>
> Please note that if you are going the route of killing and restarting,
> you need to do it in the same way you did the rolling upgrade: you need
> to wait for the heal to complete before you kill the bricks on the
> other nodes. But before you do this, it is better to look at the logs
> or confirm the steps you used for the upgrade.
>
>
>
> -Ravi
>
>
> and if this problem could have affected the heal operations that
> occurred meanwhile.
>
> Thanks.
>
>
> Greetings,
>
> Paolo Margara
>
> --
> Pranith
--
LABINF - HPC at POLITO
DAUIN - Politecnico di Torino
Corso Castelfidardo, 34D - 10129 Torino (TO)
phone: +39 011 090 7051
site: http://www.labinf.polito.it/
site: http://hpc.polito.it/