[Gluster-users] afr-self-heald.c:479:afr_shd_index_sweep

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jun 29 11:03:33 UTC 2017


Paolo,
      Which document did you follow for the upgrade? We can fix the
documentation if there are any issues.

On Thu, Jun 29, 2017 at 2:07 PM, Ravishankar N <ravishankar at redhat.com>
wrote:

> On 06/29/2017 01:08 PM, Paolo Margara wrote:
>
> Hi all,
>
> for the upgrade I followed this procedure:
>
>    - put the node in maintenance mode (ensure no clients are active)
>    - yum versionlock delete glusterfs*
>    - service glusterd stop
>    - yum update
>    - systemctl daemon-reload
>    - service glusterd start
>    - yum versionlock add glusterfs*
>    - gluster volume heal vm-images-repo full
>    - gluster volume heal vm-images-repo info
>
> On each server, after the upgrade, I ran 'gluster --version' to confirm the
> new version; at the end I ran 'gluster volume set all cluster.op-version 30800'.
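>
> As a side note, a quick way to verify the resulting operating version on each
> node (assuming the default /var/lib/glusterd working directory) is something
> like:
>
>     grep operating-version /var/lib/glusterd/glusterd.info
>     # should print operating-version=30800 once the bump has been applied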
>
> Today I tried to manually kill a brick process on a non-critical volume;
> after that I see the following in the log:
>
> [2017-06-29 07:03:50.074388] I [MSGID: 100030] [glusterfsd.c:2454:main]
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.12
> (args: /usr/sbin/glusterfsd -s virtnode-0-1-gluster
> --volfile-id iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
> -p /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
> -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket
> --brick-name /data/glusterfs/brick1b/iso-images-repo
> -l /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
> --xlator-option *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73
> --brick-port 49163 --xlator-option iso-images-repo-server.listen-port=49163)
>
> I've checked after the restart and indeed the 'entry-changes' directory is
> now created, but why did stopping the glusterd service not also stop the
> brick processes?
>
>
> Just stopping, upgrading, and restarting glusterd does not restart the brick
> processes. You would need to kill all gluster processes on the node before
> upgrading. After upgrading, when you restart glusterd, it will automatically
> spawn the rest of the gluster processes on that node.
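>
> As a rough sketch, one way to do that on a node (assuming a systemd-based
> setup; adapt to your environment):
>
>     systemctl stop glusterd     # stop the management daemon
>     pkill glusterfsd            # kill any remaining brick processes
>     pkill glusterfs             # kill self-heal/NFS/client-side daemons
>     pgrep -lf gluster           # should print nothing before you upgrade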
>
>
> Now how can I recover from this issue? Is restarting all brick processes
> enough?
>
> Yes, but ensure there are no pending heals, as Pranith mentioned.
> https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/
> lists the steps for the upgrade to 3.7, but the steps mentioned there are
> similar for any rolling upgrade.
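>
> For example, per volume (volume name taken from this thread; adapt as
> needed):
>
>     gluster volume start vm-images-repo force   # respawns any brick processes that are down
>     gluster volume heal vm-images-repo info     # should show no pending entries before moving on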
>
> -Ravi
>
>
> Greetings,
>
>     Paolo Margara
>
> On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>> On 06/28/2017 06:52 PM, Paolo Margara wrote:
>>
>>> Hi list,
>>>
>>> yesterday I noticed the following lines in the glustershd.log log file:
>>>
>>> [2017-06-28 11:53:05.000890] W [MSGID: 108034]
>>> [afr-self-heald.c:479:afr_shd_index_sweep]
>>> 0-iso-images-repo-replicate-0: unable to get index-dir on
>>> iso-images-repo-client-0
>>> [2017-06-28 11:53:05.001146] W [MSGID: 108034]
>>> [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-0:
>>> unable to get index-dir on vm-images-repo-client-0
>>> [2017-06-28 11:53:06.001141] W [MSGID: 108034]
>>> [afr-self-heald.c:479:afr_shd_index_sweep] 0-hosted-engine-replicate-0:
>>> unable to get index-dir on hosted-engine-client-0
>>> [2017-06-28 11:53:08.001094] W [MSGID: 108034]
>>> [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-2:
>>> unable to get index-dir on vm-images-repo-client-6
>>> [2017-06-28 11:53:08.001170] W [MSGID: 108034]
>>> [afr-self-heald.c:479:afr_shd_index_sweep] 0-vm-images-repo-replicate-1:
>>> unable to get index-dir on vm-images-repo-client-3
>>>
>>> Digging into the mailing list archive I've found another user with a
>>> similar issue (the thread was '[Gluster-users] glustershd: unable to get
>>> index-dir on myvolume-client-0'). The suggested solution was to verify
>>> that the /<path-to-backend-brick>/.glusterfs/indices directory contains
>>> all of these subdirectories: 'dirty', 'entry-changes' and 'xattrop', and
>>> if any of them do not exist, to simply create them with mkdir.
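>>>
>>> A minimal sketch of that check/fix on one server (brick paths below are
>>> the ones from this setup; the mode matches the existing 'dirty' and
>>> 'xattrop' directories):
>>>
>>>     for idx in /data/glusterfs/brick*/*/.glusterfs/indices; do
>>>         for d in dirty entry-changes xattrop; do
>>>             [ -d "$idx/$d" ] || mkdir -m 0600 "$idx/$d"
>>>         done
>>>     done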
>>>
>>> In my case the 'entry-changes' directory is missing from all the bricks
>>> on all the servers:
>>>
>>> /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
>>> total 0
>>> drw------- 2 root root 55 Jun 28 15:02 dirty
>>> drw------- 2 root root 57 Jun 28 15:02 xattrop
>>>
>>> /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
>>> total 0
>>> drw------- 2 root root 55 May 29 14:04 dirty
>>> drw------- 2 root root 57 May 29 14:04 xattrop
>>>
>>> /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
>>> total 0
>>> drw------- 2 root root 112 Jun 28 15:02 dirty
>>> drw------- 2 root root  66 Jun 28 15:02 xattrop
>>>
>>> /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
>>> total 0
>>> drw------- 2 root root 64 Jun 28 15:02 dirty
>>> drw------- 2 root root 66 Jun 28 15:02 xattrop
>>>
>>> /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
>>> total 0
>>> drw------- 2 root root 112 Jun 28 15:02 dirty
>>> drw------- 2 root root  66 Jun 28 15:02 xattrop
>>>
>>> I've recently upgraded gluster from 3.7.16 to 3.8.12 with the rolling
>>> upgrade procedure and I hadn't noticed this issue prior to the update; on
>>> another system upgraded with the same procedure I haven't encountered
>>> this problem.
>>>
>>> Currently all VM images appear to be OK, but before creating the
>>> 'entry-changes' directories I would like to ask whether this is still
>>> the correct procedure to fix this issue
>>>
>>
>> Did you restart the bricks after the upgrade? That should have created
>> the entry-changes directory. Can you kill the brick and restart it and see
>> if the dir is created? Double-check from the brick logs that you're indeed
>> running 3.8.12: "Started running /usr/local/sbin/glusterfsd version 3.8.12"
>> should appear when the brick starts.
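>>
>> Something along these lines (volume, PID and log names below are
>> placeholders):
>>
>>     gluster volume status <VOLNAME>       # note the PID column for the brick
>>     kill <BRICK-PID>                      # kill that one brick process
>>     gluster volume start <VOLNAME> force  # respawn the killed brick
>>     # confirm the brick came back up on 3.8.12
>>     grep "Started running" /var/log/glusterfs/bricks/<BRICK-LOG>.log | tail -1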
>>
>
> Please note that if you are going the route of killing and restarting, you
> need to do it in the same way you did the rolling upgrade: wait for the heal
> to complete before you kill the bricks on the other nodes. But before you do
> this, it is better to look at the logs or confirm the steps you used for the
> upgrade.
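>
> A rough way to wait for a volume's heals to drain before touching the next
> node (volume name is a placeholder):
>
>     while gluster volume heal <VOLNAME> info | grep -q "Number of entries: [1-9]"; do
>         sleep 60
>     done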
>
>
>>
>> -Ravi
>>
>>
>>> and whether this problem could have affected the heal operations that
>>> occurred in the meantime.
>>>
>>> Thanks.
>>>
>>>
>>> Greetings,
>>>
>>>      Paolo Margara
>>>
>
>
>
> --
> Pranith
>
>
> --
> LABINF - HPC at POLITO
> DAUIN - Politecnico di Torino
> Corso Castelfidardo, 34D - 10129 Torino (TO)
> phone: +39 011 090 7051
> site: http://www.labinf.polito.it/
> site: http://hpc.polito.it/
>
>
>


-- 
Pranith