[Gluster-users] afr-self-heald.c:479:afr_shd_index_sweep

Paolo Margara paolo.margara at polito.it
Thu Jun 29 14:42:58 UTC 2017


On 29/06/2017 16:27, Pranith Kumar Karampuri wrote:

>
>
> On Thu, Jun 29, 2017 at 7:48 PM, Paolo Margara
> <paolo.margara at polito.it> wrote:
>
>     Hi Pranith,
>
>     I'm using this guide
>     https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md
>
>     Definitely my fault, but I think it would be better to state
>     explicitly somewhere that restarting the service is not enough,
>     simply because with many other services it is sufficient.
>
> The steps include the following command before installing 3.8 as per
> the page
> (https://github.com/nixpanic/glusterdocs/blob/f6d48dc17f2cb6ee4680e372520ec3358641b2bc/Upgrade-Guide/upgrade_to_3.8.md#online-upgrade-procedure-for-servers)
> So I guess we have it covered?
As I said it's my fault ;-)
>
>   * Stop all gluster services using the below command or through your
>     favorite way to stop them.
>   * killall glusterfs glusterfsd glusterd
>
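The "stop all gluster services" step quoted above can be sketched as a small shell helper. The process names are the standard gluster ones; the wait loop is an illustrative addition, not part of the quoted command:

```shell
# Sketch: stop every gluster process on a node before upgrading.
# Stopping only the glusterd service leaves brick (glusterfsd) and
# self-heal/client (glusterfs) processes running, which is the pitfall
# discussed in this thread. The polling loop is an illustrative addition.
stop_all_gluster() {
    killall glusterfs glusterfsd glusterd 2>/dev/null
    # Wait until no gluster process remains on this node.
    while pgrep -x glusterd   >/dev/null ||
          pgrep -x glusterfsd >/dev/null ||
          pgrep -x glusterfs  >/dev/null; do
        sleep 1
    done
}
```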
>  
>
>     Now I'm restarting every brick process (and waiting for the heal
>     to complete); this is fixing my problem.
>
>     Many thanks for the help.
>
>
>     Greetings,
>
>         Paolo
>
>
>     On 29/06/2017 13:03, Pranith Kumar Karampuri wrote:
>>     Paolo,
>>           Which document did you follow for the upgrade? We can fix
>>     the documentation if there are any issues.
>>
>>     On Thu, Jun 29, 2017 at 2:07 PM, Ravishankar N
>>     <ravishankar at redhat.com> wrote:
>>
>>         On 06/29/2017 01:08 PM, Paolo Margara wrote:
>>>
>>>         Hi all,
>>>
>>>         for the upgrade I followed this procedure:
>>>
>>>           * put node in maintenance mode (ensure no clients are active)
>>>           * yum versionlock delete glusterfs*
>>>           * service glusterd stop
>>>           * yum update
>>>           * systemctl daemon-reload
>>>           * service glusterd start
>>>           * yum versionlock add glusterfs*
>>>           * gluster volume heal vm-images-repo full
>>>           * gluster volume heal vm-images-repo info
>>>
>>>         On each server I ran 'gluster --version' every time to
>>>         confirm the upgrade; at the end I ran 'gluster volume set
>>>         all cluster.op-version 30800'.
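The per-node steps listed above can be sketched as a shell function. The commands are the ones quoted in the message; the function wrapper and the extra `killall` line (the step the rest of the thread identifies as missing) are illustrative:

```shell
# Sketch of the per-node rolling-upgrade steps described above.
# As the rest of the thread points out, 'service glusterd stop' does
# NOT stop brick (glusterfsd) processes; they must be killed separately.
upgrade_node() {
    local vol=$1   # e.g. vm-images-repo
    yum versionlock delete 'glusterfs*'
    service glusterd stop
    killall glusterfs glusterfsd glusterd 2>/dev/null  # the missing step
    yum update -y
    systemctl daemon-reload
    service glusterd start
    yum versionlock add 'glusterfs*'
    gluster --version                 # confirm the upgraded version
    gluster volume heal "$vol" full
    gluster volume heal "$vol" info
}
```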
>>>
>>>         Today I've tried to manually kill a brick process on a non
>>>         critical volume, after that into the log I see:
>>>
>>>         [2017-06-29 07:03:50.074388] I [MSGID: 100030]
>>>         [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started
>>>         running /usr/sbin/glusterfsd version 3.8.12 (args:
>>>         /usr/sbin/glusterfsd -s virtnode-0-1-gluster --volfile-id
>>>         iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
>>>         -p
>>>         /var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
>>>         -S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket
>>>         --brick-name /data/glusterfs/brick1b/iso-images-repo -l
>>>         /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
>>>         --xlator-option
>>>         *-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73
>>>         --brick-port 49163 --xlator-option
>>>         iso-images-repo-server.listen-port=49163)
>>>
>>>         I've checked after the restart and indeed the directory
>>>         'entry-changes' is now created, but why did stopping the
>>>         glusterd service not also stop the brick processes?
>>>
>>
>>         Just stopping, upgrading, and restarting glusterd does not
>>         restart the brick processes. You would need to kill all
>>         gluster processes on the node before upgrading.  After
>>         upgrading, when you restart glusterd, it will automatically
>>         spawn the rest of the gluster processes on that node.
>>          
>>>
>>>         Now how can I recover from this issue? Is restarting all
>>>         brick processes enough?
>>>
>>         Yes, but ensure there are no pending heals, as Pranith
>>         mentioned.
>>         https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/
>>         lists the steps for upgrading to 3.7, but the steps mentioned
>>         there are similar for any rolling upgrade.
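One way to "ensure there are no pending heals" before moving to the next node could look like the helper below. The awk summation over "Number of entries:" lines is an assumption about the heal-info output format; adjust it to your gluster version:

```shell
# Sketch: block until 'gluster volume heal <vol> info' reports zero
# pending entries across all bricks. Parsing the output this way is an
# assumption about the heal-info format on this gluster version.
wait_for_heal() {
    local vol=$1
    local pending
    while true; do
        pending=$(gluster volume heal "$vol" info 2>/dev/null |
                  awk '/Number of entries:/ { sum += $NF } END { print sum + 0 }')
        [ "${pending:-0}" -eq 0 ] && break
        sleep 10
    done
}
```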
>>
>>         -Ravi
>>
>>>
>>>         Greetings,
>>>
>>>             Paolo Margara
>>>
>>>
>>>         On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>         On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N
>>>>         <ravishankar at redhat.com> wrote:
>>>>
>>>>             On 06/28/2017 06:52 PM, Paolo Margara wrote:
>>>>
>>>>                 Hi list,
>>>>
>>>>                 yesterday I noted the following lines into the
>>>>                 glustershd.log log file:
>>>>
>>>>                 [2017-06-28 11:53:05.000890] W [MSGID: 108034]
>>>>                 [afr-self-heald.c:479:afr_shd_index_sweep]
>>>>                 0-iso-images-repo-replicate-0: unable to get
>>>>                 index-dir on
>>>>                 iso-images-repo-client-0
>>>>                 [2017-06-28 11:53:05.001146] W [MSGID: 108034]
>>>>                 [afr-self-heald.c:479:afr_shd_index_sweep]
>>>>                 0-vm-images-repo-replicate-0:
>>>>                 unable to get index-dir on vm-images-repo-client-0
>>>>                 [2017-06-28 11:53:06.001141] W [MSGID: 108034]
>>>>                 [afr-self-heald.c:479:afr_shd_index_sweep]
>>>>                 0-hosted-engine-replicate-0:
>>>>                 unable to get index-dir on hosted-engine-client-0
>>>>                 [2017-06-28 11:53:08.001094] W [MSGID: 108034]
>>>>                 [afr-self-heald.c:479:afr_shd_index_sweep]
>>>>                 0-vm-images-repo-replicate-2:
>>>>                 unable to get index-dir on vm-images-repo-client-6
>>>>                 [2017-06-28 11:53:08.001170] W [MSGID: 108034]
>>>>                 [afr-self-heald.c:479:afr_shd_index_sweep]
>>>>                 0-vm-images-repo-replicate-1:
>>>>                 unable to get index-dir on vm-images-repo-client-3
>>>>
>>>>                 Digging into the mailing list archive I found
>>>>                 another user with a similar issue (the thread was
>>>>                 '[Gluster-users] glustershd: unable to get
>>>>                 index-dir on myvolume-client-0'); the solution
>>>>                 suggested there was to verify whether the
>>>>                 /<path-to-backend-brick>/.glusterfs/indices
>>>>                 directory contains all of these subdirectories:
>>>>                 'dirty', 'entry-changes' and 'xattrop', and if
>>>>                 some of them do not exist, to simply create them
>>>>                 with mkdir.
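The fix suggested in that earlier thread can be sketched as follows. The brick path is an example, and matching the 0600 mode of the existing index directories (the drw------- entries in the listings below) is an assumption:

```shell
# Sketch: recreate any missing index subdirectories under a brick's
# .glusterfs/indices directory, as suggested in the earlier thread.
# The 0600 mode mirrors the existing 'dirty'/'xattrop' dirs (drw-------).
ensure_index_dirs() {
    local brick=$1   # e.g. /data/glusterfs/brick1a/hosted-engine
    local idx="$brick/.glusterfs/indices"
    local d
    for d in dirty entry-changes xattrop; do
        if [ ! -d "$idx/$d" ]; then
            mkdir -p "$idx/$d"
            chmod 600 "$idx/$d"
        fi
    done
}
```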
>>>>
>>>>                 In my case the 'entry-changes' directory is
>>>>                 missing on all the bricks of all the servers:
>>>>
>>>>                 /data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
>>>>                 total 0
>>>>                 drw------- 2 root root 55 Jun 28 15:02 dirty
>>>>                 drw------- 2 root root 57 Jun 28 15:02 xattrop
>>>>
>>>>                 /data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
>>>>                 total 0
>>>>                 drw------- 2 root root 55 May 29 14:04 dirty
>>>>                 drw------- 2 root root 57 May 29 14:04 xattrop
>>>>
>>>>                 /data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
>>>>                 total 0
>>>>                 drw------- 2 root root 112 Jun 28 15:02 dirty
>>>>                 drw------- 2 root root  66 Jun 28 15:02 xattrop
>>>>
>>>>                 /data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
>>>>                 total 0
>>>>                 drw------- 2 root root 64 Jun 28 15:02 dirty
>>>>                 drw------- 2 root root 66 Jun 28 15:02 xattrop
>>>>
>>>>                 /data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
>>>>                 total 0
>>>>                 drw------- 2 root root 112 Jun 28 15:02 dirty
>>>>                 drw------- 2 root root  66 Jun 28 15:02 xattrop
>>>>
>>>>                 I've recently upgraded gluster from 3.7.16 to
>>>>                 3.8.12 with the rolling upgrade procedure and I
>>>>                 hadn't noticed this issue prior to the update; on
>>>>                 another system upgraded with the same procedure I
>>>>                 haven't encountered this problem.
>>>>
>>>>                 Currently all VM images appear to be OK, but
>>>>                 before creating the 'entry-changes' directories I
>>>>                 would like to ask whether this is still the
>>>>                 correct procedure to fix this issue
>>>>
>>>>
>>>>             Did you restart the bricks after the upgrade? That
>>>>             should have created the entry-changes directory. Can
>>>>             you kill the brick and restart it and see if the dir is
>>>>             created? Double-check from the brick logs that you're
>>>>             indeed running 3.8.12:  "Started running
>>>>             /usr/local/sbin/glusterfsd version 3.8.12" should
>>>>             appear when the brick starts.
>>>>
>>>>
>>>>         Please note that if you are going the route of killing and
>>>>         restarting, you need to do it in the same way you did the
>>>>         rolling upgrade: you need to wait for the heal to complete
>>>>         before you kill bricks on the other nodes. But before you
>>>>         do this, it is better to look at the logs or confirm the
>>>>         steps you used for the upgrade.
>>>>          
>>>>
>>>>
>>>>             -Ravi
>>>>
>>>>
>>>>                   and whether this problem could have affected
>>>>                 the heal operations that occurred in the meantime.
>>>>
>>>>                 Thanks.
>>>>
>>>>
>>>>                 Greetings,
>>>>
>>>>                      Paolo Margara
>>>>
>>>>                 _______________________________________________
>>>>                 Gluster-users mailing list
>>>>                 Gluster-users at gluster.org
>>>>                 http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>         -- 
>>>>         Pranith
>>
>>     -- 
>>     Pranith
>
>
>
>
> -- 
> Pranith