[Gluster-users] Failures during rebalance on gluster distributed disperse volume

Sun Sep 16 04:07:13 UTC 2018

Hi Mauro,

Please stop the rebalance before you upgrade.
Thanks,
Nithya
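
For reference, stopping the rebalance could look like the sketch below. The volume name tier2 comes from the status output quoted further down. The run helper and the DRY_RUN switch are hypothetical additions, so the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: VOLNAME defaults to "tier2" (the volume named in this thread).
# DRY_RUN is a hypothetical safety switch; set DRY_RUN=0 to really run gluster.
VOLNAME="${VOLNAME:-tier2}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the running rebalance, then show the per-node status.
stop_rebalance() {
    run gluster volume rebalance "$VOLNAME" stop
    run gluster volume rebalance "$VOLNAME" status
}

stop_rebalance
```

Checking the status afterwards should confirm that no node still reports "in progress" before any packages are upgraded.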

On 15 September 2018 at 22:55, Mauro Tridici <mauro.tridici at cmcc.it> wrote:

>
> Hi Sunil,
>
> Many thanks to you too.
> I will follow your suggestions and the guide for upgrading to 3.12.
>
> Crossing fingers :-)
> Regards,
> Mauro
>
> On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya <
> sheggodu at redhat.com> wrote:
>
> Hi Mauro,
>
> As Nithya highlighted, fallocate support for EC volumes went into 3.11 as
> part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence,
> upgrading to 3.12, as suggested before, would be the right move.
>
> Here is the documentation for upgrading to 3.12:
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
>
> Regards,
> Sunil kumar Acharya
>
> Senior Software Engineer
> Red Hat
>
>
> On Sat, Sep 15, 2018 at 3:42 AM, Mauro Tridici <mauro.tridici at cmcc.it>
> wrote:
>
>>
>> Hi Nithya,
>>
>> Thank you very much for your answer.
>> I will wait for @Sunil's opinion too before starting the upgrade procedure.
>>
>> Since this will be the first upgrade of our Gluster cluster, I would like
>> to know whether it is a potentially dangerous procedure and whether there
>> is a risk of losing data :-)
>> Unfortunately, I can't make a backup copy of the volume data in another
>> location.
>> If possible, could you please outline the steps needed to
>> complete the upgrade from version 3.10.5 to 3.12?
>>
>> Thank you again, Nithya.
>> Thank you to all of you for the help!
>>
>> Regards,
>> Mauro
>>
>> On 14 Sep 2018, at 16:59, Nithya Balachandran <
>> nbalacha at redhat.com> wrote:
>>
>> Hi Mauro,
>>
>>
>> The rebalance code started using fallocate in 3.10.5 (
>> https://bugzilla.redhat.com/show_bug.cgi?id=1473132) which works fine on
>> replicated volumes. However, we neglected to test this with EC volumes on
>> 3.10. Once we discovered the issue, the EC fallocate implementation was
>> made available in 3.11.
>>
>> At this point, I'm afraid the only option I see is to upgrade to at least
>> 3.12.
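
To check whether a node already runs a release with EC fallocate support, a version comparison along these lines may help. This is a sketch only: version_ge and needs_upgrade are hypothetical helpers, the comparison relies on sort -V (available in GNU coreutils and BusyBox), and the commented usage line assumes that the first line of gluster --version reads "glusterfs <version>".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on "sort -V" (version sort), which is not part of POSIX.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# 3.11 is the first release with EC fallocate support according to this thread.
needs_upgrade() {
    if version_ge "$1" "3.11"; then
        echo "$1: EC fallocate available"
    else
        echo "$1: upgrade needed before rebalancing EC volumes"
    fi
}

needs_upgrade "3.10.5"
needs_upgrade "3.12.0"

# Assumed usage on a live node (output format of "gluster --version" is an assumption):
# needs_upgrade "$(gluster --version | head -n1 | awk '{print $2}')"
```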
>>
>> @Sunil, do you have anything to add?
>>
>> Regards,
>> Nithya
>>
>> On 13 September 2018 at 18:34, Mauro Tridici <mauro.tridici at cmcc.it>
>> wrote:
>>
>>>
>>> Hi Nithya,
>>>
>>> Thank you for involving the EC group.
>>> I will wait for your suggestions.
>>>
>>> Regards,
>>> Mauro
>>>
>>> On 13 Sep 2018, at 13:38, Nithya Balachandran <
>>> nbalacha at redhat.com> wrote:
>>>
>>> This looks like it is happening because rebalance switched to using
>>> fallocate, which EC had not implemented at that point.
>>>
>>> @Pranith, @Ashish, which version of gluster had support for fallocate in
>>> EC?
>>>
>>>
>>> Regards,
>>> Nithya
>>>
>>> On 12 September 2018 at 19:24, Mauro Tridici <mauro.tridici at cmcc.it>
>>> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I recently added 3 servers (each one with 12 bricks) to an existing
>>>> Gluster Distributed Disperse Volume.
>>>> The volume extension completed without errors, and I have already
>>>> run the rebalance procedure with the fix-layout option without problems.
>>>> I have just launched the rebalance procedure without the fix-layout
>>>> option but, as you can see in the output below, some failures have been
>>>> detected.
>>>>
>>>> [root at s01 glusterfs]# gluster v rebalance tier2 status
>>>>      Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
>>>> ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
>>>> localhost             71176   3.2MB  2137557   1530391     8128  in progress           13:59:05
>>>>   s02-stg                 0  0Bytes        0         0        0    completed           11:53:28
>>>>   s03-stg                 0  0Bytes        0         0        0    completed           11:53:32
>>>>   s04-stg                 0  0Bytes        0         0        0    completed            0:00:06
>>>>   s05-stg                15  0Bytes    17055         0       18    completed           10:48:01
>>>>   s06-stg                 0  0Bytes        0         0        0    completed            0:00:06
>>>> Estimated time left for rebalance to complete :        0:46:53
>>>> volume rebalance: tier2: success
>>>>
>>>> In the volume rebalance log file, I detected a lot of error messages
>>>> similar to the following ones:
>>>>
>>>> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
>>>> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
>>>> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
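
To see how the "fallocate failed" errors in an excerpt like the one above are spread across subvolumes, a small filter such as the following might be useful. It is a rough sketch: count_fallocate_failures is a hypothetical helper, it assumes one log entry per line in the format shown above, and the log path in the commented usage line is a guess that may differ on a given system.

```shell
#!/bin/sh
# Count "fallocate failed" errors per subvolume in a rebalance log.
# Assumes each entry sits on a single line and ends with
# "... on <subvolume> (Operation not supported)".
count_fallocate_failures() {
    sed -n 's/.*fallocate failed for .* on \([^ ]*\) (Operation not supported).*/\1/p' "$1" \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Assumed usage; the log location is a guess, adjust it to your installation:
# count_fallocate_failures /var/log/glusterfs/tier2-rebalance.log
```

Each output line holds a subvolume name followed by its failure count, which makes it easy to see whether the failures are limited to the newly added disperse sets.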
>>>>
>>>> Could you please help me to understand what is happening and how to
>>>> solve it?
>>>>
>>>> Our Gluster installation runs version 3.10.5.
>>>>
>>>> Thank you in advance,
>>>> Mauro
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>>
>>>
>>> -------------------------
>>> Mauro Tridici
>>>
>>> Fondazione CMCC
>>> CMCC Supercomputing Center
>>> presso Complesso Ecotekne - Università del Salento -
>>> Strada Prov.le Lecce - Monteroni sn
>>> 73100 Lecce  IT
>>> http://www.cmcc.it
>>>
>>> mobile: (+39) 327 5630841
>>> email: mauro.tridici at cmcc.it
>>>
>>>
>>
>>
>>
>
>
>