[Gluster-users] Failures during rebalance on gluster distributed disperse volume

Mauro Tridici mauro.tridici at cmcc.it
Thu Sep 13 13:04:55 UTC 2018


Hi Nithya,

Thank you for involving the EC group.
I will wait for your suggestions.

Regards,
Mauro
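
For anyone triaging a similar rebalance log, here is a minimal Python sketch that tallies the "fallocate failed" errors per subvolume. It assumes log lines shaped like the ones quoted below; the function name and the regular expression are illustrative only and are not part of Gluster.

```python
import re
from collections import Counter

# Assumed line shape (taken from the rebalance log excerpt in this thread):
#   ... fallocate failed for <path> on <subvolume> (<reason>)
FALLOCATE_RE = re.compile(
    r"fallocate failed for (?P<path>.+?) on (?P<subvol>\S+) \((?P<reason>[^)]*)\)\s*$"
)


def count_fallocate_failures(lines):
    """Count fallocate failures, keyed by (subvolume, reason).

    `lines` is any iterable of log lines; lines that do not match
    the assumed shape are ignored.
    """
    counts = Counter()
    for line in lines:
        match = FALLOCATE_RE.search(line)
        if match:
            counts[(match.group("subvol"), match.group("reason"))] += 1
    return counts
```

Any iterable of lines can be passed in, for example an open file handle for the rebalance log. If every failure carries the reason "Operation not supported", that would be consistent with the missing fallocate support discussed below rather than with a problem on individual bricks.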

> On 13 Sep 2018, at 13:38, Nithya Balachandran <nbalacha at redhat.com> wrote:
> 
> This looks like it is happening because rebalance switched to using fallocate, which EC had not implemented at that point.
> 
> @Pranith, @Ashish, which version of gluster had support for fallocate in EC?
> 
> 
> Regards,
> Nithya
> 
> On 12 September 2018 at 19:24, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
> Dear All,
> 
> I recently added 3 servers (each one with 12 bricks) to an existing Gluster Distributed Disperse Volume.
> The volume extension completed without errors, and I have already run the rebalance procedure with the fix-layout option without any problem.
> I have just launched the rebalance procedure without the fix-layout option but, as you can see in the output below, some failures have been reported.
> 
> [root@s01 glusterfs]# gluster v rebalance tier2 status
>                                     Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
>                                ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
>                                localhost            71176         3.2MB       2137557       1530391          8128          in progress       13:59:05
>                                  s02-stg                0        0Bytes             0             0             0            completed       11:53:28
>                                  s03-stg                0        0Bytes             0             0             0            completed       11:53:32
>                                  s04-stg                0        0Bytes             0             0             0            completed        0:00:06
>                                  s05-stg               15        0Bytes         17055             0            18            completed       10:48:01
>                                  s06-stg                0        0Bytes             0             0             0            completed        0:00:06
> Estimated time left for rebalance to complete :        0:46:53
> volume rebalance: tier2: success
> 
> In the volume rebalance log file, I detected a lot of error messages similar to the following ones:
> 
> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
> 
> Could you please help me to understand what is happening and how to solve it?
> 
> Our Gluster deployment runs version 3.10.5.
> 
> Thank you in advance,
> Mauro
> 


-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce  IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: mauro.tridici at cmcc.it
