[Gluster-users] Failures during rebalance on gluster distributed disperse volume

Thu Sep 13 11:38:54 UTC 2018

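This looks like it is happening because rebalance switched to using
fallocate, which EC (the disperse translator) did not implement at that
point. That matches the "fallocate failed ... (Operation not supported)"
errors reported on the disperse subvolumes in your log.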
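
To check how much of the failure count is explained by this, the rebalance
log can be tallied by subvolume and error reason. The following is only a
minimal sketch: the function name and the SAMPLE list are made up for
illustration, the regular expression is based solely on the message format
visible in the log excerpt quoted below, and it assumes each log entry sits
on a single line in the actual file and that paths contain no spaces.

```python
import re
from collections import Counter

# Pattern derived from the "fallocate failed" messages quoted in this thread.
# Assumption: one log entry per line, no whitespace inside the file path.
FALLOCATE_RE = re.compile(
    r"fallocate failed for (?P<path>\S+) on (?P<subvol>\S+) \((?P<reason>[^)]*)\)"
)


def tally_fallocate_failures(lines):
    """Count fallocate failures per (subvolume, reason) pair."""
    counts = Counter()
    for line in lines:
        match = FALLOCATE_RE.search(line)
        if match:
            counts[(match.group("subvol"), match.group("reason"))] += 1
    return counts


# Three entries taken from the quoted log, rejoined onto single lines.
# The last one is not a fallocate message and must be ignored.
SAMPLE = [
    "[2018-09-12 13:15:50.759183] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 "
    "(Operation not supported)",
    "[2018-09-12 13:15:50.777219] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 "
    "(Operation not supported)",
    "[2018-09-12 13:15:50.777241] E [MSGID: 0] "
    "[dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed "
    "on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/"
    "atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc",
]

if __name__ == "__main__":
    for (subvol, reason), count in sorted(tally_fallocate_failures(SAMPLE).items()):
        print(f"{subvol}: {count} x {reason}")
```

To run it against the real log, pass an open file object instead of SAMPLE,
since the function only iterates over lines. If nearly all failures turn out
to be "Operation not supported", that would support the fallocate
explanation; any other reasons would need a separate look.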

@Pranith, @Ashish, which version of Gluster first supported fallocate in EC?
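
Independently of the version question, whether fallocate works on a given
mount can be probed empirically. The sketch below is an assumption-laden
illustration, not an official Gluster tool: it calls fallocate(2) directly
through ctypes, assumes 64-bit Linux where off_t is 64 bits wide, and the
function name is hypothetical. It deliberately avoids os.posix_fallocate,
because glibc may emulate the call by writing data when the filesystem lacks
support, which would hide the error.

```python
import ctypes
import errno
import os
import tempfile

# Assumption: 64-bit Linux, so off_t can be passed as a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [
    ctypes.c_int,    # fd
    ctypes.c_int,    # mode (0 = plain preallocation)
    ctypes.c_int64,  # offset
    ctypes.c_int64,  # length
]
_libc.fallocate.restype = ctypes.c_int


def supports_fallocate(directory):
    """Return True if fallocate(2) succeeds on a scratch file in `directory`.

    Returns False when the call fails with EOPNOTSUPP or ENOSYS and raises
    OSError for any other failure. The scratch file is always removed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        if _libc.fallocate(fd, 0, 0, 4096) == 0:
            return True
        err = ctypes.get_errno()
        if err in (errno.EOPNOTSUPP, errno.ENOSYS):
            return False
        raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # The local temp directory is used here only so the sketch runs anywhere;
    # point it at a client mount of the volume to test the volume itself.
    print(supports_fallocate(tempfile.gettempdir()))
```

Run against a client mount of the volume, a False result would be consistent
with the "Operation not supported" errors seen during rebalance.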

Regards,
Nithya

On 12 September 2018 at 19:24, Mauro Tridici <mauro.tridici at cmcc.it> wrote:

> Dear All,
>
> I recently added 3 servers (each with 12 bricks) to an existing
> Gluster Distributed Disperse Volume.
> The volume extension completed without errors, and I have already run the
> rebalance procedure with the fix-layout option without any problem.
> I have now launched the rebalance procedure without the fix-layout option,
> but, as you can see in the output below, a large number of failures has
> been reported.
>
> [root@s01 glusterfs]# gluster v rebalance tier2 status
>      Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
> ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
> localhost             71176   3.2MB  2137557   1530391     8128  in progress           13:59:05
>   s02-stg                 0  0Bytes        0         0        0    completed           11:53:28
>   s03-stg                 0  0Bytes        0         0        0    completed           11:53:32
>   s04-stg                 0  0Bytes        0         0        0    completed            0:00:06
>   s05-stg                15  0Bytes    17055         0       18    completed           10:48:01
>   s06-stg                 0  0Bytes        0         0        0    completed            0:00:06
> Estimated time left for rebalance to complete :        0:46:53
> volume rebalance: tier2: success
>
> In the volume rebalance log file, I detected a lot of error messages
> similar to the following ones:
>
> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>
> Could you please help me understand what is happening and how to solve
> it?
>
> Our deployment runs Gluster v3.10.5.
>
> Thank you in advance,
> Mauro