[Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version

Mauro Tridici mauro.tridici at cmcc.it
Fri Sep 28 15:38:52 UTC 2018


Thank you, Ashish.

I will study and try your solution in my virtual environment.
How can I detect the process of a brick on a gluster server?

Many Thanks,
Mauro



On Fri, 28 Sep 2018 at 16:39, Ashish Pandey <aspandey at redhat.com> wrote:

>
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Friday, September 28, 2018 7:08:41 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Dear Ashish,
>
> please excuse me, I'm very sorry for the misunderstanding.
> Before contacting you during the last few days, we checked all network devices
> (10GbE switch, cables, NICs, server ports, and so on), operating system
> versions and settings, network bonding configuration, gluster package
> versions, tuning profiles, etc., but everything seemed to be ok. The first 3
> servers (and the volume) operated without problems for one year. After we added
> the 3 new servers we noticed something wrong.
> Fortunately, yesterday you gave me a hand in understanding where the problem
> is (or could be).
>
> At this moment, after we re-launched the remove-brick command, it seems
> that the rebalance is going ahead without errors, but it is only scanning
> the files.
> It may be that some errors will appear during the upcoming data movement.
>
> For this reason, it would be useful to know how to proceed in case of a
> new failure: insist on approach n.1 or change strategy?
> We are thinking of trying to complete the running remove-brick procedure and
> making a decision based on the outcome.
>
> Question: could we also start approach n.2 after having successfully
> removed the V1 subvolume?
>
> >>> Yes, we can do that. My idea is to use the replace-brick command.
> We will kill "ONLY" one brick process on s06 and format that brick.
> Then we will use the replace-brick command to replace a brick of a volume on s05
> with this formatted brick.
> Heal will be triggered and the data of the respective volume will be placed on
> this brick.
>
> Now, we can format the brick which got freed up on s05 and use replace-brick to
> move the brick which we killed on s06 over to s05.
> During this process, we have to make sure the heal has completed before trying
> any other replace/kill brick.
>
> It is tricky but looks doable. Think about it and try to perform it in
> your virtual environment first before trying it in production.
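> A very rough sketch of the mechanics of one kill/format/replace cycle (volume
> name tier2 as elsewhere in this thread; the brick mount, device name and the
> source/destination bricks below are placeholders, not the exact sequence for
> your layout):
>
>   gluster volume status tier2 | grep "s06-stg:/gluster/<brick-mount>"   # status lists one PID per brick
>   kill -15 <PID>                                                        # stop only that brick process
>   umount /gluster/<brick-mount>                                         # then wipe and re-create the filesystem
>   mkfs.xfs -f /dev/mapper/<vg>-<lv>
>   mount /dev/mapper/<vg>-<lv> /gluster/<brick-mount>
>   mkdir /gluster/<brick-mount>/brick
>   gluster volume replace-brick tier2 <source-brick> s06-stg:/gluster/<brick-mount>/brick commit force
>   gluster volume heal tier2 info                                        # must show no pending entries before the next cycle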
> -------
>
> If it is still possible, could you please illustrate approach n.2 even
> if I don't have free disks?
> I would like to start thinking about it and testing it in a virtual
> environment.
>
> Thank you in advance for your help and patience.
> Regards,
> Mauro
>
>
>
> On 28 Sep 2018, at 14:36, Ashish Pandey <aspandey at redhat.com> wrote:
>
>
> We could have taken approach 2 even if you did not have free disks. You
> should have told me why you were opting for approach 1, or perhaps I should
> have asked.
> I was wondering about approach 1 because sometimes rebalance takes a long
> time, depending upon the data size.
>
> Anyway, I hope the whole setup is stable, I mean it is not in the middle of
> something which we cannot stop.
> If free disks are the only concern, I will give you some more steps to deal
> with that and we can follow approach 2.
>
> Let me know once you think everything is fine with the system and there is
> nothing to heal.
>
> ---
> Ashish
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Friday, September 28, 2018 4:21:03 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Hi Ashish,
>
> as I said in my previous message, we adopted the first approach you
> suggested (after setting the network.ping-timeout option to 0).
> This choice was due to the absence of empty bricks to be used as indicated
> in the second approach.
>
> So, we launched the remove-brick command on the first subvolume (V1, bricks
> 1,2,3,4,5,6 on server s04).
> Rebalance started moving the data across the other bricks but, after
> about 3TB of moved data, the rebalance speed slowed down and some transfer
> errors appeared in the rebalance.log of server s04.
> At this point, since the remaining 1.8TB needs to be moved in order to
> complete the step, we decided to stop the remove-brick execution and start
> it again (I hope it doesn't stop again before the rebalance completes).
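> For completeness, the stop and restart we did correspond roughly to the
> following (a sketch, not an exact transcript; the six bricks are the same V1
> bricks shown in the status command below):
>
>   gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
>       s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
>       s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick stop
>   gluster volume remove-brick tier2 <same six bricks> start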
>
> Now rebalance is not moving data, it's only scanning files (please take a
> look at the following output):
>
> [root at s01 ~]# gluster volume remove-brick tier2
> s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick
> s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick
> s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
>                                     Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
>                                ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
>                                  s04-stg                0        0Bytes        182008             0             0          in progress        3:08:09
> Estimated time left for rebalance to complete :      442:45:06
>
> If I'm not wrong, remove-brick rebalances the entire cluster each time it
> starts.
> Is there a way to speed up this procedure? Do you have any other
> suggestion that, in this particular case, could be useful to reduce errors
> (I know that they are related to the current volume configuration) and
> improve rebalance performance while avoiding a rebalance of the entire cluster?
>
> Thank you in advance,
> Mauro
>
> On 27 Sep 2018, at 13:14, Ashish Pandey <aspandey at redhat.com> wrote:
>
>
> Yes, you can.
> If not me, others may also reply.
>
> ---
> Ashish
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Thursday, September 27, 2018 4:24:12 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Dear Ashish,
>
> I cannot thank you enough!
> Your procedure and description are very detailed.
> I think I will follow the first approach after setting the network.ping-timeout
> option to 0 (if I'm not wrong, "0" means "infinite"; I noticed that this
> value reduced rebalance errors).
> After the fix I will set the network.ping-timeout option back to its default value.
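> For reference, the option change and the later revert boil down to something
> like this (assuming the tier2 volume name used elsewhere in the thread):
>
>   gluster volume set tier2 network.ping-timeout 0      # disable the ping timeout for the duration of the fix
>   gluster volume reset tier2 network.ping-timeout      # restore the default (42 seconds) afterwards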
>
> Could I contact you again if I need some kind of suggestion?
>
> Thank you very much again.
> Have a good day,
> Mauro
>
>
> On 27 Sep 2018, at 12:38, Ashish Pandey <aspandey at redhat.com> wrote:
>
>
> Hi Mauro,
>
> We can divide the 36 newly added bricks into 6 sets of 6 bricks each,
> starting from brick37.
> That means there are 6 ec subvolumes and we have to deal with one
> subvolume at a time.
> I have named them V1 to V6.
>
> Problem:
> Take the case of V1.
> The best configuration/setup would be to have all 6 bricks of V1 on 6
> different nodes.
> However, in your case you have added 3 new nodes, so we should at least
> have 2 bricks on each of the 3 newly added nodes.
> This way, in a 4+2 EC configuration, even if one node goes down you will
> still have 4 other bricks of that subvolume and the data on it will remain
> accessible.
> In the current setup, if s04-stg goes down you will lose all the data on V1
> and V2, as all of their bricks will be down. We want to avoid and correct this.
>
> Now, we have two approaches to correct/modify this setup.
>
> *Approach 1*
> We have to remove all the newly added bricks in sets of 6 bricks. This
> will trigger a rebalance and move the whole data to the other subvolumes.
> Repeat the above step and, once all the bricks are removed, add those
> bricks again in sets of 6 bricks, this time with 2 bricks from each of the
> 3 newly added nodes.
>
> While this is a valid and working approach, I personally think that it
> will take a long time and also require a lot of data movement.
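> A sketch of the remove/re-add cycle for one set of 6 bricks (V1), using the
> brick paths listed further below; an outline only, not a tested transcript:
>
>   gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
>       s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick \
>       s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick start
>   gluster volume remove-brick tier2 <same six bricks> status     # wait until it reports "completed"
>   gluster volume remove-brick tier2 <same six bricks> commit
>
> Once all six sets are removed and the bricks wiped, they would be added back
> 6 at a time, with 2 bricks from each of the 3 new nodes (see the add-brick
> sketch further down in the thread).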
>
> *Approach 2*
>
> In this approach we can use the heal process. We have to deal with the
> volumes (V1 to V6) one by one. The following are the steps for V1:
>
> *Step 1 -*
> Use the replace-brick command to move the following bricks to the *s05-stg*
> node one by one (heal should be completed after every replace-brick command):
>
> Brick39: s04-stg:/gluster/mnt3/brick to s05-stg:/<brick which is free>
> Brick40: s04-stg:/gluster/mnt4/brick to s05-stg:/<other brick which is free>
>
> Command:
> gluster v replace-brick <volname> s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free> commit force
>
> Try to give names to the bricks so that you can identify which 6 bricks
> belong to the same ec subvolume.
>
>
> Use the replace-brick command to move the following bricks to the *s06-stg*
> node one by one:
>
> Brick41: s04-stg:/gluster/mnt5/brick to s06-stg:/<brick which is free>
> Brick42: s04-stg:/gluster/mnt6/brick to s06-stg:/<other brick which is free>
>
>
> *Step 2* - After every replace-brick command, you have to wait for the heal
> to be completed.
> Check *"gluster v heal <volname> info"*; if it shows any entries, you have
> to wait for the heal to finish before proceeding.
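> A small polling loop one might use for that wait (just a sketch; it loops
> until every brick in the heal info output reports zero entries):
>
>   while gluster volume heal tier2 info | grep -q "Number of entries: [1-9]"; do
>       sleep 60
>   done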
>
> After step 1 and step 2 complete successfully, the setup for subvolume V1
> will be fixed.
> You have to perform the same steps for the other subvolumes; the only
> difference is the nodes onto which you have to move the bricks.
>
>
>
>
> V1
>
> Brick37: s04-stg:/gluster/mnt1/brick
> Brick38: s04-stg:/gluster/mnt2/brick
> Brick39: s04-stg:/gluster/mnt3/brick
> Brick40: s04-stg:/gluster/mnt4/brick
> Brick41: s04-stg:/gluster/mnt5/brick
> Brick42: s04-stg:/gluster/mnt6/brick
>
> V2
> Brick43: s04-stg:/gluster/mnt7/brick
> Brick44: s04-stg:/gluster/mnt8/brick
> Brick45: s04-stg:/gluster/mnt9/brick
> Brick46: s04-stg:/gluster/mnt10/brick
> Brick47: s04-stg:/gluster/mnt11/brick
> Brick48: s04-stg:/gluster/mnt12/brick
>
> V3
> Brick49: s05-stg:/gluster/mnt1/brick
> Brick50: s05-stg:/gluster/mnt2/brick
> Brick51: s05-stg:/gluster/mnt3/brick
> Brick52: s05-stg:/gluster/mnt4/brick
> Brick53: s05-stg:/gluster/mnt5/brick
> Brick54: s05-stg:/gluster/mnt6/brick
>
> V4
> Brick55: s05-stg:/gluster/mnt7/brick
> Brick56: s05-stg:/gluster/mnt8/brick
> Brick57: s05-stg:/gluster/mnt9/brick
> Brick58: s05-stg:/gluster/mnt10/brick
> Brick59: s05-stg:/gluster/mnt11/brick
> Brick60: s05-stg:/gluster/mnt12/brick
>
> V5
> Brick61: s06-stg:/gluster/mnt1/brick
> Brick62: s06-stg:/gluster/mnt2/brick
> Brick63: s06-stg:/gluster/mnt3/brick
> Brick64: s06-stg:/gluster/mnt4/brick
> Brick65: s06-stg:/gluster/mnt5/brick
> Brick66: s06-stg:/gluster/mnt6/brick
>
> V6
> Brick67: s06-stg:/gluster/mnt7/brick
> Brick68: s06-stg:/gluster/mnt8/brick
> Brick69: s06-stg:/gluster/mnt9/brick
> Brick70: s06-stg:/gluster/mnt10/brick
> Brick71: s06-stg:/gluster/mnt11/brick
> Brick72: s06-stg:/gluster/mnt12/brick
>
>
> Just a note that these steps require movement of data.
> Be careful while performing them: do one replace-brick at a time
> and move on to the next only after the heal has completed.
> Let me know if you have any issues.
>
> ---
> Ashish
>
>
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Thursday, September 27, 2018 4:03:04 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Dear Ashish,
>
> I hope I'm not disturbing you too much, but I would like to ask whether you
> have had some time to dedicate to our problem.
> Please forgive my insistence.
>
> Thank you in advance,
> Mauro
>
> On 26 Sep 2018, at 19:56, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
> Hi Ashish,
>
> sure, no problem! We are a little bit worried, but we can wait  :-)
> Thank you very much for your support and your availability.
>
> Regards,
> Mauro
>
>
> On 26 Sep 2018, at 19:33, Ashish Pandey <aspandey at redhat.com> wrote:
>
> Hi Mauro,
>
> Yes, I can provide you a step-by-step procedure to correct it.
> Is it fine if I provide you the steps tomorrow, as it is quite late over
> here and I don't want to miss anything in a hurry?
>
> ---
> Ashish
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Wednesday, September 26, 2018 6:54:19 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Hi Ashish,
>
> attached you can find the rebalance log file and the most recently updated
> brick log file (the other files in the /var/log/glusterfs/bricks directory
> seem to be too old).
> I just stopped the running rebalance (as you can see at the bottom of the
> rebalance log file).
> So, if a safe procedure to correct the problem exists, I would like to
> execute it.
>
> I don't know if I can ask this of you, but, if possible, could you please
> describe step by step the right procedure to remove the newly added
> bricks without losing the data that has already been rebalanced?
>
> The following outputs show the result of the "df -h" command executed on one
> of the first 3 nodes (s01, s02, s03), which already existed, and on one of
> the last 3 nodes (s04, s05, s06), which were added recently.
>
> [root at s06 bricks]# df -h
> Filesystem                           Size  Used  Avail Use% Mounted on
> /dev/mapper/cl_s06-root              100G  2,1G     98G   3% /
> devtmpfs                              32G     0     32G   0% /dev
> tmpfs                                 32G  4,0K     32G   1% /dev/shm
> tmpfs                                 32G   26M     32G   1% /run
> tmpfs                                 32G     0     32G   0% /sys/fs/cgroup
> /dev/mapper/cl_s06-var               100G  2,0G     99G   2% /var
> /dev/mapper/cl_s06-gluster           100G   33M    100G   1% /gluster
> /dev/sda1                           1014M  152M    863M  15% /boot
> /dev/mapper/gluster_vgd-gluster_lvd  9,0T  807G    8,3T   9% /gluster/mnt3
> /dev/mapper/gluster_vgg-gluster_lvg  9,0T  807G    8,3T   9% /gluster/mnt6
> /dev/mapper/gluster_vgc-gluster_lvc  9,0T  807G    8,3T   9% /gluster/mnt2
> /dev/mapper/gluster_vge-gluster_lve  9,0T  807G    8,3T   9% /gluster/mnt4
> /dev/mapper/gluster_vgj-gluster_lvj  9,0T  887G    8,2T  10% /gluster/mnt9
> /dev/mapper/gluster_vgb-gluster_lvb  9,0T  807G    8,3T   9% /gluster/mnt1
> /dev/mapper/gluster_vgh-gluster_lvh  9,0T  887G    8,2T  10% /gluster/mnt7
> /dev/mapper/gluster_vgf-gluster_lvf  9,0T  807G    8,3T   9% /gluster/mnt5
> /dev/mapper/gluster_vgi-gluster_lvi  9,0T  887G    8,2T  10% /gluster/mnt8
> /dev/mapper/gluster_vgl-gluster_lvl  9,0T  887G    8,2T  10% /gluster/mnt11
> /dev/mapper/gluster_vgk-gluster_lvk  9,0T  887G    8,2T  10% /gluster/mnt10
> /dev/mapper/gluster_vgm-gluster_lvm  9,0T  887G    8,2T  10% /gluster/mnt12
> tmpfs                                6,3G     0    6,3G   0% /run/user/0
>
> [root at s01 ~]# df -h
> Filesystem                           Size  Used  Avail Use% Mounted on
> /dev/mapper/cl_s01-root              100G  5,3G     95G   6% /
> devtmpfs                              32G     0     32G   0% /dev
> tmpfs                                 32G   39M     32G   1% /dev/shm
> tmpfs                                 32G   26M     32G   1% /run
> tmpfs                                 32G     0     32G   0% /sys/fs/cgroup
> /dev/mapper/cl_s01-var               100G   11G     90G  11% /var
> /dev/md127                          1015M  151M    865M  15% /boot
> /dev/mapper/cl_s01-gluster           100G   33M    100G   1% /gluster
> /dev/mapper/gluster_vgi-gluster_lvi  9,0T  5,5T    3,6T  61% /gluster/mnt7
> /dev/mapper/gluster_vgm-gluster_lvm  9,0T  5,4T    3,6T  61% /gluster/mnt11
> /dev/mapper/gluster_vgf-gluster_lvf  9,0T  5,7T    3,4T  63% /gluster/mnt4
> /dev/mapper/gluster_vgl-gluster_lvl  9,0T  5,8T    3,3T  64% /gluster/mnt10
> /dev/mapper/gluster_vgj-gluster_lvj  9,0T  5,5T    3,6T  61% /gluster/mnt8
> /dev/mapper/gluster_vgn-gluster_lvn  9,0T  5,4T    3,6T  61% /gluster/mnt12
> /dev/mapper/gluster_vgk-gluster_lvk  9,0T  5,8T    3,3T  64% /gluster/mnt9
> /dev/mapper/gluster_vgh-gluster_lvh  9,0T  5,6T    3,5T  63% /gluster/mnt6
> /dev/mapper/gluster_vgg-gluster_lvg  9,0T  5,6T    3,5T  63% /gluster/mnt5
> /dev/mapper/gluster_vge-gluster_lve  9,0T  5,7T    3,4T  63% /gluster/mnt3
> /dev/mapper/gluster_vgc-gluster_lvc  9,0T  5,6T    3,5T  62% /gluster/mnt1
> /dev/mapper/gluster_vgd-gluster_lvd  9,0T  5,6T    3,5T  62% /gluster/mnt2
> tmpfs                                6,3G     0    6,3G   0% /run/user/0
> s01-stg:tier2                        420T  159T    262T  38% /tier2
>
> As you can see, the used space on each brick of the new servers is
> about 800GB.
>
> Thank you,
> Mauro
>
>
>
>
>
>
>
>
> On 26 Sep 2018, at 14:51, Ashish Pandey <aspandey at redhat.com> wrote:
>
> Hi Mauro,
>
> The rebalance and brick logs are the first things we should go through.
>
> There is a procedure to correct the configuration/setup, but in the situation
> you are in it is difficult to follow that procedure.
> You should have added the bricks hosted on s04-stg, s05-stg and s06-stg
> the same way as in the previous configuration.
> That means 2 bricks on each node for one subvolume.
> The procedure will require a lot of replace-bricks, which will again need
> healing. In addition to that, we have to wait for the rebalance to
> complete.
>
> I would suggest that, if the whole data has not been rebalanced yet and if
> you can stop the rebalance and remove these newly added bricks properly,
> then you should remove them.
> After that, add these bricks back so that each new subvolume has 2 bricks
> on each of the 3 newly added nodes.
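> For the first re-added subvolume that would look something like the sketch
> below (reusing the existing mount paths after the old bricks have been wiped;
> for a 4+2 disperse volume, bricks have to be added in multiples of 6 and the
> order on the command line determines which bricks end up in the same
> subvolume):
>
>   gluster volume add-brick tier2 \
>       s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick \
>       s05-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt2/brick \
>       s06-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt2/brick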
>
> Yes, it is like undoing the whole effort, but it is better to do it now than
> to face issues in the future, when it will be almost impossible to correct
> these things once you have lots of data.
>
> ---
> Ashish
>
>
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Wednesday, September 26, 2018 5:55:02 PM
> *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
>
> Dear Ashish,
>
> thank you for your answer.
> I can provide you the entire log files for glusterd, glusterfsd
> and the rebalance.
> Please, could you indicate which one you need first?
>
> Yes, we added the last 36 bricks after creating the volume. Is there a
> procedure to correct this error? Is it still possible to do it?
>
> Many thanks,
> Mauro
>
> On 26 Sep 2018, at 14:13, Ashish Pandey <aspandey at redhat.com> wrote:
>
>
> I think we don't have enough logs to debug this, so I would suggest you
> provide more logs/info.
> I have also observed that the configuration and setup of your volume is
> not very efficient.
>
> For example:
> Brick37: s04-stg:/gluster/mnt1/brick
> Brick38: s04-stg:/gluster/mnt2/brick
> Brick39: s04-stg:/gluster/mnt3/brick
> Brick40: s04-stg:/gluster/mnt4/brick
> Brick41: s04-stg:/gluster/mnt5/brick
> Brick42: s04-stg:/gluster/mnt6/brick
> Brick43: s04-stg:/gluster/mnt7/brick
> Brick44: s04-stg:/gluster/mnt8/brick
> Brick45: s04-stg:/gluster/mnt9/brick
> Brick46: s04-stg:/gluster/mnt10/brick
> Brick47: s04-stg:/gluster/mnt11/brick
> Brick48: s04-stg:/gluster/mnt12/brick
>
> These 12 bricks are on the same node, and the subvolumes made up of these
> bricks will each have all of their bricks on that single node, which is not
> good. The same is true for the bricks hosted on s05-stg and s06-stg.
> I think you added these bricks after creating the volume. The probability of
> a disruption in the connection to these bricks is higher in this case.
>
> ---
> Ashish
>
> ------------------------------
> *From: *"Mauro Tridici" <mauro.tridici at cmcc.it>
> *To: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Wednesday, September 26, 2018 3:38:35 PM
> *Subject: *[Gluster-users] Rebalance failed on Distributed Disperse
> volume based on 3.12.14 version
>
> Dear All, Dear Nithya,
>
> after upgrading from version 3.10.5 to 3.12.14, I tried to start a
> rebalance process to distribute data across the bricks, but something went
> wrong.
> Rebalance failed on different nodes and the estimated time needed to
> complete the procedure seems to be very high.
>
> [root at s01 ~]# gluster volume rebalance tier2 status
>                                     Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
>                                ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
>                                localhost               19       161.6GB           537             2             2          in progress        0:32:23
>                                  s02-stg               25       212.7GB           526             5             2          in progress        0:32:25
>                                  s03-stg                4        69.1GB           511             0             0          in progress        0:32:25
>                                  s04-stg                4      484Bytes         12283             0             3          in progress        0:32:25
>                                  s05-stg               23      484Bytes         11049             0            10          in progress        0:32:25
>                                  s06-stg                3         1.2GB          8032            11             3               failed        0:17:57
> Estimated time left for rebalance to complete :     3601:05:41
> volume rebalance: tier2: success
>
> When rebalance processes fail, I can see the following kinds of errors in
> /var/log/glusterfs/tier2-rebalance.log:
>
> Error type 1)
>
> [2018-09-26 08:50:19.872575] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on 2 of 6 subvolumes.(up=111111, mask=100111, remaining=000000, good=100111, bad=011000)
> [2018-09-26 08:50:19.901792] W [MSGID: 122053] [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111101, remaining=000000, good=111101, bad=000010)
>
> Error type 2)
>
> [2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset by peer)
>
> Error type 3)
>
> [2018-09-26 08:57:37.852590] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
> [2018-09-26 08:57:39.282306] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (10)
> [2018-09-26 09:02:04.928408] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of file {blocks:0 name:(/OPA/archive/historical/dts/MREA/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result in dst node (tier2-disperse-5:2440190848) having lower disk space than the source node (tier2-disperse-11:71373083776). Skipping file.
>
> Error type 4)
>
> W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket disconnected
>
> Error type 5)
>
> [2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5590086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] ) 0-: received signum (15), shutting down
>
> Error type 6)
>
> [2018-09-25 08:09:18.340658] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
>
> It seems that there are some network or timeout problems, but the network
> usage/traffic values are not so high.
> Do you think that, with my volume configuration, I have to modify some
> volume options related to thread and/or network parameters?
> Could you please help me understand the cause of the problems above?
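> If it helps, the current values of the options I suspect can be dumped with
> something like the following (just the usual candidates, not a recommendation):
>
>   gluster volume get tier2 all | grep -E "event-threads|ping-timeout"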
>
> You can find our volume info below
> (the volume is implemented on 6 servers; each server has 2 x 10-core CPUs,
> 64GB RAM, 1 SSD dedicated to the OS, and 12 x 10TB HDDs):
>
> [root at s04 ~]# gluster vol info
>
>
>