[Gluster-users] Rebalancing newly added bricks

Nithya Balachandran nbalacha at redhat.com
Wed Sep 18 08:10:51 UTC 2019


On Sat, 14 Sep 2019 at 01:25, Herb Burnswell <herbert.burnswell at gmail.com>
wrote:

> Hi,
>
> Well our rebalance seems to have failed.  Here is the output:
>

Hi,

Rebalance will abort itself if it cannot reach any of the nodes. Are all
the bricks still up and reachable?
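
For example, one quick way to verify, using the volume name tank from your
output below:

# gluster volume status tank
# gluster peer status

The "42 seconds" in the disconnect message in your log corresponds to the
network.ping-timeout volume option, which on recent releases can be
inspected with:

# gluster volume get tank network.ping-timeout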

Regards,
Nithya




>
> # gluster vol rebalance tank status
>        Node   Rebalanced-files        size      scanned   failures   skipped       status   run time in h:m:s
>   ---------        -----------   ---------  -----------  ---------  --------  -----------  ------------------
>   localhost            1348706      57.8TB      2234439          9         6       failed            190:24:3
>     serverB                  0      0Bytes            7          0         0    completed            63:47:55
> volume rebalance: tank: success
>
> # gluster vol status tank
> Status of volume: tank
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick serverA:/gluster_bricks/data1         49162     0          Y       20318
> Brick serverB:/gluster_bricks/data1         49166     0          Y       3432
> Brick serverA:/gluster_bricks/data2         49163     0          Y       20323
> Brick serverB:/gluster_bricks/data2         49167     0          Y       3435
> Brick serverA:/gluster_bricks/data3         49164     0          Y       4625
> Brick serverA:/gluster_bricks/data4         49165     0          Y       4644
> Brick serverA:/gluster_bricks/data5         49166     0          Y       5088
> Brick serverA:/gluster_bricks/data6         49167     0          Y       5128
> Brick serverB:/gluster_bricks/data3         49168     0          Y       22314
> Brick serverB:/gluster_bricks/data4         49169     0          Y       22345
> Brick serverB:/gluster_bricks/data5         49170     0          Y       22889
> Brick serverB:/gluster_bricks/data6         49171     0          Y       22932
> Self-heal Daemon on localhost               N/A       N/A        Y       6202
> Self-heal Daemon on serverB                 N/A       N/A        Y       22981
>
> Task Status of Volume tank
>
> ------------------------------------------------------------------------------
> Task                 : Rebalance
> ID                   : eec64343-8e0d-4523-ad05-5678f9eb9eb2
> Status               : failed
>
> # df -hP |grep data
> /dev/mapper/gluster_vg-gluster_lv1_data   60T   31T   29T  52% /gluster_bricks/data1
> /dev/mapper/gluster_vg-gluster_lv2_data   60T   31T   29T  51% /gluster_bricks/data2
> /dev/mapper/gluster_vg-gluster_lv3_data   60T   15T   46T  24% /gluster_bricks/data3
> /dev/mapper/gluster_vg-gluster_lv4_data   60T   15T   46T  24% /gluster_bricks/data4
> /dev/mapper/gluster_vg-gluster_lv5_data   60T   15T   45T  25% /gluster_bricks/data5
> /dev/mapper/gluster_vg-gluster_lv6_data   60T   15T   45T  25% /gluster_bricks/data6
>
>
> The rebalance log on serverA shows a disconnect from serverB:
>
> [2019-09-08 15:41:44.285591] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server <serverB>:49170 has not responded in the last 42 seconds, disconnecting.
> [2019-09-08 15:41:44.285739] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from tank-client-10. Client process will keep trying to connect to glusterd until brick's port is available
> [2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff986c52aae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7ff986c54220] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2b0)[0x7ff986c54ce0] ))))) 0-tank-client-10: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2019-09-08 15:40:44.040333 (xid=0x7f8cfac)
>
> Does this type of failure cause data corruption?  What is the best course
> of action at this point?
>
> Thanks,
>
> HB
>
> On Wed, Sep 11, 2019 at 11:58 PM Strahil <hunter86_bg at yahoo.com> wrote:
>
>> Hi Nithya,
>>
>> Thanks for the detailed explanation.
>> It makes sense.
>>
>> Best Regards,
>> Strahil Nikolov
>> On Sep 12, 2019 08:18, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>
>>
>>
>> On Wed, 11 Sep 2019 at 09:47, Strahil <hunter86_bg at yahoo.com> wrote:
>>
>> Hi Nithya,
>>
>> I was just recalling your previous e-mail, which left me with the
>> impression that old volumes need that.
>> This is the one I mean:
>>
>> >It looks like this is a replicate volume. If that is the case then yes,
>> >you are running an old version of Gluster for which this was the default
>> >behaviour.
>> >
>> >Regards,
>> >Nithya
>>
>>
>> Hi Strahil,
>>
>> I'm providing a little more detail here which I hope will explain things.
>> Rebalance has always been a volume-wide operation - a *rebalance start*
>> operation starts rebalance processes on all nodes of the volume. However,
>> different processes would behave differently. In earlier releases, all
>> nodes would crawl the bricks and update the directory layouts, but only
>> one node in each replica/disperse set would actually migrate files, so
>> the rebalance status would show only one node doing any "work" (scanning,
>> rebalancing, etc.). That one node would, however, process all the files
>> in its replica sets. Rerunning rebalance on other nodes would make no
>> difference, as it would always be the same node that ends up migrating
>> files.
>> So, for instance, for a replicate volume with server1:/brick1,
>> server2:/brick2 and server3:/brick3 in that order, only the rebalance
>> process on server1 would migrate files. In newer releases, all 3 nodes
>> would migrate files.
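>>
>> ("In that order" refers to the order in which the bricks were specified
>> when the volume was created or extended - e.g. for a hypothetical volume
>> created as
>>
>> # gluster volume create demo replica 3 server1:/brick1 server2:/brick2 server3:/brick3
>>
>> server1 hosts the first brick of the replica set.)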
>>
>> The rebalance status does not capture the directory operations of fixing
>> layouts, which is why it looks like the other nodes are not doing
>> anything.
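>>
>> As an aside, the layout-fixing step can also be triggered on its own,
>> without migrating any data:
>>
>> # gluster volume rebalance tank fix-layout start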
>>
>> Hope this helps.
>>
>> Regards,
>> Nithya
>>
>>
>> Best Regards,
>> Strahil Nikolov
>> On Sep 9, 2019 06:36, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>
>>
>>
>> On Sat, 7 Sep 2019 at 00:03, Strahil Nikolov <hunter86_bg at yahoo.com>
>> wrote:
>>
>> As it was mentioned, you might have to run rebalance on the other node -
>> but it is better to wait until this node is done.
>>
>>
>> Hi Strahil,
>>
>> Rebalance does not need to be run on the other node - the operation is a
>> volume-wide one. Only a single node per replica set would migrate files
>> in the version used in this case.
>>
>> Regards,
>> Nithya
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Friday, September 6, 2019 at 15:29:20 GMT+3, Herb Burnswell <
>> herbert.burnswell at gmail.com>
>>
>> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

