[Gluster-devel] Rebalance failure wrt trashcan

Anoop C S achiraya at redhat.com
Fri May 15 05:31:37 UTC 2015



On 05/14/2015 09:14 PM, Nithya Balachandran wrote:
> Hi Anoop,
>
> It is a specific use case. Please see http://review.gluster.org/#/c/10786/ for more details.
> The issue is not related to the trash translator.
>
> To hit the issue you would need to create a distrep vol such that the first brick of each replica set exists on one node and the second brick on the second node, i.e.:
>
> gluster v create vol1 replica 2 <node1>:/path_to_brick1 <node2>:/path_to_brick1 <node1>:/path_to_brick2 <node2>:/path_to_brick2
>

Thanks for explaining the issue and for the quick patch fixing it.

--Anoop C S

> Regards,
> Nithya
> ----- Anoop C S <achiraya at redhat.com> wrote:
>> Hi,
>>
>> I tried to reproduce the situation using master by adding some bricks
>> and initiating the rebalance operation (I had created some empty files
>> through the mount before adding the bricks). I couldn't find any error
>> in the volume status output or in the rebalance/brick logs.
>>
>> [root@dhcp43-4 master]# gluster v create vol 10.70.43.4:/home/brick1
>> 10.70.43.66:/home/brick2 force
>> volume create: vol: success: please start the volume to access data
>> [root@dhcp43-4 master]# gluster v start vol
>> volume start: vol: success
>> [root@dhcp43-4 master]# gluster v add-brick vol 10.70.43.66:/home/brick3
>> 10.70.43.66:/home/brick4 force
>> volume add-brick: success
>> [root@dhcp43-4 master]# gluster v rebalance vol start
>> volume rebalance: vol: success: Rebalance on vol has been started
>> successfully. Use rebalance status command to check status of the
>> rebalance process.
>> ID: f4f86e5e-e042-424b-a155-687b88cd6d26
>>
>> [root@dhcp43-4 master]# gluster v rebalance vol status
>>         Node   Rebalanced-files     size   scanned   failures   skipped      status   run time in secs
>>    ---------   ----------------   ------   -------   --------   -------   ---------   ----------------
>>    localhost                  0   0Bytes         5          0         1   completed               0.00
>>  10.70.43.66                  0   0Bytes         6          0         2   completed               0.00
>> volume rebalance: vol: success:
>> [root@dhcp43-4 master]# gluster v status vol
>> Status of volume: vol
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.70.43.4:/home/brick1               49152     0          Y  6983
>> Brick 10.70.43.66:/home/brick2              49152     0          Y  12853
>> Brick 10.70.43.66:/home/brick3              49153     0          Y  12888
>> Brick 10.70.43.66:/home/brick4              49154     0          Y  12905
>> NFS Server on localhost                     2049      0          Y  7027
>> NFS Server on 10.70.43.66                   2049      0          Y  12923
>>
>> Task Status of Volume vol
>> ------------------------------------------------------------------------------
>> Task                 : Rebalance
>> ID                   : f4f86e5e-e042-424b-a155-687b88cd6d26
>> Status               : completed
>>
>> However, I could see the following in the rebalance logs.
>>
>> [2015-05-14 11:40:14.474644] I [dht-layout.c:697:dht_layout_normalize]
>> 0-vol-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-00
>> 00-0000-000000000005). Holes=1 overlaps=0
>>
>> [2015-05-14 11:40:14.485028] I [MSGID: 109036]
>> [dht-common.c:6690:dht_log_new_layout_for_dir_selfheal] 0-vol-dht:
>> Setting layout of /.trashcan with [Subvol_name: vol-client-0, Err: -1 ,
>> Start: 0 , Stop: 1073737911 , Hash: 1 ], [Subvol_name: vol-client-1,
>> Err: -1 , Start: 1073737912 , Stop: 2147475823 , Hash: 1 ],
>> [Subvol_name: vol-client-2, Err: -1 , Start: 2147475824 , Stop:
>> 3221213735 , Hash: 1 ], [Subvol_name: vol-client-3, Err: -1 , Start:
>> 3221213736 , Stop: 4294967295 , Hash: 1 ],
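The layout being set above carves the 32-bit hash space into one contiguous range per subvolume. A simplified sketch of an even split (hypothetical; the exact boundaries in the log differ slightly because DHT weights chunks per brick):

```python
def layout_ranges(num_subvols, hash_max=0xFFFFFFFF):
    """Split [0, hash_max] into num_subvols contiguous ranges,
    giving any remainder to the last subvolume."""
    chunk = hash_max // num_subvols
    ranges = []
    start = 0
    for i in range(num_subvols):
        stop = hash_max if i == num_subvols - 1 else start + chunk - 1
        ranges.append((start, stop))
        start = stop + 1
    return ranges

for subvol, (start, stop) in enumerate(layout_ranges(4)):
    print(f"vol-client-{subvol}: Start: {start} , Stop: {stop}")
```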
>>
>> [2015-05-14 11:40:14.485958] I [dht-common.c:3539:dht_setxattr]
>> 0-vol-dht: fixing the layout of /.trashcan
>>
>> . . .
>>
>> [2015-05-14 11:40:14.488222] I
>> [dht-rebalance.c:2113:gf_defrag_process_dir] 0-vol-dht: migrate data
>> called on /.trashcan
>>
>> [2015-05-14 11:40:14.488966] I
>> [dht-rebalance.c:2322:gf_defrag_process_dir] 0-vol-dht: Migration
>> operation on dir /.trashcan took 0.00 secs
>>
>> [2015-05-14 11:40:14.494033] I [dht-layout.c:697:dht_layout_normalize]
>> 0-vol-dht: Found anomalies in /.trashcan/internal_op (gfid =
>> 00000000-0000-0000-0000-000000000006). Holes=1 overlaps=0
>>
>> [2015-05-14 11:40:14.495608] I [MSGID: 109036]
>> [dht-common.c:6690:dht_log_new_layout_for_dir_selfheal] 0-vol-dht:
>> Setting layout of /.trashcan/internal_op with [Subvol_name:
>> vol-client-0, Err: -1 , Start: 2147475824 , Stop: 3221213735 , Hash: 1
>> ], [Subvol_name: vol-client-1, Err: -1 , Start: 3221213736 , Stop:
>> 4294967295 , Hash: 1 ], [Subvol_name: vol-client-2, Err: -1 , Start: 0 ,
>> Stop: 1073737911 , Hash: 1 ], [Subvol_name: vol-client-3, Err: -1 ,
>> Start: 1073737912 , Stop: 2147475823 , Hash: 1 ],
>>
>> [2015-05-14 11:40:14.501198] I [dht-common.c:3539:dht_setxattr]
>> 0-vol-dht: fixing the layout of /.trashcan/internal_op
>>
>> . . .
>>
>> [2015-05-14 11:40:14.508264] I
>> [dht-rebalance.c:2113:gf_defrag_process_dir] 0-vol-dht: migrate data
>> called on /.trashcan/internal_op
>>
>> [2015-05-14 11:40:14.509493] I
>> [dht-rebalance.c:2322:gf_defrag_process_dir] 0-vol-dht: Migration
>> operation on dir /.trashcan/internal_op took 0.00 secs
>>
>> [2015-05-14 11:40:14.513020] I [dht-common.c:3539:dht_setxattr]
>> 0-vol-dht: fixing the layout of /.trashcan/internal_op
>>
>> [2015-05-14 11:40:14.525227] I [dht-common.c:3539:dht_setxattr]
>> 0-vol-dht: fixing the layout of /.trashcan
>>
>> . . .
>>
>> [2015-05-14 11:40:14.529157] I
>> [dht-rebalance.c:2793:gf_defrag_start_crawl] 0-DHT: crawling file-system
>> completed
>>
>>
>> On 05/14/2015 04:20 PM, SATHEESARAN wrote:
>>> On 05/14/2015 12:55 PM, Vijay Bellur wrote:
>>>> On 05/14/2015 09:00 AM, SATHEESARAN wrote:
>>>>> Hi All,
>>>>>
>>>>> I was using a glusterfs-3.7 beta2 build
>>>>> (glusterfs-3.7.0beta2-0.0.el6.x86_64).
>>>>> I have seen a rebalance failure on one of the nodes.
>>>>>
>>>>> [2015-05-14 12:17:03.695156] E
>>>>> [dht-rebalance.c:2368:gf_defrag_settle_hash] 0-vmstore-dht: fix layout
>>>>> on /.trashcan/internal_op failed
>>>>> [2015-05-14 12:17:03.695636] E [MSGID: 109016]
>>>>> [dht-rebalance.c:2528:gf_defrag_fix_layout] 0-vmstore-dht: Fix layout
>>>>> failed for /.trashcan
>>>>>
>>>>> Does it have any impact?
>>>>>
>>>>
>>>> I don't think there should be any impact due to this; rebalance should
>>>> continue fine without any problems. Do let us know if you observe
>>>> otherwise.
>>>>
>>>> -Vijay
>>> I tested the same functionality and didn't find any impact as such, but
>>> 'gluster volume status <vol-name>' reports the rebalance as a FAILURE.
>>> Any tool (for example, oVirt) consuming the output of 'gluster volume
>>> status <vol> --xml' would therefore report the rebalance operation as a FAILURE.
>>>
>>> [root@ ~]# gluster volume rebalance vmstore start
>>> volume rebalance: vmstore: success: Rebalance on vmstore has been
>>> started successfully. Use rebalance status command to check status of
>>> the rebalance process.
>>> ID: 68a12fc9-acd5-4f24-ba2d-bfc070ad5668
>>>
>>> [root@~]# gluster volume rebalance vmstore status
>>>         Node   Rebalanced-files     size   scanned   failures   skipped      status   run time in secs
>>>    ---------   ----------------   ------   -------   --------   -------   ---------   ----------------
>>>    localhost                  0   0Bytes         2          0         0   completed               0.00
>>>  10.70.37.58                  0   0Bytes         0          3         0      failed               0.00
>>> volume rebalance: vmstore: success:
>>>
>>> [root@~]# gluster volume status vmstore
>>> Status of volume: vmstore
>>> Gluster process                             TCP Port  RDMA Port Online  Pid
>>> ------------------------------------------------------------------------------
>>>
>>> ......
>>>
>>> Task Status of Volume vmstore
>>> ------------------------------------------------------------------------------
>>>
>>> Task                 : Rebalance
>>> ID                   : 68a12fc9-acd5-4f24-ba2d-bfc070ad5668
>>> Status               : failed
>>>
>>> Snip from --xml tasks:
>>> <tasks>
>>>   <task>
>>>     <type>Rebalance</type>
>>>     <id>68a12fc9-acd5-4f24-ba2d-bfc070ad5668</id>
>>>     <status>4</status>
>>>     <statusStr>failed</statusStr>
>>>   </task>
>>> </tasks>
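A consumer of this XML would pick up the failure roughly as follows (a hedged sketch; the element names follow the snippet above, not the full cli XML schema):

```python
import xml.etree.ElementTree as ET

# Tasks fragment as it appears in `gluster volume status <vol> --xml`,
# matching the snippet quoted above.
xml_snippet = """
<tasks>
  <task>
    <type>Rebalance</type>
    <id>68a12fc9-acd5-4f24-ba2d-bfc070ad5668</id>
    <status>4</status>
    <statusStr>failed</statusStr>
  </task>
</tasks>
"""

root = ET.fromstring(xml_snippet)
# Collect the IDs of tasks whose statusStr is "failed"; a management tool
# like oVirt would flag exactly these, even though the rebalance itself
# only tripped over the internal .trashcan layout fix.
failed = [
    t.findtext("id")
    for t in root.iter("task")
    if t.findtext("statusStr") == "failed"
]
print(failed)
```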
>>>
>>> This is also the case with remove-brick with data migration.
>>>
>>> -- sas
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>

