[Bugs] [Bug 1406411] Add-brick command fails when one of the replica bricks is down

bugzilla at redhat.com
Mon Dec 26 09:48:23 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1406411



--- Comment #5 from Atin Mukherjee <amukherj at redhat.com> ---
(In reply to Mohit Agrawal from comment #4)
> Hi,
> 
> It seems this is expected behavior. As per the current DHT code, on the
> first attempt the layout is set only when all subvolumes are up;
> otherwise it does not set the layout and throws an error.
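
For illustration, the rule described above can be sketched as below. This is
a minimal hypothetical example, not the actual dht-selfheal.c code; the type
and function names are invented:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a DHT subvolume and its child-up state. */
struct subvol {
    const char *name;
    bool        up;
};

/* Sketch of the rule: a directory layout is only set when every
 * subvolume is up; otherwise the self-heal is skipped with an error. */
static bool
can_set_layout(const struct subvol *subvols, int n)
{
    int down = 0;
    for (int i = 0; i < n; i++)
        if (!subvols[i].up)
            down++;
    if (down > 0) {
        fprintf(stderr,
                "Directory selfheal failed: %d subvolumes down. Not fixing.\n",
                down);
        return false;
    }
    return true;
}

int main(void)
{
    /* Mirrors the reproduction below: one of two subvolumes killed. */
    struct subvol vols[] = { { "test-client-0", true },
                             { "test-client-1", false } };
    return can_set_layout(vols, 2) ? 0 : 1;
}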

At worst, we'd need a validation in GlusterD that blocks users from ending up
in this situation; otherwise GlusterD ends up in an inconsistent state where
the commit fails on one of the nodes but goes through on the others, and the
transaction is not rolled back, due to a limitation of GlusterD's design.
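
A hypothetical sketch of such a guard in the staging phase follows; this is
not glusterd's actual staging code, and the names are invented:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical brick record with its online state as glusterd sees it. */
struct brick {
    const char *path;
    bool        online;
};

/* Sketch of a pre-commit (staging) validation: reject add-brick while
 * any existing brick is down, so the transaction fails up front on all
 * nodes instead of committing on some peers and failing on others. */
static int
validate_add_brick(const struct brick *bricks, int n,
                   char *errbuf, size_t errlen)
{
    for (int i = 0; i < n; i++) {
        if (!bricks[i].online) {
            snprintf(errbuf, errlen,
                     "brick %s is down; start it before running add-brick",
                     bricks[i].path);
            return -1;
        }
    }
    return 0;
}

int main(void)
{
    struct brick bricks[] = { { "10.65.7.254:/dist1/brick1", true },
                              { "10.65.7.254:/dist2/brick2", false } };
    char err[256] = "";
    if (validate_add_brick(bricks, 2, err, sizeof(err)) != 0) {
        fprintf(stderr, "add-brick rejected: %s\n", err);
        return 1;
    }
    return 0;
}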

> 
> Below is the plain distributed case: after starting the volume I killed one
> brick, and the mount then fails, as per the current DHT behavior.
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 
> [root at dhcp10-210 ~]# systemctl restart glusterd.service
> [root at dhcp10-210 ~]# gluster v create test 10.65.7.254:/dist1/brick1
> 10.65.7.254:/dist2/brick2
> volume create: test: success: please start the volume to access data
> [root at dhcp10-210 ~]# gluster v start test
> volume start: test: success
> [root at dhcp10-210 ~]# gluster v status
> Status of volume: test
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.65.7.254:/dist1/brick1             49152     0          Y       11117
> Brick 10.65.7.254:/dist2/brick2             49153     0          Y       11136
>  
> Task Status of Volume test
> ------------------------------------------------------------------------------
> There are no active volume tasks
>  
> [root at dhcp10-210 ~]# kill 11136
> [root at dhcp10-210 ~]# mount -t glusterfs 10.65.7.254:/test /mnt
> Mount failed. Please check the log file for more detail
> 
> [2016-12-26 06:11:14.871167] W [MSGID: 109005] [dht-selfheal.c:2102:dht_selfheal_directory] 0-test-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid = 
> [2016-12-26 06:11:14.880232] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Stale file handle)
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 
> As of now we think this is a corner case; it would be difficult to provide a
> fix unless there is data loss in this case.
> 
> 
> Regards
> Mohit Agrawal
