[Bugs] [Bug 1541038] New: A down brick is incorrectly considered to be online and makes the volume to be started without any brick available
    bugzilla at redhat.com 
       
    Thu Feb  1 15:04:38 UTC 2018
    
    
  
https://bugzilla.redhat.com/show_bug.cgi?id=1541038
            Bug ID: 1541038
           Summary: A down brick is incorrectly considered to be online
                    and makes the volume to be started without any brick
                    available
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: jahernan at redhat.com
                CC: bugs at gluster.org
Description of problem:
In a replica 2 volume, if one of the bricks is down and it reports its state
before the online one, AFR tries to find another online brick in
find_best_down_child(). Since the priv->child_up array is initialized to -1
and this function only checks whether an entry is 0, it treats the other brick,
whose state is still unknown, as alive and sends a CHILD_UP notification.
At this point the other xlators start sending requests, which fail with
ENOTCONN when they reach AFR. This can cause several unexpected errors.
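
The failure mode described above can be illustrated with a small standalone
sketch. This is not the AFR source and not the actual fix: the enum values
and the helpers find_other_up_buggy() and find_other_up_strict() are
hypothetical names used only for illustration. The only assumption taken from
the report is that entries start at -1 and that the check compares against 0.

```c
#include <stddef.h>

/* Hypothetical tri-state for this sketch: -1 means "not reported yet". */
enum { CHILD_UNKNOWN = -1, CHILD_DOWN = 0, CHILD_UP = 1 };

/*
 * Buggy variant: any entry that is not 0 counts as up, so an entry still
 * holding the initial -1 is wrongly accepted as an online child.
 * Returns the index of the first accepted child other than 'skip', or -1.
 */
static int find_other_up_buggy(const int *child_up, size_t count, size_t skip)
{
    for (size_t i = 0; i < count; i++) {
        if (i != skip && child_up[i] != CHILD_DOWN)
            return (int)i;
    }
    return -1;
}

/*
 * Stricter variant: only an entry explicitly set to 1 counts as up, so
 * children that have not reported yet are ignored.
 */
static int find_other_up_strict(const int *child_up, size_t count, size_t skip)
{
    for (size_t i = 0; i < count; i++) {
        if (i != skip && child_up[i] == CHILD_UP)
            return (int)i;
    }
    return -1;
}
```

With a state of { CHILD_DOWN, CHILD_UNKNOWN }, where brick 0 has reported down
and brick 1 has not reported yet, the buggy variant returns index 1 while the
strict variant returns -1, which matches the behavior described in this report.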
Version-Release number of selected component (if applicable): mainline
How reproducible:
It happens randomly, depending on the order in which bricks are started.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
    
    