[Bugs] [Bug 1541038] New: A down brick is incorrectly considered to be online and makes the volume to be started without any brick available

bugzilla at redhat.com
Thu Feb 1 15:04:38 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1541038

            Bug ID: 1541038
           Summary: A down brick is incorrectly considered to be online
                    and makes the volume to be started without any brick
                    available
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: jahernan at redhat.com
                CC: bugs at gluster.org



Description of problem:

In a replica 2 volume, if one of the bricks is down and it reports its state
before the online one does, AFR tries to find another online brick in
find_best_down_child(). The priv->child_up array is initialized to -1, but this
function only checks whether an entry is 0. It therefore treats the other
brick, which has not reported its state yet, as alive, and AFR sends a
CHILD_UP notification.
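
The difference between the two checks can be illustrated with a minimal,
standalone sketch. This is not the AFR source: the constants and the two
helper functions are hypothetical names chosen for the example, and the
child_up array is modelled as a plain int array using the values mentioned
above (-1 for a brick that has not reported yet, 0 for down, 1 for up).

```c
#include <stddef.h>

/* Hypothetical names for the three values described in the report. */
enum {
    CHILD_STATE_UNKNOWN = -1, /* initial value, brick has not reported yet */
    CHILD_STATE_DOWN    = 0,  /* brick reported that it is down */
    CHILD_STATE_UP      = 1   /* brick reported that it is up */
};

/* Loose check: every entry that is not 0 is counted as online, so an
 * entry still holding the initial -1 is counted as well. */
static int count_online_loose(const int *child_up, size_t count)
{
    int online = 0;
    for (size_t i = 0; i < count; i++) {
        if (child_up[i] != CHILD_STATE_DOWN)
            online++;
    }
    return online;
}

/* Strict check: only entries explicitly set to 1 are counted as online. */
static int count_online_strict(const int *child_up, size_t count)
{
    int online = 0;
    for (size_t i = 0; i < count; i++) {
        if (child_up[i] == CHILD_STATE_UP)
            online++;
    }
    return online;
}
```

With the array { 0, -1 } (first brick reported down, second brick not yet
reported), the loose check counts one online brick while the strict check
counts none. Whether the actual fix takes this form is not stated in this
report.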

At this point the other xlators start sending requests, which fail with
ENOTCONN when they reach AFR because no brick is actually available. This can
cause several unexpected errors.

Version-Release number of selected component (if applicable): mainline


How reproducible:

It happens randomly, depending on the order in which bricks are started.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
