[Gluster-users] Dispersed volumes won't heal on ARM

Sun Mar 1 07:02:27 UTC 2020

On March 1, 2020 6:08:31 AM GMT+02:00, Fox <foxxz.net at gmail.com> wrote:
>I am using a dozen ODROID-HC2 ARM systems, each with a single HDD/brick,
>running Ubuntu 18 and GlusterFS 7.2 installed from the Gluster PPA.
>
>I can create a dispersed volume and use it. If one of the cluster members
>drops out, say gluster12 reboots, then when it comes back online it shows
>as connected in the peer list, but running
>
>gluster volume heal <volname> info summary
>
>reports:
>Brick gluster12:/exports/sda/brick1/disp1
>Status: Transport endpoint is not connected
>Total Number of entries: -
>Number of entries in heal pending: -
>Number of entries in split-brain: -
>Number of entries possibly healing: -
>
>Trying to force a full heal doesn't fix it. Otherwise the cluster member
>works, and it heals other, non-dispersed volumes even while it shows as
>disconnected for the dispersed volume.
>
>I have attached a terminal log of the volume creation and the diagnostic
>output. Could this be an ARM-specific problem?
>
>I tested a similar setup on x86 virtual machines, and they were able to
>heal a dispersed volume with no problem. One thing I see in the ARM logs
>that I don't see in the x86 logs is lots of this:
>[2020-03-01 03:54:45.856769] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d3c4cf3-e09c-4b9a-87d3-cdfc4f49b692
>[2020-03-01 03:54:45.910203] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d806805-81e4-47ee-a331-1808b34949bf
>[2020-03-01 03:54:45.932734] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-disp1-client-11: changing port to 49152 (from 0)
>[2020-03-01 03:54:45.956803] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid d5768bad-7409-40f4-af98-4aef391d7ae4
>[2020-03-01 03:54:46.000102] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 216f5583-e1b4-49cf-bef9-8cd34617beaf
>[2020-03-01 03:54:46.044184] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 1b610b49-2d69-4ee6-a440-5d3edd6693d1
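
The "(800)" in these warnings may be worth decoding. Assuming it is a hexadecimal bitmask of the disperse subvolumes that were unavailable for the operation (one bit per brick, in brick order; this is an assumption about the EC translator's log format), 0x800 has only bit 11 set. That would point at the twelfth brick, disp1-client-11, the same client that logs a port change to 49152 above, and presumably the rebooted gluster12. A minimal Python sketch of that decoding, with a hypothetical helper name:

```python
import re

# One of the warnings quoted above, joined onto a single line.
LOG_LINE = (
    "[2020-03-01 03:54:45.856769] W [MSGID: 122035] "
    "[ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation "
    "with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on "
    "'(null)' with gfid 0d3c4cf3-e09c-4b9a-87d3-cdfc4f49b692"
)


def unavailable_subvolumes(line: str) -> list[int]:
    """Hypothetical helper: return subvolume indices set in the '(...)' mask.

    ASSUMPTION: the value is a hex bitmask where bit i set means subvolume i
    (client "<vol>-client-i") was unavailable for this operation.
    """
    match = re.search(r"subvolumes unavailable\. \(([0-9A-Fa-f]+)\)", line)
    if match is None:
        return []
    mask = int(match.group(1), 16)  # the mask is printed in hexadecimal
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


indices = unavailable_subvolumes(LOG_LINE)
print(indices)                                  # [11]
print([f"disp1-client-{i}" for i in indices])   # ['disp1-client-11']
```

If that assumption holds, every warning in the excerpt carries the same mask, so they would all be pointing at the same single missing brick rather than at several.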

Hi,

Are you sure that the Gluster bricks on this node are up and running?
What is the output of 'gluster volume status' on this system?
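
As a rough way to read that output, here is a Python sketch that scans 'gluster volume status' text for brick rows whose Online column is N. It assumes the usual tabular layout (Gluster process, TCP Port, RDMA Port, Online, Pid) with each brick on a single row, so it would miss rows the CLI wraps; the helper name and the sample output are invented for illustration, not taken from this thread.

```python
# Invented sample in the usual 'gluster volume status' layout (not real output).
SAMPLE_STATUS = """\
Status of volume: disp1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster11:/exports/sda/brick1/disp1   49152     0          Y       1201
Brick gluster12:/exports/sda/brick1/disp1   N/A       N/A        N       N/A
"""


def offline_bricks(status_output: str) -> list[str]:
    """Hypothetical helper: return bricks whose Online column reads 'N'.

    ASSUMPTION: brick rows look like
    'Brick <host>:<path> <tcp-port> <rdma-port> <online> <pid>' on one line.
    """
    offline = []
    for line in status_output.splitlines():
        fields = line.split()
        # The Online flag is the second-to-last column; Pid is the last.
        if len(fields) >= 6 and fields[0] == "Brick" and fields[-2] == "N":
            offline.append(fields[1])
    return offline


print(offline_bricks(SAMPLE_STATUS))  # ['gluster12:/exports/sda/brick1/disp1']
```

A brick that shows as connected in the peer list but N here would suggest the brick process itself, not the peer connection, is what is down.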

Best Regards,
Strahil Nikolov