Hi Friends,

When a replace-brick operation was attempted for one of the bad bricks in a distributed-disperse (EC) volume, the command failed, but the brick daemon for the new replacement brick came online anyway.
Please help us understand in which situations this issue can arise, and suggest a solution if possible:

glusterd.log:

[2018-12-11 11:04:43.774120] I [MSGID: 106503] [glusterd-replace-brick.c:147:__glusterd_handle_replace_brick] 0-management: Received replace-brick commit force request.

[2018-12-11 11:04:44.784578] I [MSGID: 106504] [glusterd-utils.c:13079:rb_update_dstbrick_port] 0-glusterd: adding dst-brick port no 0


[2018-12-11 11:04:46.457537] E [MSGID: 106029] [glusterd-utils.c:7981:glusterd_brick_signal] 0-glusterd: Unable to open pidfile: /var/run/gluster/vols/AM6_HyperScale/am6sv0004sds.saipemnet.saipem.intranet-ws-disk3-ws_brick.pid [No such file or directory]

[2018-12-11 11:04:53.089810] I [glusterd-utils.c:5876:glusterd_brick_start] 0-management: starting a fresh brick process for brick /ws/disk15/ws_brick


[2018-12-11 11:04:53.117935] W [socket.c:595:__socket_rwv] 0-socket.management: writev on failed (Broken pipe)

[2018-12-11 11:04:54.014023] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now

[2018-12-11 11:04:54.273190] I [MSGID: 106005] [glusterd-handler.c:6120:__glusterd_brick_rpc_notify] 0-management: Brick am6sv0004sds.saipemnet.saipem.intranet:/ws/disk15/ws_brick has disconnected from glusterd.

[2018-12-11 11:04:54.297603] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on am6sv0006sds.saipemnet.saipem.intranet. Please check log file for details.

[2018-12-11 11:04:54.350666] I [MSGID: 106143] [glusterd-pmap.c:278:pmap_registry_bind] 0-pmap: adding brick /ws/disk15/ws_brick on port 49164

[2018-12-11 11:05:01.137449] E [MSGID: 106123] [glusterd-mgmt.c:1519:glusterd_mgmt_v3_commit] 0-management: Commit failed on peers

[2018-12-11 11:05:01.137496] E [MSGID: 106123] [glusterd-replace-brick.c:660:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed

[2018-12-11 11:06:12.275867] I [MSGID: 106499] [glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: Received status volume req for volume AM6_HyperScale

[2018-12-11 13:35:51.529365] I [MSGID: 106499] [glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: Received status volume req for volume AM6_HyperScale
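
To pick out the failures from the glusterd.log excerpt above, the error-level entries can be filtered with a short script. This is only a minimal sketch: the regular expression is an assumption based on the line layout visible in this excerpt, and `LOG_LINE` and `error_entries` are hypothetical names, not part of any Gluster tooling.

```python
import re

# Assumed layout, inferred from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID field is optional because some lines in the excerpt lack it.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<domain>\S+):\s*(?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, msgid, message) for each error-level ('E') line."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "E":
            found.append((m.group("ts"), m.group("msgid"), m.group("msg")))
    return found


# Three lines copied from the log excerpt above.
sample = [
    "[2018-12-11 11:04:43.774120] I [MSGID: 106503] "
    "[glusterd-replace-brick.c:147:__glusterd_handle_replace_brick] "
    "0-management: Received replace-brick commit force request.",
    "[2018-12-11 11:04:54.297603] E [MSGID: 106116] "
    "[glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: "
    "Commit failed on am6sv0006sds.saipemnet.saipem.intranet. "
    "Please check log file for details.",
    "[2018-12-11 11:05:01.137449] E [MSGID: 106123] "
    "[glusterd-mgmt.c:1519:glusterd_mgmt_v3_commit] 0-management: "
    "Commit failed on peers",
]

for ts, msgid, msg in error_entries(sample):
    print(ts, msgid, msg)
```

Run against the full log, this would surface the pidfile error and the three commit failures in order, which makes it easier to see that the commit failed on the peer am6sv0006sds rather than on the node hosting the new brick.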

gluster volume replace-brick AM6_HyperScale am6sv0004sds.saipemnet.saipem.intranet:/ws/disk3/ws_brick am6sv0004sds.saipemnet.saipem.intranet:/ws/disk15/ws_brick commit force
Replace brick failure, brick [/ws/disk3], volume [AM6_HyperScale]

"gluster volume status" now shows the new disk /ws/disk15 as active.

Despite the reported failure, the replacement appears to have succeeded, and it looks like healing has started.


