[Gluster-users] Brick process not starting after reinstall

Richard Neuboeck hawk at tbi.univie.ac.at
Wed Mar 21 07:05:12 UTC 2018


Hi all,

Our system suffered a host failure in a replica 3 setup.
The host needed a complete reinstall. I followed the RH guide to
'replace a host with the same hostname'
(https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts).

The reinstalled machine runs the same OS (CentOS 7), but it received
slightly newer gluster packages
(glusterfs-3.12.6-1.el7.x86_64) than the others
(glusterfs-3.12.5-2.el7.x86_64).
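
Both builds belong to the 3.12 series and differ only in the last
version component. If you want to compare package versions across
hosts mechanically, here is a minimal Python sketch that parses the
RPM file names. It assumes the usual name-version-release.arch layout
with a dotted numeric version. The helper names are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class RpmName(NamedTuple):
    """Parsed RPM file name such as glusterfs-3.12.6-1.el7.x86_64."""
    name: str
    version: Tuple[int, ...]
    release: str
    arch: str


# Assumption: name-version-release.arch, where version is dotted integers.
_NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_nvra(s: str) -> RpmName:
    m = _NVRA.match(s)
    if m is None:
        raise ValueError(f"not a name-version-release.arch string: {s!r}")
    version = tuple(int(part) for part in m.group("version").split("."))
    return RpmName(m.group("name"), version, m.group("release"), m.group("arch"))


def same_series(a: str, b: str) -> bool:
    """True if both packages share the name and major.minor (e.g. 3.12.x)."""
    pa, pb = parse_nvra(a), parse_nvra(b)
    return pa.name == pb.name and pa.version[:2] == pb.version[:2]
```

For the two hosts here, same_series("glusterfs-3.12.6-1.el7.x86_64",
"glusterfs-3.12.5-2.el7.x86_64") is true. Whether such a difference
matters in practice is a separate question; the sketch only compares
the strings.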

Following the guide, I created /var/lib/glusterd/glusterd.info with the
UUID of the old host.
Then I copied the /var/lib/glusterd/peers/<uuid> files from the two
other hosts to the new one (except the file named after the old host's
own UUID). I created all the brick directories that exist on the other
machines (empty, of course) and set the volume-id extended
attribute on each to the value retrieved from the running hosts.
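
The preparation steps above can be sketched in Python as follows. This
is a minimal sketch under assumptions, not the guide's procedure: it
assumes glusterd.info consists of plain key=value lines (UUID=...,
optionally operating-version=...), and that the volume-id value is the
output of `getfattr -n trusted.glusterfs.volume-id -e hex <brick>` on a
healthy brick. Compare with the files on a running host before relying
on it. All function names are hypothetical.

```python
import shutil
import uuid
from pathlib import Path
from typing import List, Optional


def write_glusterd_info(workdir: Path, node_uuid: str,
                        op_version: Optional[int] = None) -> Path:
    """Write glusterd.info so the reinstalled node reuses its old UUID."""
    uuid.UUID(node_uuid)  # raises ValueError on a malformed UUID
    lines = [f"UUID={node_uuid}"]
    if op_version is not None:
        lines.append(f"operating-version={op_version}")
    path = workdir / "glusterd.info"
    path.write_text("\n".join(lines) + "\n")
    return path


def copy_peer_files(source_dirs: List[Path], dest_dir: Path, own_uuid: str) -> List[str]:
    """Copy peers/<uuid> files from healthy hosts, skipping the node's own UUID."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for peer_file in sorted(src.iterdir()):
            if peer_file.name == own_uuid:
                continue  # a node must not list itself as a peer
            shutil.copy2(peer_file, dest_dir / peer_file.name)
            copied.append(peer_file.name)
    return sorted(set(copied))


def volume_id_bytes(getfattr_value: str) -> bytes:
    """Turn getfattr hex output ("name=0x..." or "0x...") into 16 raw bytes."""
    value = getfattr_value.strip().split("=", 1)[-1]
    if value.startswith("0x"):
        value = value[2:]
    raw = bytes.fromhex(value)
    if len(raw) != 16:
        raise ValueError("volume-id should be a 16-byte UUID")
    return raw
```

On the new host (as root), the bytes would then be applied once per
brick with os.setxattr(brick_path, "trusted.glusterfs.volume-id",
volume_id_bytes(hex_value)).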

On one of the healthy hosts I mounted each volume, created and removed a
directory, and set and removed an extended attribute, as the guide
suggests, to trigger self-healing.
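
The self-heal trigger can be sketched in Python as below. It assumes
the volume is mounted at `mount` on a healthy host and reuses a
placeholder key name modeled on the guide's example
(`trusted.non-existent-key`, an assumption here). Setting `trusted.*`
attributes requires root. The xattr functions are injectable so the
sketch can be exercised without a Gluster mount. The function name is
hypothetical.

```python
import os
import uuid
from pathlib import Path
from typing import Callable, Optional

# Placeholder key that does not exist on the volume (assumption).
HEAL_KEY = "trusted.non-existent-key"


def poke_mount(
    mount: Path,
    setxattr: Optional[Callable[[str, str, bytes], None]] = None,
    removexattr: Optional[Callable[[str, str], None]] = None,
) -> None:
    """Create/remove a directory and set/remove an xattr on a client mount."""
    setxattr = setxattr or os.setxattr
    removexattr = removexattr or os.removexattr
    # A random name ensures no existing directory is touched.
    probe = mount / f"heal-probe-{uuid.uuid4().hex}"
    probe.mkdir()
    probe.rmdir()
    setxattr(str(mount), HEAL_KEY, b"abc")
    removexattr(str(mount), HEAL_KEY)
```

As the guide describes it, these metadata operations on the volume root
are meant to leave pending changes recorded on the healthy bricks, so
that self-heal later picks up the replaced brick.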

After that I started the gluster daemon (systemctl start glusterd
glusterfsd).

The new host lists the other peers as connected (and vice versa), but
no brick processes are running, so the replacement bricks are not in
use and no healing takes place.
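
To check which bricks actually have a process on the new host, you can
scan /proc for glusterfsd command lines. The sketch below assumes each
brick runs as its own glusterfsd process started with `--brick-name
<path>`. With brick multiplexing enabled, one process may serve several
bricks, and this check would undercount. The helper names are
hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def brick_from_cmdline(raw: bytes) -> Optional[str]:
    """Return the --brick-name argument of a glusterfsd command line, if any.

    /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
    """
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    if not args or not args[0].endswith("glusterfsd"):
        return None
    for i, arg in enumerate(args[:-1]):
        if arg == "--brick-name":
            return args[i + 1]
    return None


def running_bricks(proc: Path = Path("/proc")) -> List[str]:
    """List brick paths served by glusterfsd processes on this host."""
    bricks = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/sys, ...
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # the process exited or is not readable
        brick = brick_from_cmdline(raw)
        if brick is not None:
            bricks.append(brick)
    return sorted(bricks)


def missing_bricks(expected: Iterable[str], running: Iterable[str]) -> List[str]:
    """Bricks that are configured but have no running process."""
    return sorted(set(expected) - set(running))
```

For example, missing_bricks(["/srv/gluster_engine/brick",
"/srv/gluster_export/brick", "/srv/gluster_iso/brick",
"/srv/gluster_plexus/brick", "/srv/gluster_navaar/brick"],
running_bricks()) would list the brick paths from the log below that
have no process on this host.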

I checked the logs and searched online but couldn't find a reason
why the brick processes are not running or how to get them running.

Is there a way to get the brick processes started (preferably without
shutting down the other hosts, since they are in use)?
Does anyone have a different approach for replacing a faulty host?

Thanks in advance!
Cheers
Richard



Here is the glusterd.log. I can see the disconnect messages, but not
the reason for them.

/var/log/glusterd.log
[2018-03-20 13:34:01.333423] I [MSGID: 100030]
[glusterfsd.c:2524:main] 0-/usr/sbin/glusterd: Started running
/usr/sbin/glusterd version 3.12.6 (args: /usr/sbin/glusterd -p
/var/run/glusterd.pid --log-level INFO)
[2018-03-20 13:34:01.339203] I [MSGID: 106478]
[glusterd.c:1423:init] 0-management: Maximum allowed open file
descriptors set to 65536
[2018-03-20 13:34:01.339243] I [MSGID: 106479]
[glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as
working directory
[2018-03-20 13:34:01.339256] I [MSGID: 106479]
[glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid
file working directory
[2018-03-20 13:34:01.343809] E
[rpc-transport.c:283:rpc_transport_load] 0-rpc-transport:
/usr/lib64/glusterfs/3.12.6/rpc-transport/rdma.so: cannot open
shared object file: No such file or directory
[2018-03-20 13:34:01.343836] W
[rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume
'rdma.management': transport-type 'rdma' is not valid or not found
on this machine
[2018-03-20 13:34:01.343847] W
[rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create
listener, initing the transport failed
[2018-03-20 13:34:01.343855] E [MSGID: 106243]
[glusterd.c:1769:init] 0-management: creation of 1 listeners failed,
continuing with succeeded transport
[2018-03-20 13:34:01.344594] I [MSGID: 106228]
[glusterd.c:499:glusterd_check_gsync_present] 0-glusterd:
geo-replication module not installed in the system [No such file or
directory]
[2018-03-20 13:34:01.344936] I [MSGID: 106513]
[glusterd-store.c:2241:glusterd_restore_op_version] 0-glusterd:
retrieved op-version: 31202
[2018-03-20 13:34:01.471227] I [MSGID: 106498]
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2018-03-20 13:34:01.471297] I [MSGID: 106498]
[glusterd-handler.c:3603:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2018-03-20 13:34:01.471325] W [MSGID: 106062]
[glusterd-handler.c:3400:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2018-03-20 13:34:01.471351] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:01.471412] W [MSGID: 101002]
[options.c:995:xl_opt_validate] 0-management: option
'address-family' is deprecated, preferred is
'transport.address-family', continuing with correction
[2018-03-20 13:34:01.474137] W [MSGID: 106062]
[glusterd-handler.c:3400:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2018-03-20 13:34:01.474161] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:01.474238] W [MSGID: 101002]
[options.c:995:xl_opt_validate] 0-management: option
'address-family' is deprecated, preferred is
'transport.address-family', continuing with correction
[2018-03-20 13:34:01.476646] I [MSGID: 106544]
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
e4ed3102-9794-494b-af36-d767d8a72678
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option transport.listen-backlog 10
  7:     option rpc-auth-allow-insecure on
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2018-03-20 13:34:01.476895] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started
thread with index 1
[2018-03-20 13:34:12.197917] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd:
Received ACC from uuid: 0acd0bff-c38f-4c49-82da-4112d22dfd2c, host:
borg-sphere-three, port: 0
[2018-03-20 13:34:12.198929] C [MSGID: 106003]
[glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume engine. Starting
local bricks.
[2018-03-20 13:34:12.199166] I
[glusterd-utils.c:5941:glusterd_brick_start] 0-management: starting
a fresh brick process for brick /srv/gluster_engine/brick
[2018-03-20 13:34:12.202498] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:12.208389] C [MSGID: 106003]
[glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume export. Starting
local bricks.
[2018-03-20 13:34:12.208622] I
[glusterd-utils.c:5941:glusterd_brick_start] 0-management: starting
a fresh brick process for brick /srv/gluster_export/brick
[2018-03-20 13:34:12.211426] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:12.216722] C [MSGID: 106003]
[glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume iso. Starting local
bricks.
[2018-03-20 13:34:12.216906] I
[glusterd-utils.c:5941:glusterd_brick_start] 0-management: starting
a fresh brick process for brick /srv/gluster_iso/brick
[2018-03-20 13:34:12.219439] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:12.224400] C [MSGID: 106003]
[glusterd-server-quorum.c:354:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume plexus. Starting
local bricks.
[2018-03-20 13:34:12.224555] I
[glusterd-utils.c:5941:glusterd_brick_start] 0-management: starting
a fresh brick process for brick /srv/gluster_plexus/brick
[2018-03-20 13:34:12.226902] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:12.231689] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting
frame-timeout to 600
[2018-03-20 13:34:12.231986] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
already stopped
[2018-03-20 13:34:12.232047] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs
service is stopped
[2018-03-20 13:34:12.232082] I [MSGID: 106600]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management:
nfs/server.so xlator is not installed
[2018-03-20 13:34:12.232165] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting
frame-timeout to 600
[2018-03-20 13:34:12.238970] I [MSGID: 106568]
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
glustershd daemon running in pid: 3554
[2018-03-20 13:34:13.239224] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd
service is stopped
[2018-03-20 13:34:13.239365] I [MSGID: 106567]
[glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting
glustershd service
[2018-03-20 13:34:14.243040] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting
frame-timeout to 600
[2018-03-20 13:34:14.243817] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
already stopped
[2018-03-20 13:34:14.243866] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
service is stopped
[2018-03-20 13:34:14.243928] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting
frame-timeout to 600
[2018-03-20 13:34:14.244474] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd
already stopped
[2018-03-20 13:34:14.244514] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd
service is stopped
[2018-03-20 13:34:14.244589] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting
frame-timeout to 600
[2018-03-20 13:34:14.245123] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub
already stopped
[2018-03-20 13:34:14.245169] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub
service is stopped
[2018-03-20 13:34:14.260266] I
[glusterd-utils.c:5941:glusterd_brick_start] 0-management: starting
a fresh brick process for brick /srv/gluster_navaar/brick
[2018-03-20 13:34:14.263172] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2018-03-20 13:34:14.271938] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2018-03-20 13:34:14.272146] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2018-03-20 13:34:14.272366] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2018-03-20 13:34:14.272562] I
[rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2018-03-20 13:34:14.273000] I [MSGID: 106492]
[glusterd-handler.c:2718:__glusterd_handle_friend_update]
0-glusterd: Received friend update from uuid:
0acd0bff-c38f-4c49-82da-4112d22dfd2c
[2018-03-20 13:34:14.273770] I [MSGID: 106502]
[glusterd-handler.c:2763:__glusterd_handle_friend_update]
0-management: Received my uuid as Friend
[2018-03-20 13:34:14.273907] I [MSGID: 106493]
[glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 0acd0bff-c38f-4c49-82da-4112d22dfd2c
[2018-03-20 13:34:14.277313] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2018-03-20 13:34:14.280409] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_engine/brick has disconnected
from glusterd.
[2018-03-20 13:34:14.283608] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2018-03-20 13:34:14.286608] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_export/brick has disconnected
from glusterd.
[2018-03-20 13:34:14.289765] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2018-03-20 13:34:14.292523] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_iso/brick has disconnected from
glusterd.
[2018-03-20 13:34:14.295494] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2018-03-20 13:34:14.298261] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_plexus/brick has disconnected
from glusterd.
[2018-03-20 13:34:14.298421] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd:
Received ACC from uuid: 0e8b912a-bcff-4b33-88c6-428b3e658440, host:
borg-sphere-one, port: 0
[2018-03-20 13:34:14.298935] I
[glusterd-utils.c:5847:glusterd_brick_start] 0-management:
discovered already-running brick /srv/gluster_engine/brick
[2018-03-20 13:34:14.298958] I [MSGID: 106143]
[glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick
/srv/gluster_engine/brick on port 49152
[2018-03-20 13:34:14.299037] I
[glusterd-utils.c:5847:glusterd_brick_start] 0-management:
discovered already-running brick /srv/gluster_export/brick
[2018-03-20 13:34:14.299051] I [MSGID: 106143]
[glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick
/srv/gluster_export/brick on port 49153
[2018-03-20 13:34:14.299117] I
[glusterd-utils.c:5847:glusterd_brick_start] 0-management:
discovered already-running brick /srv/gluster_iso/brick
[2018-03-20 13:34:14.299130] I [MSGID: 106143]
[glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick
/srv/gluster_iso/brick on port 49154
[2018-03-20 13:34:14.299208] I
[glusterd-utils.c:5847:glusterd_brick_start] 0-management:
discovered already-running brick /srv/gluster_plexus/brick
[2018-03-20 13:34:14.299223] I [MSGID: 106143]
[glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick
/srv/gluster_plexus/brick on port 49155
[2018-03-20 13:34:14.299292] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
already stopped
[2018-03-20 13:34:14.299344] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs
service is stopped
[2018-03-20 13:34:14.299365] I [MSGID: 106600]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management:
nfs/server.so xlator is not installed
[2018-03-20 13:34:14.302501] I [MSGID: 106568]
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
glustershd daemon running in pid: 3896
[2018-03-20 13:34:15.302703] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd
service is stopped
[2018-03-20 13:34:15.302798] I [MSGID: 106567]
[glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting
glustershd service
[2018-03-20 13:34:15.305136] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
already stopped
[2018-03-20 13:34:15.305172] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
service is stopped
[2018-03-20 13:34:15.305384] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd
already stopped
[2018-03-20 13:34:15.305406] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd
service is stopped
[2018-03-20 13:34:15.305599] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub
already stopped
[2018-03-20 13:34:15.305618] I [MSGID: 106568]
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub
service is stopped
[2018-03-20 13:34:15.323512] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2018-03-20 13:34:15.326856] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_navaar/brick has disconnected
from glusterd.
[2018-03-20 13:34:15.329968] I [MSGID: 106493]
[glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 0e8b912a-bcff-4b33-88c6-428b3e658440
[2018-03-20 13:34:15.330024] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 31202
[2018-03-20 13:34:15.335968] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid:
0acd0bff-c38f-4c49-82da-4112d22dfd2c
[2018-03-20 13:34:15.336908] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to borg-sphere-three (0), ret: 0, op_ret: 0
[2018-03-20 13:34:15.340577] I [MSGID: 106144]
[glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick
/srv/gluster_engine/brick on port 49152
[2018-03-20 13:34:15.340669] E [socket.c:2369:socket_connect_finish]
0-management: connection to
/var/run/gluster/7c88a1ced3d7819183c1b75562132753.socket failed
(Connection reset by peer); disconnecting socket
[2018-03-20 13:34:15.343472] E [socket.c:2369:socket_connect_finish]
0-management: connection to
/var/run/gluster/92f05640572fdb863e0d3655821a9221.socket failed
(Connection reset by peer); disconnecting socket
[2018-03-20 13:34:15.346173] E [socket.c:2369:socket_connect_finish]
0-management: connection to
/var/run/gluster/855c85c59ce6144e0cdaadc081dab574.socket failed
(Connection reset by peer); disconnecting socket
[2018-03-20 13:34:15.351476] W [socket.c:593:__socket_rwv]
0-management: readv on
/var/run/gluster/2ac0088f40227ca69fb39d3c98e51d2d.socket failed (No
data available)
[2018-03-20 13:34:15.354084] I [MSGID: 106005]
[glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
Brick borg-sphere-two:/srv/gluster_plexus/brick has disconnected
from glusterd.
[2018-03-20 13:34:15.354184] I [MSGID: 106144]
[glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick
/srv/gluster_plexus/brick on port 49155
[2018-03-20 13:34:15.354222] I [MSGID: 106492]
[glusterd-handler.c:2718:__glusterd_handle_friend_update]
0-glusterd: Received friend update from uuid:
0acd0bff-c38f-4c49-82da-4112d22dfd2c
[2018-03-20 13:34:15.354597] I [MSGID: 106502]
[glusterd-handler.c:2763:__glusterd_handle_friend_update]
0-management: Received my uuid as Friend
[2018-03-20 13:34:15.354645] I [MSGID: 106493]
[glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 0acd0bff-c38f-4c49-82da-4112d22dfd2c
[2018-03-20 13:34:15.354670] I [MSGID: 106144]
[glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick
/srv/gluster_export/brick on port 49153
[2018-03-20 13:34:15.354789] I [MSGID: 106144]
[glusterd-pmap.c:396:pmap_registry_remove] 0-pmap: removing brick
/srv/gluster_iso/brick on port 49154
[2018-03-20 13:34:15.354905] I [MSGID: 106492]
[glusterd-handler.c:2718:__glusterd_handle_friend_update]
0-glusterd: Received friend update from uuid:
0e8b912a-bcff-4b33-88c6-428b3e658440
[2018-03-20 13:34:15.354927] I [MSGID: 106502]
[glusterd-handler.c:2763:__glusterd_handle_friend_update]
0-management: Received my uuid as Friend
[2018-03-20 13:34:15.355536] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 31202
[2018-03-20 13:34:15.359667] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid:
0e8b912a-bcff-4b33-88c6-428b3e658440
[2018-03-20 13:34:15.360277] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to borg-sphere-one (0), ret: 0, op_ret: 0
[2018-03-20 13:34:15.361113] I [MSGID: 106492]
[glusterd-handler.c:2718:__glusterd_handle_friend_update]
0-glusterd: Received friend update from uuid:
0e8b912a-bcff-4b33-88c6-428b3e658440
[2018-03-20 13:34:15.361151] I [MSGID: 106502]
[glusterd-handler.c:2763:__glusterd_handle_friend_update]
0-management: Received my uuid as Friend
