[Bugs] [Bug 1528641] New: Brick processes fail to start
bugzilla at redhat.com
Fri Dec 22 14:07:14 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1528641
Bug ID: 1528641
Summary: Brick processes fail to start
Product: GlusterFS
Version: 3.12
Component: rpc
Severity: high
Assignee: bugs at gluster.org
Reporter: rob at abcxyz.nl
CC: bugs at gluster.org
Description of problem:
Version-Release number of selected component (if applicable): 3.12.4
How reproducible:
I cannot reproduce it, but I can describe the environment:
+ 2 node cluster on bare metal
+ 20 volumes
+ 2 fold replication
+ One vg controlled by manual heketi (SATA disks)
+ One vg controlled by Kubernetes heketi (SSD disks)
Actual results:
On one node (of a two-node cluster) some brick processes do not start (or exit
quickly at start). Both manual heketi volumes and automatic heketi volumes are
affected.
The log file shows:
The message "I [MSGID: 106005]
[glusterd-handler.c:6063:__glusterd_brick_rpc_notify] 0-management: Brick
10.10.0.68:/local.mnt/glfs0/brick7-2/brick has disconnected from glusterd."
repeated 39 times between [2017-12-22 08:17:45.141586] and
[2017-12-22 08:19:42.156703]
The message "I [MSGID: 106005]
[glusterd-handler.c:6063:__glusterd_brick_rpc_notify] 0-management: Brick
10.10.0.68:/var/lib/heketi/mounts/vg_27ab4f2ccdc2674a3270206903ab1cad/brick_1da8d26de2936277d5aadecee18f3591/brick
has disconnected from glusterd." repeated 39 times between
[2017-12-22 08:17:45.144381] and [2017-12-22 08:19:42.159691]
[2017-12-22 08:19:45.155640] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:45.157038] I [MSGID: 106005]
[glusterd-handler.c:6063:__glusterd_brick_rpc_notify] 0-management: Brick
10.10.0.68:/local.mnt/glfs0/brick7-2/brick has disconnected from glusterd.
[2017-12-22 08:19:45.158571] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:45.160072] I [MSGID: 106005]
[glusterd-handler.c:6063:__glusterd_brick_rpc_notify] 0-management: Brick
10.10.0.68:/var/lib/heketi/mounts/vg_27ab4f2ccdc2674a3270206903ab1cad/brick_1da8d26de2936277d5aadecee18f3591/brick
has disconnected from glusterd.
[2017-12-22 08:19:48.155914] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:48.159153] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:51.155802] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:51.158906] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:54.156162] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:54.159027] I [socket.c:2474:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
[2017-12-22 08:19:57.156709] I [socket.c:2474:socket_event_handler] 0-
...and so on...
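
To see which bricks are affected and how often, the MSGID 106005 notifications
in an excerpt like the one above can be tallied per brick. The following is
only a minimal sketch and not part of GlusterFS: the function name
count_brick_disconnects is hypothetical, and it assumes that a
'repeated N times' summary line stands for N notifications.

```python
import re
from collections import Counter

# Matches the "has disconnected from glusterd" notification quoted in the
# report, with an optional '" repeated N times' suffix from summary lines.
DISCONNECT_RE = re.compile(
    r'Brick\s+(\S+)\s+has disconnected from glusterd\.'
    r'(?:"\s+repeated\s+(\d+)\s+times)?'
)


def count_brick_disconnects(log_text):
    """Return a Counter mapping each brick (host:/path) to its disconnects.

    Hypothetical helper. Assumption: a 'repeated N times' summary counts
    as N notifications; a plain log line counts as one.
    """
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in DISCONNECT_RE.finditer(flat):
        brick, repeated = match.groups()
        counts[brick] += int(repeated) if repeated else 1
    return counts
```

Calling count_brick_disconnects() on the text of the glusterd log and printing
the result with most_common() lists the bricks that disconnect most often
first.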
Expected results:
Bricks should be running on both nodes, so that replication actually provides
redundancy.
Additional info:
Might be related to bug 1484885 - [rpc]: EPOLLERR - disconnecting now messages
every 3 secs after completing rebalance
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.