[Bugs] [Bug 1323564] New: [scale] Brick process does not start after node reboot
bugzilla at redhat.com
Mon Apr 4 05:57:04 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1323564
Bug ID: 1323564
Summary: [scale] Brick process does not start after node reboot
Product: GlusterFS
Version: 3.7.10
Component: glusterd
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: bugs at gluster.org, nerawat at redhat.com,
sasundar at redhat.com, storage-qa-internal at redhat.com
Depends On: 1322306, 1322805
+++ This bug was initially created as a clone of Bug #1322805 +++
+++ This bug was initially created as a clone of Bug #1322306 +++
Description of problem:
Brick process does not start automatically after reboot of a node.
Starting the bricks with 'start force' also failed; it worked only after running the command 2-3 times.
Setup:
4-node cluster running 120 dist-rep [2 x 2] volumes.
I have seen this issue with only 3 volumes (vol4, vol31, vol89), even though
other volumes also have bricks running on the same node.
Version-Release number of selected component (if applicable):
3.7.5-19
How reproducible:
Not always; I haven't faced this issue until the cluster crossed 100 volumes.
Steps to Reproduce:
1. Reboot a Gluster node
2. Check the status of the gluster volumes
Actual results:
Brick process is not running
Expected results:
Brick process should come up after node reboot
Additional info:
Will attach sosreport and setup details
--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-03-30
06:38:39 EDT ---
This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from SATHEESARAN on 2016-03-30 08:14:52 EDT ---
Neha,
Could you attach sosreports from both the nodes for analysis?
--- Additional comment from Neha on 2016-03-31 01:00:51 EDT ---
Atin has already checked the setup. After the reboot, another process is
consuming the port that was assigned to the brick process.
Brick logs:
[2016-03-30 12:21:33.014717] I [MSGID: 100030] [glusterfsd.c:2318:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.5
(args: /usr/sbin/glusterfsd -s 10.70.36.3 --volfile-id
vol6.10.70.36.3.var-lib-heketi-mounts-vg_1d43e80bda3b78bff5f2bd4e788c5bb3-brick_24db93a1ba7c2021d30db0bb7523f1d4-brick
-p
/var/lib/glusterd/vols/vol6/run/10.70.36.3-var-lib-heketi-mounts-vg_1d43e80bda3b78bff5f2bd4e788c5bb3-brick_24db93a1ba7c2021d30db0bb7523f1d4-brick.pid
-S /var/run/gluster/4d936da2aa9f690e51cb86f5cf49740a.socket --brick-name
/var/lib/heketi/mounts/vg_1d43e80bda3b78bff5f2bd4e788c5bb3/brick_24db93a1ba7c2021d30db0bb7523f1d4/brick
-l
/var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_1d43e80bda3b78bff5f2bd4e788c5bb3-brick_24db93a1ba7c2021d30db0bb7523f1d4-brick.log
--xlator-option *-posix.glusterd-uuid=b5b78ebd-94f4-4a96-a9ba-6621e730a411
--brick-port 49161 --xlator-option vol6-server.listen-port=49161)
[2016-03-30 12:21:33.022880] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2016-03-30 12:21:48.059423] I [graph.c:269:gf_add_cmdline_options]
0-vol6-server: adding option 'listen-port' for volume 'vol6-server' with value
'49161'
[2016-03-30 12:21:48.059447] I [graph.c:269:gf_add_cmdline_options]
0-vol6-posix: adding option 'glusterd-uuid' for volume 'vol6-posix' with value
'b5b78ebd-94f4-4a96-a9ba-6621e730a411'
[2016-03-30 12:21:48.059643] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 2
[2016-03-30 12:21:48.059666] I [MSGID: 115034]
[server.c:403:_check_for_auth_option]
0-/var/lib/heketi/mounts/vg_1d43e80bda3b78bff5f2bd4e788c5bb3/brick_24db93a1ba7c2021d30db0bb7523f1d4/brick:
skip format check for non-addr auth option
auth.login./var/lib/heketi/mounts/vg_1d43e80bda3b78bff5f2bd4e788c5bb3/brick_24db93a1ba7c2021d30db0bb7523f1d4/brick.allow
[2016-03-30 12:21:48.059687] I [MSGID: 115034]
[server.c:403:_check_for_auth_option]
0-/var/lib/heketi/mounts/vg_1d43e80bda3b78bff5f2bd4e788c5bb3/brick_24db93a1ba7c2021d30db0bb7523f1d4/brick:
skip format check for non-addr auth option
auth.login.11c450ad-efc3-49a8-952f-68f8b37eb539.password
[2016-03-30 12:21:48.060695] I [rpcsvc.c:2215:rpcsvc_set_outstanding_rpc_limit]
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2016-03-30 12:21:48.060762] W [MSGID: 101002] [options.c:957:xl_opt_validate]
0-vol6-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2016-03-30 12:21:48.060850] E [socket.c:769:__socket_server_bind]
0-tcp.vol6-server: binding to failed: Address already in use
[2016-03-30 12:21:48.060861] E [socket.c:772:__socket_server_bind]
0-tcp.vol6-server: Port is already in use
[2016-03-30 12:21:48.060871] W [rpcsvc.c:1604:rpcsvc_transport_create]
0-rpc-service: listening on transport failed
[2016-03-30 12:21:48.060877] W [MSGID: 115045] [server.c:1060:init]
0-vol6-server: creation of listener failed
[2016-03-30 12:21:48.060892] E [MSGID: 101019] [xlator.c:428:xlator_init]
0-vol6-server: Initialization of volume 'vol6-server' failed, review your
volfile again
[2016-03-30 12:21:48.060898] E [graph.c:322:glusterfs_graph_init]
0-vol6-server: initializing translator failed
[2016-03-30 12:21:48.060902] E [graph.c:661:glusterfs_graph_activate] 0-graph:
init failed
[2016-03-30 12:21:48.061387] W [glusterfsd.c:1236:cleanup_and_exit]
(-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f07cc09e801]
-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x126) [0x7f07cc0991a6]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f07cc098789] ) 0-: received
signum (0), shutting down
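For reference, the two E-level socket errors above are the crux of the
failure: the brick tries to bind the port glusterd persisted before the
reboot (49161), and bind() fails with EADDRINUSE because another process
already holds it. A minimal self-contained C sketch of the same failure
mode (illustrative only, not glusterd code):

/* Minimal illustration of the failure seen above: once any process holds a
 * TCP port, a second bind() to it fails with EADDRINUSE -- exactly what the
 * brick hits on its persisted port 49161. Illustrative sketch, not glusterd
 * code. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

static int bind_port(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        fprintf(stderr, "bind(%d) failed: %s\n", port, strerror(errno));
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    int first  = bind_port(49161);  /* stands in for the "other" process */
    int second = bind_port(49161);  /* stands in for the brick: EADDRINUSE */

    if (first >= 0)
        close(first);
    return (second >= 0) ? 0 : 1;
}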
--- Additional comment from Vijay Bellur on 2016-03-31 07:15:11 EDT ---
REVIEW: http://review.gluster.org/13865 (glusterd: Do not persist brickinfo
ports) posted (#2) for review on master by Atin Mukherjee (amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-31 08:56:58 EDT ---
REVIEW: http://review.gluster.org/13865 (glusterd: Allocate fresh port on brick
(re)start) posted (#3) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-04-01 00:33:04 EDT ---
REVIEW: http://review.gluster.org/13865 (glusterd: Allocate fresh port on brick
(re)start) posted (#4) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-04-01 02:37:02 EDT ---
REVIEW: http://review.gluster.org/13865 (glusterd: Allocate fresh port on brick
(re)start) posted (#5) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-04-01 03:05:09 EDT ---
REVIEW: http://review.gluster.org/13865 (glusterd: Allocate fresh port on brick
(re)start) posted (#6) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-04-01 16:39:01 EDT ---
COMMIT: http://review.gluster.org/13865 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit 34899d71f21fd2b4c523b68ffb2d7c655c776641
Author: Atin Mukherjee <amukherj at redhat.com>
Date: Thu Mar 31 11:01:53 2016 +0530
glusterd: Allocate fresh port on brick (re)start
There is no point in using the same port for a particular brick process
through the entire volume life cycle, since there is no guarantee that the
same port will still be free, and that no other application will have
consumed it, across a glusterd/volume restart.
We hit a race where, on glusterd restart, the daemon services start first,
followed by the brick processes; by the time a brick process tries to bind
to the port that glusterd allocated before the restart, the port has already
been consumed by some other client like NFS/SHD/...
Note: this is a short-term solution, as it reduces the race window but does
not eliminate it completely. As a long-term solution, the port allocation
has to be done by glusterfsd and communicated back to glusterd for
bookkeeping.
Change-Id: Ibbd1e7ca87e51a7cd9cf216b1fe58ef7783aef24
BUG: 1322805
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
Reviewed-on: http://review.gluster.org/13865
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1322306
[Bug 1322306] [scale] Brick process does not start after node reboot
https://bugzilla.redhat.com/show_bug.cgi?id=1322805
[Bug 1322805] [scale] Brick process does not start after node reboot
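The direction of the fix -- stop persisting the brick port and pick a fresh
free one on every (re)start -- can be sketched with a simple bind() probe.
This is a simplified standalone sketch: the real glusterd logic lives in its
pmap registry, and the base port 49152 is only an assumption here.

/* Rough sketch of "allocate a fresh port on brick (re)start": instead of
 * reusing a persisted port, probe upward from a base until bind() succeeds.
 * Simplified and illustrative -- glusterd's actual pmap registry differs.
 * The base port 49152 (start of the IANA dynamic range) is an assumption. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns the first port in [base, base + span) that bind() accepts, or -1
 * if none is free. Closing the probe socket before the brick re-binds leaves
 * a small race window -- the same caveat the commit message makes about the
 * real fix. */
static int alloc_fresh_port(int base, int span)
{
    for (int port = base; port < base + span; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            close(fd);          /* port is free right now */
            return port;
        }
        close(fd);              /* in use (EADDRINUSE etc.): try the next */
    }
    return -1;
}

int main(void)
{
    int port = alloc_fresh_port(49152, 64);
    printf("allocated port: %d\n", port);
    return (port > 0) ? 0 : 1;
}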
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.