[Bugs] [Bug 1450567] New: brick process cannot be started on the first attempt

bugzilla at redhat.com bugzilla at redhat.com
Sat May 13 09:13:16 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1450567

            Bug ID: 1450567
           Summary: brick process cannot be started on the first attempt
           Product: GlusterFS
           Version: 3.8
         Component: core
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: likunbyl at qq.com
                CC: bugs at gluster.org



Description of problem:

I'm running the GlusterFS server in a container in a Kubernetes environment. When
the server rebooted, the brick process failed to start on the first attempt. The
brick log shows:

[2017-05-11 08:49:28.056753] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.5
(args: /usr/sbin/glusterfsd -s 10.3.3.11 --volfile-id
gvol0.10.3.3.11.mnt-brick2-vol -p
/var/lib/glusterd/vols/gvol0/run/10.3.3.11-mnt-brick2-vol.pid -S
/var/run/gluster/16909a0348d1da701cfe2486bf91a886.socket --brick-name
/mnt/brick2/vol -l /var/log/glusterfs/bricks/mnt-brick2-vol.log --xlator-option
*-posix.glusterd-uuid=a0fd1343-929c-4851-a0d2-9603b7cc4095 --brick-port 49153
--xlator-option gvol0-server.listen-port=49153)
[2017-05-11 08:49:28.064464] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2017-05-11 08:51:30.661259] W [socket.c:590:__socket_rwv] 0-glusterfs: readv
on 10.3.3.11:24007 failed (Connection reset by peer)
[2017-05-11 08:51:30.661699] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f62bdc09002] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f62bd9d084e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f62bd9d095e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f62bd9d20b4] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f62bd9d2990] ))))) 0-glusterfs:
forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at
2017-05-11 08:49:43.653446 (xid=0x1)
[2017-05-11 08:51:30.661716] E [glusterfsd-mgmt.c:1686:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:gvol0.10.3.3.11.mnt-brick2-vol)
[2017-05-11 08:51:30.661738] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib64/libgfrpc.so.0(saved_frames_unwind+0x205) [0x7f62bd9d0875]
-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x536) [0x557a89452fc6]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x557a8944cb4b] ) 0-: received
signum (0), shutting down
[2017-05-11 08:51:30.664515] I [socket.c:3391:socket_submit_request]
0-glusterfs: not connected (priv->connected = 0)
[2017-05-11 08:51:30.664527] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-glusterfs:
failed to submit rpc-request (XID: 0x2 Program: Gluster Portmap, ProgVers: 1,
Proc: 5) to rpc-transport (glusterfs)
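
From the trace above it looks like glusterd on 10.3.3.11 reset the management
connection on port 24007 before answering the GETSPEC request, so the brick gave
up fetching its volume file and exited. A quick check of whether glusterd is
reachable and answering queries at that point could look like the following
(just a sketch on my side; it assumes bash, coreutils timeout and the gluster
CLI are available inside the brick container):

# Sketch: does glusterd on 10.3.3.11 accept connections on the management
# port, and does it answer a volume query? (assumes bash, timeout and the
# gluster CLI are present in the container)
timeout 3 bash -c 'exec 3<>/dev/tcp/10.3.3.11/24007' \
    && echo "port 24007 reachable" || echo "port 24007 not reachable"
gluster --remote-host=10.3.3.11 volume info gvol0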

I then ran the brick command manually, and it worked:

# /usr/sbin/glusterfsd -s 10.3.3.11 --volfile-id gvol0.10.3.3.11.mnt-brick2-vol \
    -p /var/lib/glusterd/vols/gvol0/run/10.3.3.11-mnt-brick2-vol.pid \
    -S /var/run/gluster/16909a0348d1da701cfe2486bf91a886.socket \
    --brick-name /mnt/brick2/vol \
    -l /var/log/glusterfs/bricks/mnt-brick2-vol.log \
    --xlator-option *-posix.glusterd-uuid=a0fd1343-929c-4851-a0d2-9603b7cc4095 \
    --brick-port 49153 \
    --xlator-option gvol0-server.listen-port=49153

This time the log showed:

[2017-05-11 08:53:18.553398] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.5
(args: /usr/sbin/glusterfsd -s 10.3.3.11 --volfile-id
gvol0.10.3.3.11.mnt-brick2-vol -p
/var/lib/glusterd/vols/gvol0/run/10.3.3.11-mnt-brick2-vol.pid -S
/var/run/gluster/16909a0348d1da701cfe2486bf91a886.socket --brick-name
/mnt/brick2/vol -l /var/log/glusterfs/bricks/mnt-brick2-vol.log --xlator-option
*-posix.glusterd-uuid=a0fd1343-929c-4851-a0d2-9603b7cc4095 --brick-port 49153
--xlator-option gvol0-server.listen-port=49153)
[2017-05-11 08:53:18.560507] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2017-05-11 08:53:18.563946] I [MSGID: 101173]
[graph.c:269:gf_add_cmdline_options] 0-gvol0-server: adding option
'listen-port' for volume 'gvol0-server' with value '49153'
[2017-05-11 08:53:18.563981] I [MSGID: 101173]
[graph.c:269:gf_add_cmdline_options] 0-gvol0-posix: adding option
'glusterd-uuid' for volume 'gvol0-posix' with value
'a0fd1343-929c-4851-a0d2-9603b7cc4095'
[2017-05-11 08:53:18.564570] I [MSGID: 115034]
[server.c:398:_check_for_auth_option] 0-gvol0-decompounder: skip format check
for non-addr auth option auth.login./mnt/brick2/vol.allow
[2017-05-11 08:53:18.564578] I [MSGID: 115034]
[server.c:398:_check_for_auth_option] 0-gvol0-decompounder: skip format check
for non-addr auth option
auth.login.94bedfd1-619d-402a-9826-67dab7600f43.password
[2017-05-11 08:53:18.564652] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 2
[2017-05-11 08:53:18.565311] I [rpcsvc.c:2214:rpcsvc_set_outstanding_rpc_limit]
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
...
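
After the manual restart the brick can also be confirmed as online with the
usual status query (included here only for completeness):

# The brick /mnt/brick2/vol should now be listed as Online on port 49153
gluster volume status gvol0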


Version-Release number of selected component (if applicable):
OS: coreos 1298.5.0
kubernetes: v1.5.1
Image: official gluster-centos:gluster3u8_centos7
Gluster: 3.8.5


How reproducible:
Reboot the GlusterFS server.

Steps to Reproduce:
1. Reboot the GlusterFS server.

Actual results:
Some brick processes failed to start on the first attempt.

Expected results:
All brick processes should start successfully.

Additional info:
I can provide more information if needed.
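
As a possible workaround on my side (only a sketch, not something that ships
with the image), the container entrypoint could retry the brick start until the
brick actually stays up. The arguments below are the ones from the log above;
the retry loop, the 150-second wait (a bit longer than the ~2 minutes the
failing brick needed before giving up on GETSPEC) and the pid-file check are my
own assumptions:

#!/bin/bash
# Sketch of a retry wrapper around the brick start (assumption, not part of
# the official image or of glusterd's own brick management).
PIDFILE=/var/lib/glusterd/vols/gvol0/run/10.3.3.11-mnt-brick2-vol.pid
for attempt in 1 2 3; do
    /usr/sbin/glusterfsd -s 10.3.3.11 --volfile-id gvol0.10.3.3.11.mnt-brick2-vol \
        -p "$PIDFILE" \
        -S /var/run/gluster/16909a0348d1da701cfe2486bf91a886.socket \
        --brick-name /mnt/brick2/vol \
        -l /var/log/glusterfs/bricks/mnt-brick2-vol.log \
        --xlator-option '*-posix.glusterd-uuid=a0fd1343-929c-4851-a0d2-9603b7cc4095' \
        --brick-port 49153 \
        --xlator-option gvol0-server.listen-port=49153
    # The failing brick above only gave up about two minutes after start, so
    # wait longer than that before deciding whether this attempt succeeded.
    sleep 150
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        break    # brick process still alive, assume it came up
    fi
done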
