[Bugs] [Bug 1362170] New: Disconnection Issues in glusterd

bugzilla at redhat.com bugzilla at redhat.com
Mon Aug 1 12:53:23 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1362170

            Bug ID: 1362170
           Summary: Disconnection Issues in glusterd
           Product: GlusterFS
           Version: 3.6.9
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: olympia.kremmyda at nokia.com
                CC: bugs at gluster.org



Description of problem:

We run GlusterFS v3.6.9 in a deployment with two storage nodes that host
bricks and a third node that acts only as quorum. In every deployment we see
the following odd behavior on the storage nodes that host bricks:

glusterd logs are full of the following message:

[2016-07-15 04:08:01.558743] W [socket.c:620:__socket_rwv] 0-management: readv
on /var/run/2d164c87498372d99c47b3b627dcd393.socket failed (Invalid argument)

In addition, the following message is reported periodically in glusterd logs:

The message "I [MSGID: 106006]
[glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: nfs has
disconnected from glusterd." repeated 39 times between [2016-07-15
04:06:04.513882] and [2016-07-15 04:08:01.558787]
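
To put a number on how often this message repeats, the summary line above can
be converted into an average interval. What follows is a minimal Python sketch
and not part of glusterd: the helper name repeat_interval and the regular
expression are assumptions derived only from the line format shown above, and
the sample is the wrapped log entry re-joined into one string.

```python
import re
from datetime import datetime

# Layout of the "repeated N times" summary line as it appears in the report.
REPEATED = re.compile(
    r'The message "(?P<msg>.+)" repeated (?P<count>\d+) times '
    r'between \[(?P<start>[^\]]+)\] and \[(?P<end>[^\]]+)\]'
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def repeat_interval(line):
    """Return (count, span_seconds, seconds_per_repeat), or None if no match."""
    m = REPEATED.search(line)
    if m is None:
        return None
    start = datetime.strptime(m.group("start"), TS_FORMAT)
    end = datetime.strptime(m.group("end"), TS_FORMAT)
    count = int(m.group("count"))
    span = (end - start).total_seconds()
    return count, span, span / count


# The summary line from the glusterd log above, re-joined into one string.
SAMPLE = (
    'The message "I [MSGID: 106006] '
    '[glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: '
    'nfs has disconnected from glusterd." repeated 39 times between '
    '[2016-07-15 04:06:04.513882] and [2016-07-15 04:08:01.558787]'
)

if __name__ == "__main__":
    count, span, per_repeat = repeat_interval(SAMPLE)
    print(f"{count} repeats in {span:.1f}s, one every {per_repeat:.2f}s")
```

For the line above this works out to 39 repeats over about 117 seconds, that
is, roughly one "nfs has disconnected" message every 3 seconds.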

We found the following bug fix, which looks quite similar to our case:

http://review.gluster.org/#/c/9269/


In addition to the messages mentioned above, we frequently run into functional
problems where the storage is not operational. We have collected the following
logs from glusterfsd:

[2016-07-18 07:48:36.800400] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2016-07-18 07:48:36.974391] I [glusterfsd-mgmt.c:1504:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-07-18 07:48:36.982390] I [server.c:518:server_rpc_notify]
0-voldata-server: disconnecting connection from
SN-0-1580-2016/07/18-07:48:34:958658-voldata-client-0-0-0
[2016-07-18 07:48:36.982422] I [client_t.c:417:gf_client_unref]
0-voldata-server: Shutting down connection
SN-0-1580-2016/07/18-07:48:34:958658-voldata-client-0-0-0
[2016-07-18 07:48:38.015640] I [login.c:82:gf_auth] 0-auth/login: allowed user
names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:38.015662] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
SN-0-1622-2016/07/18-07:48:37:983796-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:38.284775] I [server.c:518:server_rpc_notify]
0-voldata-server: disconnecting connection from
SN-1-2297-2016/07/18-07:48:36:82735-voldata-client-0-0-0
[2016-07-18 07:48:38.284823] I [client_t.c:417:gf_client_unref]
0-voldata-server: Shutting down connection
SN-1-2297-2016/07/18-07:48:36:82735-voldata-client-0-0-0
[2016-07-18 07:48:39.331257] I [login.c:82:gf_auth] 0-auth/login: allowed user
names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:39.331282] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
SN-1-2350-2016/07/18-07:48:38:966005-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:41.357084] I [login.c:82:gf_auth] 0-auth/login: allowed user
names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:41.357109] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
SN-2-2765-2016/07/18-07:48:41:308956-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:42.915110] I [login.c:82:gf_auth] 0-auth/login: allowed user
names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:42.915136] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
SN-0-1894-2016/07/18-07:48:42:894933-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:53.873241] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
MN-0-740-2016/07/18-07:48:53:295469-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:59.645589] I [server-handshake.c:585:server_setvolume]
0-voldata-server: accepted client from
MN-1-728-2016/07/18-07:48:59:628747-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:56:06.939668] I [server.c:518:server_rpc_notify]
0-voldata-server: disconnecting connection from
SN-1-1152-2016/07/18-07:48:09:699332-voldata-client-0-0-0
[2016-07-18 07:56:06.939692] I [client_t.c:417:gf_client_unref]
0-voldata-server: Shutting down connection
SN-1-1152-2016/07/18-07:48:09:699332-voldata-client-0-0-0
[2016-07-18 07:56:06.939722] I [server.c:518:server_rpc_notify]
0-voldata-server: disconnecting connection from
SN-2-1134-2016/07/18-07:48:10:12643-voldata-client-0-0-0
[2016-07-18 07:56:06.939735] I [client_t.c:417:gf_client_unref]
0-voldata-server: Shutting down connection
SN-2-1134-2016/07/18-07:48:10:12643-voldata-client-0-0-0
[2016-07-18 07:56:06.942400] W [glusterfsd.c:1211:cleanup_and_exit] (--> 0-:
received signum (15), shutting down

After the last message, glusterfsd restarts, and after a few minutes it ends
up in the same situation, with the above messages printed again. During the
short period of operation we observed many disconnections and reconnections of
gluster clients.
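
The client churn described above can be tallied per client host from the
brick log. The sketch below is only an illustration under stated assumptions:
it assumes the client identifier has the form <host>-<pid>-<date>-..., as the
entries above suggest (for example host SN-0, pid 1580), and churn_by_host is
a hypothetical helper name. The sample lines are taken from the excerpt above
and re-joined, since the mail wrapped them.

```python
import re
from collections import Counter

# Matches the two connection events seen in the glusterfsd excerpt and splits
# the client identifier into host and pid (assumed layout: host-pid-date-...).
EVENT = re.compile(
    r"(?P<event>accepted client from|disconnecting connection from)\s+"
    r"(?P<host>.+?)-(?P<pid>\d+)-\d{4}/\d{2}/\d{2}-"
)


def churn_by_host(lines):
    """Count accepted and dropped connections per client host."""
    accepted, dropped = Counter(), Counter()
    for line in lines:
        m = EVENT.search(line)
        if m is None:
            continue  # not a connection event
        if m.group("event").startswith("accepted"):
            accepted[m.group("host")] += 1
        else:
            dropped[m.group("host")] += 1
    return accepted, dropped


# A few entries from the excerpt above, each re-joined into a single line.
SAMPLE = [
    "[2016-07-18 07:48:36.982390] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-0-1580-2016/07/18-07:48:34:958658-voldata-client-0-0-0",
    "[2016-07-18 07:48:38.015662] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-0-1622-2016/07/18-07:48:37:983796-voldata-client-0-0-0 (version: 3.6.9)",
    "[2016-07-18 07:48:38.284775] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-1-2297-2016/07/18-07:48:36:82735-voldata-client-0-0-0",
    "[2016-07-18 07:48:39.331282] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-1-2350-2016/07/18-07:48:38:966005-voldata-client-0-0-0 (version: 3.6.9)",
    "[2016-07-18 07:48:42.915136] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-0-1894-2016/07/18-07:48:42:894933-voldata-client-0-0-0 (version: 3.6.9)",
]

if __name__ == "__main__":
    accepted, dropped = churn_by_host(SAMPLE)
    for host in sorted(set(accepted) | set(dropped)):
        print(host, "accepted:", accepted[host], "dropped:", dropped[host])
```

Run against a complete brick log with each entry on one line, the same tally
shows which client hosts reconnect most often and whether the pid changes on
each reconnect.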

At the same time, glusterd prints the following messages:

/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x356)[0x7f9df8acc1f5]
(-->
/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x289)[0x7f9df8a2da4b]
(-->
/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x61)[0x7f9df8a22384]
(-->
/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_peer_rpc_notify+0x38)[0x7f9df8a2dbea]
))))) 0-management: Lock for vol services not held
The message "I [MSGID: 106006]
[glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: nfs has
disconnected from glusterd." repeated 39 times between [2016-07-18
07:58:08.475445] and [2016-07-18 08:00:06.414826]

and after a while the connection with the other storage server and the quorum
server is lost.

Could you please check and provide some help?

Version-Release number of selected component (if applicable):
v3.6.9

How reproducible:
Often

Additional info:

FYI, we have back-ported the following patches from the 3.7 branch:
http://review.gluster.org/#/c/10829/
http://review.gluster.org/#/c/13358/
http://review.gluster.org/#/c/14052/
http://review.gluster.org/#/c/12310/
http://review.gluster.org/#/c/13392/
