[Bugs] [Bug 1589253] After creating and starting 601 volumes, self heal daemon went down and seeing continuous warning messages in glusterd log

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 8 14:20:53 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1589253

Sanju <srakonde at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|bugs at gluster.org            |srakonde at redhat.com



--- Comment #1 from Sanju <srakonde at redhat.com> ---
Description of problem:
--------------------------------------------------------------------
On a three-node cluster, created and started 600 volumes of type 2x3. All the
bricks and the self-heal daemon were running properly. Then created one more
2x3 volume; the self-heal daemon stopped running and the following warning is
logged every 7 seconds.
---------------------------------------------------------------------
[2018-05-22 09:10:54.352926] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 09:11:01.354185] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 09:11:08.355858] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 09:11:15.358315] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 09:11:22.360205] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
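
The socket path in the warning is the self-heal daemon's unix socket. To
confirm the state on one node, a quick check along these lines can be used
(the socket path is copied from the log above; the exact glustershd command
line may differ):

# Is the self-heal daemon process still running on this node?
pgrep -af glustershd
# Does the socket glusterd keeps trying to connect to actually exist?
ls -l /var/run/gluster/a218720a3b016edcafc4598e18d17126.socket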


Version-Release number of selected component (if applicable):


How reproducible:
1/1

Steps to Reproduce:
1. On a three-node cluster, created 600 volumes of type distributed-replicate
(2x3) and started them using a script (a rough sketch of such a script is
shown after these steps)
2. Created one more 2x3 volume and started it
3. Volume started successfully
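
The script from step 1 is not attached to this bug; the sketch below only
illustrates the kind of loop that was used. The node IPs and brick paths are
taken from the volume info under "Additional info", and the vol1..vol600
naming scheme is an assumption:

#!/bin/bash
# Illustrative sketch only -- not the actual script used in step 1.
NODES="10.70.37.214 10.70.37.178 10.70.37.46"
for i in $(seq 1 600); do
    bricks=""
    for subvol in 0 1; do
        for node in $NODES; do
            bricks="$bricks $node:/bricks/brick$subvol/vol$i"
        done
    done
    # 6 bricks with replica 3 gives a 2x3 distributed-replicate volume
    gluster volume create vol$i replica 3 $bricks force
    gluster volume start vol$i
done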

Actual results:
The self-heal daemon went down, and the following warning messages are logged
every 7 seconds:

[2018-05-22 08:48:09.064406] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 08:48:16.065553] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 08:48:23.066968] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 08:48:30.068186] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)
[2018-05-22 08:48:37.069355] W [socket.c:3266:socket_connect] 0-glustershd:
Ignore failed connection attempt on
/var/run/gluster/a218720a3b016edcafc4598e18d17126.socket, (No such file or
directory)

Expected results:
Self-heal daemon should be running
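
One way to verify this per node is the volume status output, which lists a
Self-heal Daemon entry for each node (using the volume from "Additional info"
below as an example):

# Each node should show a "Self-heal Daemon on <host>" line with Online: Y
gluster volume status deadpool
# Or check for the glustershd process directly on a node
ps aux | grep glustershd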

Additional info:

[root@dhcp37-214 ~]# gluster vol info deadpool

Volume Name: deadpool
Type: Distributed-Replicate
Volume ID: 25cf7f2f-3369-4ffc-8349-ce7c146b9ff2
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.214:/bricks/brick0/rel
Brick2: 10.70.37.178:/bricks/brick0/rel
Brick3: 10.70.37.46:/bricks/brick0/rel
Brick4: 10.70.37.214:/bricks/brick1/rel
Brick5: 10.70.37.178:/bricks/brick1/rel
Brick6: 10.70.37.46:/bricks/brick1/rel
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
