[Bugs] [Bug 1659825] Regularly health-check failed, going down

bugzilla at redhat.com bugzilla at redhat.com
Mon Dec 17 02:52:47 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1659825

Mohit Agrawal <moagrawa at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |moagrawa at redhat.com



--- Comment #1 from Mohit Agrawal <moagrawa at redhat.com> ---
Hi,

On the basis of the current logs, it seems more than one brick process is
running on the node for the same brick.
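
If possible, please check whether two glusterfsd processes are still attached
to the same brick path. A rough check (brick path and pid file taken from the
brick start line quoted below) would be something like:

server1 # ps ax | grep glusterfsd | grep /data/gluster/brick1
server1 # cat /var/lib/glusterd/vols/datavol/run/192.168.100.1-data-gluster-brick1.pid

If more than one PID shows up, or the PID in the file does not match the
running process, there are duplicate brick processes for the same brick.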

The brick was first started at this time:

>>>>>>>>>>>>>>>>>>>>>>

[2018-12-15 10:37:15.124245] I [MSGID: 100030] [glusterfsd.c:2454:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.8
(args: /usr/sbin/glusterfsd -s 192.168.100.1 --volfile-id
datavol.192.168.100.1.data-gluster-brick1 -p
/var/lib/glusterd/vols/datavol/run/192.168.100.1-data-gluster-brick1.pid -S
/var/run/gluster/94414a1c51f6146a22e4158fdc3505f2.socket --brick-name
/data/gluster/brick1 -l /var/log/glusterfs/bricks/data-gluster-brick1.log
--xlator-option *-posix.glusterd-uuid=e0f96163-8510-42a8-8437-60f4883f9a03
--brick-port 49152 --xlator-option datavol-server.listen-port=49152)

>>>>>>>>>>>>>>>>>>>>>

Thereafter I am not able to see any message specific to a graceful shutdown
of the brick process, but a new brick process was started.
Did you reboot the node at that time? That would explain why there is no
cleanup_and_exit message specific to the brick shutdown.
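
If the logs are still available, a quick way to tell a clean shutdown from a
reboot (log path taken from the brick start line above; cleanup_and_exit is
the shutdown message mentioned above) is something like:

server1 # grep -i cleanup_and_exit /var/log/glusterfs/bricks/data-gluster-brick1.log
server1 # last reboot | head -3

No cleanup_and_exit hit around 2018-12-15 17:04, together with a reboot entry
at that time, would confirm the node was rebooted rather than the brick being
stopped cleanly.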

>>>>>>>>>>>>>>>>>>>>>

[2018-12-15 17:04:25.642955] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datavol-server: accepted client
from
CTX_ID:3d9429d7-f51c-4481-9217-cd489951fcf4-GRAPH_ID:0-PID:17284-HOST:server2-PC_NAME:datavol-client-0-RECON_NO:-0
(version: 5.2)
[2018-12-15 17:04:25.643509] I [login.c:76:gf_auth] 0-auth/login: allowed user
names: ff65d1fc-1bbd-41c6-bb71-fe0fe3028d15
[2018-12-15 17:04:25.643544] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datavol-server: accepted client
from server3-5745-2018/12/15-17:04:20:745092-datavol-client-0-0-0 (version:
3.8.8)
[2018-12-15 17:04:26.259670] I [MSGID: 100030] [glusterfsd.c:2691:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.2 (args:
/usr/sbin/glusterfsd -s 192.168.100.1 --volfile-id
datavol.192.168.100.1.data-gluster-brick1 -p
/var/run/gluster/vols/datavol/192.168.100.1-data-gluster-brick1.pid -S
/var/run/gluster/1a6b22a893ad975c.socket --brick-name /data/gluster/brick1 -l
/var/log/glusterfs/bricks/data-gluster-brick1.log --xlator-option
*-posix.glusterd-uuid=e0f96163-8510-42a8-8437-60f4883f9a03 --process-name brick
--brick-port 49153 --xlator-option datavol-server.listen-port=49153)
[2018-12-15 17:04:26.263795] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1


>>>>>>>>>>>>>>>>>>>>>

Below is the health-check timestamp as shared by you. It means the brick is
still running and some thread is still updating the timestamp in the
health-check file, because I am not able to see any message at that timestamp
in the logs.

server1 # cat /data/gluster/brick1/.glusterfs/health_check
2018-12-16 16:26:46
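
The timestamp in that file is written by the posix health-check thread of the
brick process, so as long as it keeps advancing, the brick process is alive.
To see how often it should be updated and whether it is still moving, the
following should work (option name assumed to be
storage.health-check-interval, the usual posix health-check setting):

server1 # gluster volume get datavol storage.health-check-interval
server1 # stat -c '%y' /data/gluster/brick1/.glusterfs/health_check
server1 # date -u

If the file's modification time keeps tracking the current UTC time at
roughly that interval, the old brick process is still running and holding the
brick.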


Thanks,
Mohit Agrawal
