[Bugs] [Bug 1809326] New: Volumes keeps going offline with messages -- posix-helpers.c:2150:posix_health_check_thread_proc

Mon Mar 2 20:53:22 UTC 2020

https://bugzilla.redhat.com/show_bug.cgi?id=1809326

            Bug ID: 1809326
           Summary: Volumes keeps going offline with messages --
                    posix-helpers.c:2150:posix_health_check_thread_proc
           Product: GlusterFS
           Version: 6
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: amgad.saleh at nokia.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community

Description of problem:

We have a 3-node system with replica 3 volumes and also a Kubernetes storage
class. The issue is with the 3-node volumes: two of them keep going offline.
After a stop/start they come online, and a few seconds later they go offline
again.

The following broadcast messages appeared:

Broadcast message from systemd-journald at telco-control-02 (Mon 2020-03-02
19:02:02 UTC):

data0-glusterfs-helm-home[30981]: [2020-03-02 19:02:02.357179] M [MSGID:
113075] [posix-helpers.c:2168:posix_health_check_thread_proc]
0-bcmt-helm-home-posix: still alive! -> SIGTERM

Message from syslogd at telco-control-02 at Mar  2 19:02:02 ...
 data0-glusterfs-helm-home[30981]:[2020-03-02 19:02:02.356492] M [MSGID:
113075] [posix-helpers.c:2150:posix_health_check_thread_proc]
0-bcmt-helm-home-posix: health-check failed, going down

Message from syslogd at telco-control-02 at Mar  2 19:02:02 ...
 data0-glusterfs-helm-home[30981]:[2020-03-02 19:02:02.357179] M [MSGID:
113075] [posix-helpers.c:2168:posix_health_check_thread_proc]
0-bcmt-helm-home-posix: still alive! -> SIGTERM

Broadcast message from systemd-journald at telco-control-02 (Mon 2020-03-02
19:02:02 UTC):

data0-glusterfs-cbur-repo[31014]: [2020-03-02 19:02:02.891885] M [MSGID:
113075] [posix-helpers.c:2150:posix_health_check_thread_proc]
0-cbur-glusterfs-repo-posix: health-check failed, going down

Broadcast message from systemd-journald at telco-control-02 (Mon 2020-03-02
19:02:02 UTC):

data0-glusterfs-cbur-repo[31014]: [2020-03-02 19:02:02.892088] M [MSGID:
113075] [posix-helpers.c:2168:posix_health_check_thread_proc]
0-cbur-glusterfs-repo-posix: still alive! -> SIGTERM

Message from syslogd at telco-control-02 at Mar  2 19:02:02 ...
 data0-glusterfs-cbur-repo[31014]:[2020-03-02 19:02:02.891885] M [MSGID:
113075] [posix-helpers.c:2150:posix_health_check_thread_proc]
0-cbur-glusterfs-repo-posix: health-check failed, going down

Message from syslogd at telco-control-02 at Mar  2 19:02:02 ...
 data0-glusterfs-cbur-repo[31014]:[2020-03-02 19:02:02.892088] M [MSGID:
113075] [posix-helpers.c:2168:posix_health_check_thread_proc]
0-cbur-glusterfs-repo-posix: still alive! -> SIGTERM
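
For triage, the wrapped records above can be rejoined and split into fields so
that the events are grouped per brick translator. The following is a minimal
Python sketch and not part of the original report: the names parse_record and
group_by_xlator and the regular expression are illustrative assumptions, derived
only from the record layout quoted above, and may not match other GlusterFS log
formats.

```python
import re
from collections import defaultdict

# Assumed layout of one record, taken from the lines quoted in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] xlator-name: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)"
)


def parse_record(text):
    """Parse one record; mail wrapping is undone by collapsing whitespace."""
    flat = " ".join(text.split())
    match = LOG_RE.search(flat)
    return match.groupdict() if match else None


def group_by_xlator(records):
    """Group parsed records by translator name, dropping exact repeats.

    journald and syslogd each echo the same record, so a record with the
    same translator and timestamp is kept only once.
    """
    grouped = defaultdict(list)
    seen = set()
    for rec in records:
        key = (rec["xlator"], rec["ts"])
        if key in seen:
            continue
        seen.add(key)
        grouped[rec["xlator"]].append((rec["ts"], rec["line"], rec["msg"]))
    return dict(grouped)


if __name__ == "__main__":
    sample = """[2020-03-02 19:02:02.356492] M [MSGID:
113075] [posix-helpers.c:2150:posix_health_check_thread_proc]
0-bcmt-helm-home-posix: health-check failed, going down"""
    # The same record is passed twice to mimic the journald/syslogd echo.
    print(group_by_xlator([parse_record(sample), parse_record(sample)]))
```

Grouping this way makes it easier to see that each affected brick logs a
"health-check failed, going down" record followed shortly by
"still alive! -> SIGTERM".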

Version-Release number of selected component (if applicable):

GlusterFS 6.5 is running on CentOS EL7 with kernel 4.18.0-80.11.2.el8_0.x86_64.
Is there any issue with running the GlusterFS EL7 RPMs on CentOS EL7 with kernel
4.18.0-80.11.2.el8_0.x86_64?
