[Bugs] [Bug 1659825] New: Regularly health-check failed, going down

bugzilla at redhat.com bugzilla at redhat.com
Sun Dec 16 20:02:47 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1659825

            Bug ID: 1659825
           Summary: Regularly health-check failed, going down
           Product: GlusterFS
           Version: 5
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: aedu at wyssmann.com
                CC: bugs at gluster.org
  Target Milestone: ---
   External Bug ID: Github 616
    Classification: Community



Created attachment 1514921
  --> https://bugzilla.redhat.com/attachment.cgi?id=1514921&action=edit
logs of all 3 nodes

I have a cluster setup as mentioned in #615 which consists of 3 nodes:

    server1, 192.168.100.1
    server2, 192.168.100.2
    server3, 192.168.100.3
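
For reference, the volume layout is roughly the following; the volume name datavol
is taken from the 0-datavol-posix prefix in the logs below, the brick path
/data/gluster/brick1 is inferred from the brick process name data-gluster-brick1,
and the create command is only illustrative of that layout, not a copy of the
original command:

# replica 3 volume, one brick per node
gluster volume create datavol replica 3 \
  server1:/data/gluster/brick1 \
  server2:/data/gluster/brick1 \
  server3:/data/gluster/brick1
gluster volume start datavol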

I am using Gluster 5.2. The volume status is healthy, but when I start copying
a bunch of files to the mounted volume as a test, I suddenly get:

data-gluster-brick1[10104]: [2018-12-16 11:54:53.398787] M [MSGID: 113075]
[posix-helpers.c:1957:posix_health_check_thread_proc] 0-datavol-posix:
health-check failed, going down

Broadcast message from systemd-journald@server3 (Sun 2018-12-16 12:54:53 CET):
data-gluster-brick1[10104]: [2018-12-16 11:54:53.398861] M [MSGID: 113075]
[posix-helpers.c:1975:posix_health_check_thread_proc] 0-datavol-posix: still
alive! -> SIGTERM

Message from syslogd@localhost at Dec 16 12:54:53 ...
 data-gluster-brick1[10104]: [2018-12-16 11:54:53.398861] M [MSGID: 113075]
[posix-helpers.c:1975:posix_health_check_thread_proc] 0-datavol-posix: still
alive! -> SIGTERM

glusterd reports:

Dec 16 12:45:47 server1 data-gluster-brick1[15946]: [2018-12-16
11:45:47.940510] M [MSGID: 113075]
[posix-helpers.c:1957:posix_health_check_thread_proc] 0-datavol-posix:
health-check failed, going down
Dec 16 12:45:47 server1 data-gluster-brick1[15946]: [2018-12-16
11:45:47.940650] M [MSGID: 113075]
[posix-helpers.c:1975:posix_health_check_thread_proc] 0-datavol-posix: still
alive! -> SIGTERM
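
As far as I understand it, MSGID 113075 comes from the brick's posix health-check
thread, which periodically writes a timestamp to .glusterfs/health_check inside the
brick directory and terminates the brick ("still alive! -> SIGTERM") when that
write fails or times out, so this points at the backend filesystem rather than at
glusterd itself. A quick way to see what made the check fail (the brick path
/data/gluster/brick1 is an assumption based on the process name data-gluster-brick1):

# the health-check file on the brick should exist and be readable/writable
ls -l /data/gluster/brick1/.glusterfs/health_check
cat /data/gluster/brick1/.glusterfs/health_check

# the brick log records the errno of the failed check right before MSGID 113075
grep -B5 '113075' /var/log/glusterfs/bricks/data-gluster-brick1.log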

The volume is a replicated one, with 1 brick per node. The bricks are on top of
a thin pool:

# lvdisplay 
  --- Logical volume ---
  LV Name                vg_md3_thinpool
  VG Name                vg_md3
  LV UUID                w9Obnd-rPz0-kPUX-UQpw-8WBv-JsWp-iNWgHH
  LV Write Access        read/write
  LV Creation host, time server1, 2018-12-10 15:01:19 +0100
  LV Pool metadata       vg_md3_thinpool_tmeta
  LV Pool data           vg_md3_thinpool_tdata
  LV Status              available
  # open                 2
  LV Size                1.70 TiB
  Allocated pool data    0.63%
  Allocated metadata     0.16%
  Current LE             445645
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_md3/vg_md3_thinlv
  LV Name                vg_md3_thinlv
  VG Name                vg_md3
  LV UUID                h3J0tR-qN6u-X5Ea-B5di-TnfR-mt9c-HAkYH1
  LV Write Access        read/write
  LV Creation host, time server1, 2018-12-10 15:01:21 +0100
  LV Pool name           vg_md3_thinpool
  LV Status              available
  # open                 1
  LV Size                1.70 TiB
  Mapped size            0.63%
  Current LE             445645
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4
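
Since the bricks sit on a thin LV, an exhausted thin pool (data or metadata) would
be a common way for the backend to start returning I/O errors that the health check
then trips over. The lvdisplay output above only shows 0.63% / 0.16% used, so the
pool does not look full, but the kernel log is still worth checking around the time
of the failure:

# thin pool usage; anything close to 100% is a problem
lvs -a -o lv_name,vg_name,data_percent,metadata_percent vg_md3

# dm-thin and the block layer report problems to the kernel log
dmesg -T | grep -iE 'device-mapper|I/O error'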

Disk config looks like this:

# fdisk -l
Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C90B8273-29EF-4411-83E3-F8896BE33F22

Device          Start        End    Sectors  Size Type
/dev/sdb1        4096   33558527   33554432   16G Linux RAID
/dev/sdb2    33558528   34607103    1048576  512M Linux RAID
/dev/sdb3    34607104 2182090751 2147483648    1T Linux RAID
/dev/sdb4  2182090752 5860533134 3678442383  1.7T Linux RAID
/dev/sdb5        2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.


Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 97DB8175-3C0A-4A10-AB86-DBDE6BEA65A2

Device          Start        End    Sectors  Size Type
/dev/sda1        4096   33558527   33554432   16G Linux RAID
/dev/sda2    33558528   34607103    1048576  512M Linux RAID
/dev/sda3    34607104 2182090751 2147483648    1T Linux RAID
/dev/sda4  2182090752 5860533134 3678442383  1.7T Linux RAID
/dev/sda5        2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.


Disk /dev/md3: 1.7 TiB, 1883228274688 bytes, 3678180224 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md2: 1023.9 GiB, 1099377410048 bytes, 2147221504 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md1: 511.4 MiB, 536281088 bytes, 1047424 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md0: 16 GiB, 17163091968 bytes, 33521664 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/vg_md3-vg_md3_thinlv: 1.7 TiB, 1869170606080 bytes, 3650723840
sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
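
After a brick is taken down like this, the remaining replicas keep serving the
volume, but the failed brick has to be restarted and healed. The way to get it
back, and the option that controls the health check (volume name again taken from
the 0-datavol-posix log prefix), as far as I understand the docs:

# restart the killed brick process and confirm all three bricks are online
gluster volume start datavol force
gluster volume status datavol

# trigger / watch self-heal for files written while the brick was down
gluster volume heal datavol
gluster volume heal datavol info

# health-check interval in seconds (default 30); 0 disables the check while debugging
gluster volume set datavol storage.health-check-interval 30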
