[Bugs] [Bug 1158037] New: brick failure detection does not work for ext4 filesystems

bugzilla at redhat.com bugzilla at redhat.com
Tue Oct 28 11:42:21 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1158037

            Bug ID: 1158037
           Summary: brick failure detection does not work for ext4
                    filesystems
           Product: GlusterFS
           Version: 3.6.0
         Component: posix
          Keywords: EasyFix, Triaged
          Severity: medium
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: lmohanty at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    lmohanty at redhat.com, ndevos at redhat.com
        Depends On: 1130242
            Blocks: 1100204, 1150244



+++ This bug was initially created as a clone of Bug #1130242 +++

+++ This bug was initially created as a clone of Bug #1100204 +++

Description of problem:
The "Brick Failure Detection"
(http://www.gluster.org/community/documentation/index.php/Features/Brick_Failure_Detection)
does not work on ext4 filesystems.

Version-Release number of selected component (if applicable):
<any>

How reproducible:
100%

Steps to Reproduce:
1. See
https://forge.gluster.org/glusterfs-core/glusterfs/blobs/release-3.5/doc/features/brick-failure-detection.md
2. Make sure to format the brick(s) as ext4.
3. Disconnect the disk holding the brick.

Actual results:
If there is no activity on the volume, brick failure detection does not
trigger a shutdown of the brick process.

Expected results:
The brick process should notice that the filesystem went read-only and exit.

Additional info:
It seems that stat() on XFS includes a check for the filesystem status, but
ext4 does not. Replacing the stat() with a write() should be sufficient.

--- Additional comment from Niels de Vos on 2014-05-30 03:21:40 EDT ---

Brick Failure Detection has a thread in the POSIX xlator that calls
stat() on a file on the brick in a loop. The stat() returns an error on
XFS in case the filesystem aborted (e.g. the disk or RAID-card was
pulled). Unfortunately, ext4 does not behave like that, and the stat()
happily succeeds. Eric Sandeen and Lukas Czerner don't think that
modifying ext4 is the right path, and expect that any patches to do this
will be shot down. So, we'll need to fix it in Gluster.

The change that needs to be made is in posix_health_check_thread_proc()
in the xlators/storage/posix/src/posix-helpers.c file. Instead of the
stat(), it should write (and read?) something to a new file under the
"priv->base_path + /.glusterfs" directory.

--- Additional comment from Lalatendu Mohanty on 2014-06-05 03:18:32 EDT ---

I tried to reproduce the issue using ext4 brick partitions on the master
branch (the bug is reported against 3.5, but I was checking whether it
affects the master branch too):
1. Created the volume using ext4; the bricks are directories on an ext4
partition.
2. Started the volume and mounted it.
3. Deleted one of the bricks using "rm -rf <brick>".
4. Saw the messages below in /var/log/messages, and the brick process got
killed.

Jun  5 02:05:03 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:03.712181] M
[posix-helpers.c:1413:posix_health_check_thread_proc] 0-test-vol-posix:
health-check failed, going down
Jun  5 02:05:33 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:33.713652] M
[posix-helpers.c:1418:posix_health_check_thread_proc] 0-test-vol-posix: still
alive! -> SIGTERM


The details, i.e. the commands and their output, are below:

[root at dhcp159-54 ~]# mount | grep '/d'

/dev/mapper/fedora_dhcp159--54-home on /d type ext4 (rw,relatime,data=ordered)


[root at dhcp159-54]# gluster v status
Status of volume: test-vol
Gluster process                        Port    Online    Pid
------------------------------------------------------------------------------
Brick 10.16.159.54:/d/testvol-1                49152    Y    16906
Brick 10.16.159.54:/d/testvol-2                49153    Y    16917
NFS Server on localhost                    2049    Y    16929

Task Status of Volume test-vol
------------------------------------------------------------------------------
There are no active volume tasks


[root at dhcp159-54]# ps aux | grep glusterfsd
root     16906  0.0  1.0 596320 21228 ?        Ssl  02:02   0:00
/usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id
test-vol.10.16.159.54.d-testvol-1 -p
/var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-1.pid -S
/var/run/ee63b3ac874f970ffd0f47685eaaf718.socket --brick-name /d/testvol-1 -l
/usr/local/var/log/glusterfs/bricks/d-testvol-1.log --xlator-option
*-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49152
--xlator-option test-vol-server.listen-port=49152
root     16917  0.0  0.9 596320 19124 ?        Ssl  02:02   0:00
/usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id
test-vol.10.16.159.54.d-testvol-2 -p
/var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-2.pid -S
/var/run/c009c5864d1d438b4e085b9af5fc2416.socket --brick-name /d/testvol-2 -l
/usr/local/var/log/glusterfs/bricks/d-testvol-2.log --xlator-option
*-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49153
--xlator-option test-vol-server.listen-port=49153
root     16951  0.0  0.0 112640   936 pts/0    S+   02:02   0:00 grep
--color=auto glusterfsd



[root at dhcp159-54]# rm -rf /d/testvol-1

In /var/log/messages

Jun  5 02:05:03 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:03.712181] M
[posix-helpers.c:1413:posix_health_check_thread_proc] 0-test-vol-posix:
health-check failed, going down
Jun  5 02:05:33 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:33.713652] M
[posix-helpers.c:1418:posix_health_check_thread_proc] 0-test-vol-posix: still
alive! -> SIGTERM


[root at dhcp159-54]# ps aux | grep glusterfsd
root     16917  0.0  0.9 596320 19124 ?        Ssl  02:02   0:00
/usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id
test-vol.10.16.159.54.d-testvol-2 -p
/var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-2.pid -S
/var/run/c009c5864d1d438b4e085b9af5fc2416.socket --brick-name /d/testvol-2 -l
/usr/local/var/log/glusterfs/bricks/d-testvol-2.log --xlator-option
*-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49153
--xlator-option test-vol-server.listen-port=49153
root     17056  0.0  0.0 112640   940 pts/0    S+   02:19   0:00 grep
--color=auto glusterfsd

[root at dhcp159-54]# gluster v status
Status of volume: test-vol
Gluster process                        Port    Online    Pid
------------------------------------------------------------------------------
Brick 10.16.159.54:/d/testvol-1                N/A    N    N/A
Brick 10.16.159.54:/d/testvol-2                49153    Y    16917
NFS Server on localhost                    2049    Y    16929

Task Status of Volume test-vol
------------------------------------------------------------------------------
There are no active volume tasks

--- Additional comment from Niels de Vos on 2014-06-05 03:38:15 EDT ---

(In reply to Lalatendu Mohanty from comment #2)
> I tried to reproduce the issue using ext4 brick partitions on the master
> branch (the bug is reported against 3.5, but I was checking whether it
> affects the master branch too):
> 1. Created the volume using ext4; the bricks are directories on an ext4
> partition.
> 2. Started the volume and mounted it.
> 3. Deleted one of the bricks using "rm -rf <brick>".
> 4. Saw the messages below in /var/log/messages, and the brick process got
> killed.

Yes, removing the directory that holds the brick will be detected. But this is
not really the same as simulating a disk failure. You can use device-mapper to
forcefully remove devices, or load an error target that will trigger a
filesystem abort (see the dmsetup example below).

Alternatively, unplugging a device can be simulated like this:

  # echo offline > /sys/block/sdb/device/state
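
For example, an existing device-mapper device can be switched over to the
error target, which makes all subsequent I/O on it fail (this assumes the
brick sits on a device-mapper device named "brickdev"; the name is only an
example):

  # dmsetup suspend brickdev
  # dmsetup load brickdev --table "0 $(blockdev --getsz /dev/mapper/brickdev) error"
  # dmsetup resume brickdev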

--- Additional comment from Lalatendu Mohanty on 2014-06-09 09:35:41 EDT ---

Thanks, Niels. I could reproduce the bug with "echo offline >
/sys/block/sda/device/state" on a VM where sda is an IDE disk.

--- Additional comment from Anand Avati on 2014-07-01 08:58:19 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#1) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-07-01 10:44:21 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#2) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-07-01 11:02:46 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#3) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-07-05 14:38:39 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#4) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-08-14 11:18:50 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#5) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-08-14 11:40:25 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#6) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-08-21 05:08:06 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#7) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-08-21 09:15:39 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#8) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-08-21 15:25:44 EDT ---

REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for
ext4 filesystem) posted (#9) for review on master by Lalatendu Mohanty
(lmohanty at redhat.com)

--- Additional comment from Anand Avati on 2014-10-28 06:40:27 EDT ---

COMMIT: http://review.gluster.org/8213 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit a7ef6eea4d43afdba9d0453c095e71e6bf22cdb7
Author: Lalatendu Mohanty <lmohanty at redhat.com>
Date:   Tue Jul 1 07:52:27 2014 -0400

    Posix: Brick failure detection fix for ext4 filesystem

    Issue: stat() on XFS has a check for the filesystem status but
    ext4 does not.

    Fix: Replacing the stat() call with open, write and read on a new file
    under the "brick/.glusterfs" directory. This change will work for xfs,
    ext4 and other filesystems.

    Change-Id: Id03c4bc07df4ee22916a293442bd74819b051839
    BUG: 1130242
    Signed-off-by: Lalatendu Mohanty <lmohanty at redhat.com>
    Reviewed-on: http://review.gluster.org/8213
    Reviewed-by: Niels de Vos <ndevos at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1100204
[Bug 1100204] brick failure detection does not work for ext4 filesystems
https://bugzilla.redhat.com/show_bug.cgi?id=1130242
[Bug 1130242] brick failure detection does not work for ext4 filesystems
https://bugzilla.redhat.com/show_bug.cgi?id=1150244
[Bug 1150244] glusterfsd hangs on IO when underlying ext4 filesystem
corrupts an xattr