[Gluster-users] brick process crashes on "Structure needs cleaning"

Olaf Buitelaar olaf.buitelaar at gmail.com
Mon Feb 22 11:52:54 UTC 2021


Dear Users,

Somehow the brick processes seem to crash on XFS filesystem errors, and it
seems to depend on the way the gluster process is started. When this
happens, gluster also prints a message to the console saying the process
will go down, however it doesn't actually seem to go down:

M [MSGID: 113075] [posix-helpers.c:2185:posix_health_check_thread_proc]
0-ovirt-engine-posix: health-check failed, going down
 M [MSGID: 113075] [posix-helpers.c:2203:posix_health_check_thread_proc]
0-ovirt-engine-posix: still alive! -> SIGTERM
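
To confirm it really doesn't go down, I check the brick PID before and
after the message appears (the volume name here is just an example):

gluster volume status ovirt-engine   # note the brick PID in the output
ps -p <brick-pid>                    # the process is still running after the SIGTERM message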

In the brick log, a message like this is logged:
[posix-helpers.c:2111:posix_fs_health_check] 0-ovirt-data-posix:
aio_read_cmp_buf() on
/data5/gfs/bricks/brick1/ovirt-data/.glusterfs/health_check returned ret is
-1 error is Structure needs cleaning

or like this:
 W [MSGID: 113075] [posix-helpers.c:2111:posix_fs_health_check]
0-ovirt-mon-2-posix: aio_read_buf() on
/data0/gfs/bricks/bricka/ovirt-mon-2/.glusterfs/health_check returned ret
is -1 error is Success
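
"Structure needs cleaning" is the message for errno EUCLEAN, which XFS
returns when it believes it hit on-disk corruption, while "error is
Success" looks like a return of -1 with errno still 0, so that second
message may be misleading. Since the health check reads the file through
POSIX AIO (going by the aio_read_* function names in the log) rather than
a plain read(), a plain cat through the page cache may not exercise the
same path; as a guess at getting closer to it, I tried an uncached read:

dd if=/data0/gfs/bricks/bricka/ovirt-mon-2/.glusterfs/health_check iflag=direct of=/dev/null bs=4096 count=1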

When I check the actual file, it just seems to contain a timestamp:
cat /data0/gfs/bricks/bricka/ovirt-mon-2/.glusterfs/health_check
2021-01-28 09:08:01

And I don't see errors in dmesg about problems accessing it.

When I unmount the filesystem and run xfs_repair on it, no errors or
issues are reported. Also, when I mount the filesystem again, it's
reported as a clean mount:
[2478552.169540] XFS (dm-23): Mounting V5 Filesystem
[2478552.180645] XFS (dm-23): Ending clean mount
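
For reference, roughly the cycle I ran (the mount point is an example,
the device is the dm-23 from the kernel log above):

umount /data0                 # or wherever the brick filesystem is mounted
xfs_repair -n /dev/dm-23      # dry run first: report problems without modifying
xfs_repair /dev/dm-23
mount /data0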

When I kill the brick process and start it with "gluster v start x force",
the issue seems much less likely to occur, but when it's started after a
fresh reboot, or when I kill the process and let glusterd start it (e.g.
service glusterd start), the error seems to arise after a couple of
minutes.
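
Concretely, the two restart paths I compared (the brick PID again taken
from "gluster volume status"):

# path 1: force-start the volume; the error rarely occurs afterwards
kill <brick-pid>
gluster v start x force

# path 2: let glusterd respawn the brick; the error shows up within minutes
kill <brick-pid>
service glusterd start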

I am making use of LVM cache (in writethrough mode); maybe that's related.
Also, the disks themselves are behind a hardware RAID controller, and I
inspected all disks for SMART errors.
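
In case it helps, this is roughly how I checked both (LV names and the
megaraid device numbering are examples and depend on the setup and
controller):

lvs -a -o +cache_mode,cache_policy      # confirms the cache LV is in writethrough mode
smartctl -a -d megaraid,0 /dev/sda      # repeated for each physical disk behind the controller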

Does anybody have experience with this, or a clue about what might be
causing it?

Thanks Olaf