[Gluster-users] How to prevent Brick terminated by socket temporarily unavailable

Thu May 16 20:50:16 UTC 2019

I frequently run into a problem where some temporary condition causes bricks to be shut down. The brick health-check feature is what shuts them down, and according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/ a brick stopped this way stays down and is not restarted (this is by design).

What I don't understand is:
- What is causing this "Resource temporarily unavailable" error in the first place? From searching the web, it sounds like a socket timeout. Have you guys seen this before?
- If this is truly a temporary failure, why is the brick shut down indefinitely?

Should I try either of the following?
- Increase 'network.ping-timeout' or 'client.grace-timeout'
- Disable the health-check feature by setting:

    # gluster volume set <VOLNAME> storage.health-check-interval 0

The brick log looks like this at the time it is shut down:

------------------

[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]

[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down

[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM

[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down

------------------

The GlusterD log shows this shortly after:

------------------
[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)
------------------

Any guidance would be greatly appreciated!

Best,

Jeff Bischoff
