[Gluster-users] Brick offline problem

David Cunningham dcunningham at voisonics.com
Thu Aug 26 03:16:23 UTC 2021


Hello,

We have a two-node replicated GlusterFS cluster, and the brick on one of the
nodes (server1) recently went offline:

Status of volume: gvol0
Gluster process                                  TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick server2:/nodirectwritedata/gluster/gvol0   49153     0          Y       22015
Brick server1:/nodirectwritedata/gluster/gvol0   N/A       N/A        N       N/A
Self-heal Daemon on localhost                    N/A       N/A        Y       22037
Self-heal Daemon on server1                      N/A       N/A        Y       3320
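
For reference, the status above is from the standard status command; the
"detail" variant gives a bit more per-brick information if that helps:

gluster volume status gvol0
gluster volume status gvol0 detail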

This happened during the day with no action on our part to cause it, yet the
glusterfsd process is still running on server1. In the brick log
(nodirectwritedata-gluster-gvol0.log) we see lines like the following before
the brick went offline:

... lines like the following repeat back to the start of the log file ...
[2021-08-25 20:07:12.002764 +0000] E [MSGID: 113002]
[posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null)
[Invalid argument]
[2021-08-25 20:07:12.002820 +0000] E [MSGID: 115056]
[server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info
[{frame=337803516}, {MKDIR_path=},
{uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=},
{client=CTX_ID:e9b02d95-722b-49d1-b5a3-b3f1eca78ef4-GRAPH_ID:0-PID:3320-HOST:server1-PC_NAME:gvol0-client-1-RECON_NO:-0},
{error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:08:54.003409 +0000] E [MSGID: 113002]
[posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null)
[Invalid argument]
[2021-08-25 20:08:54.003476 +0000] E [MSGID: 115056]
[server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info
[{frame=337814045}, {MKDIR_path=},
{uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=},
{client=CTX_ID:3b94ba5f-38d9-4277-9aa4-444ebe65f760-GRAPH_ID:0-PID:22037-HOST:server2-PC_NAME:gvol0-client-1-RECON_NO:-0},
{error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
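
As far as I understand it, 00000000-0000-0000-0000-000000000001 is the gfid
of the volume root, so these look like MKDIR requests arriving with a null
gfid and an empty bname. On the brick itself the root gfid can be checked
directly (assuming the attr tools are installed; run as root):

getfattr -n trusted.gfid -e hex /nodirectwritedata/gluster/gvol0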

When the brick went offline it started logging this instead:

[2021-08-25 20:10:29.894516 +0000] W [dict.c:1532:dict_get_with_ref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721)
[0x7ff871924721]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39)
[0x7ff87875d059]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d)
[0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL
[Invalid argument]
[2021-08-25 20:10:30.346692 +0000] W [dict.c:1532:dict_get_with_ref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721)
[0x7ff871924721]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39)
[0x7ff87875d059]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d)
[0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL
[Invalid argument]
... repeated...

In glustershd.log we have the logging below. Would anyone have a suggestion
as to what could be wrong? This is GlusterFS 9.0 running on Ubuntu 18.04 (I
notice the logging below mentions "Program-name=GlusterFS 4.x v1", which
seems strange).
Thank you in advance!

[2021-08-25 20:10:25.741098 +0000] W [socket.c:767:__socket_rwv]
0-gvol0-client-0: readv on 192.168.0.201:49152 failed (No data available)
[2021-08-25 20:10:25.741151 +0000] I [MSGID: 114018]
[client.c:2229:client_rpc_notify] 0-gvol0-client-0: disconnected from
client, process will keep trying to connect glusterd until brick's port is
available [{conn-name=gvol0-client-0}]
[2021-08-25 20:10:28.741971 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
0-gvol0-client-0: changing port to 49153 (from 0)
[2021-08-25 20:10:28.742543 +0000] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
0-gvol0-client-0: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-08-25 20:10:28.743059 +0000] I [MSGID: 114046]
[client-handshake.c:857:client_setvolume_cbk] 0-gvol0-client-0: Connected,
attached to remote volume [{conn-name=gvol0-client-0},
{remote_subvol=/nodirectwritedata/gluster/gvol0}]
[2021-08-25 20:10:28.746963 +0000] I [MSGID: 108026]
[afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0:
performing data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775
[2021-08-25 20:10:28.752371 +0000] I [MSGID: 108026]
[afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0:
Completed data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775.
sources=[1]  sinks=0
[2021-08-25 20:10:28.754305 +0000] I [MSGID: 108026]
[afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0:
performing data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2
[2021-08-25 20:10:28.761193 +0000] I [MSGID: 108026]
[afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0:
Completed data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2.
sources=[1]  sinks=0
... repeated many times and then...
[2021-08-25 20:10:44.803924 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:10:44.803984 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-1: remote
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:20:45.132601 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
... repeated...
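
Unless someone advises otherwise, our tentative plan is to force-start the
offline brick process and then check the heal state, roughly:

# ask glusterd to (re)start any offline brick processes for the volume
gluster volume start gvol0 force

# then confirm self-heal has nothing pending
gluster volume heal gvol0 info

We would still like to understand what made the brick go offline in the
first place, though.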

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782