[Gluster-users] Brick offline problem

David Cunningham dcunningham at voisonics.com
Fri Aug 27 04:00:52 UTC 2021


Hello,

The same thing has happened again, this time with the following broadcast
message. Would anyone have any idea what's going on? Thanks in advance.

Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: [2021-08-27
02:12:12.120501 +0000] M [MSGID: 113075]
[posix-helpers.c:2211:posix_health_check_thread_proc] 0-gvol0-posix:
health-check failed, going down
Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: [2021-08-27
02:12:12.120597 +0000] M [MSGID: 113075]
[posix-helpers.c:2229:posix_health_check_thread_proc] 0-gvol0-posix: still
alive! -> SIGTERM
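
For anyone wanting to check their own brick log for the same event, here is a minimal Python sketch that picks out the posix health-check messages (MSGID 113075). It assumes each log entry sits on a single line (the entries above were wrapped by the mail client) and that the timestamp layout matches the one shown; the name `find_health_check_events` is made up for this example and is not part of GlusterFS.

```python
import re

# Matches posix health-check entries (MSGID 113075) in the layout shown above.
# Assumption: one log entry per line, timestamps logged in UTC (+0000).
HEALTH_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \+0000\] "
    r"(?P<level>[A-Z]) \[MSGID: 113075\] "
    r"\[[^\]]*\] 0-(?P<volume>\S+)-posix: (?P<msg>.*)"
)


def find_health_check_events(lines):
    """Return (timestamp, level, volume, message) for each health-check entry."""
    events = []
    for line in lines:
        m = HEALTH_RE.search(line)
        if m:
            events.append(
                (m.group("ts"), m.group("level"), m.group("volume"),
                 m.group("msg").strip())
            )
    return events


# The two entries from this message, unwrapped onto single lines.
sample = [
    "Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: "
    "[2021-08-27 02:12:12.120501 +0000] M [MSGID: 113075] "
    "[posix-helpers.c:2211:posix_health_check_thread_proc] 0-gvol0-posix: "
    "health-check failed, going down",
    "Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: "
    "[2021-08-27 02:12:12.120597 +0000] M [MSGID: 113075] "
    "[posix-helpers.c:2229:posix_health_check_thread_proc] 0-gvol0-posix: "
    "still alive! -> SIGTERM",
]

for event in find_health_check_events(sample):
    print(event)
```

Running the same function over the whole brick log would show how often, and at what times, the health check has failed.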


On Thu, 26 Aug 2021 at 15:16, David Cunningham <dcunningham at voisonics.com>
wrote:

> Hello,
>
> We have a 2-node mirrored GlusterFS cluster, and one of the bricks
> (server1) has recently gone offline:
>
> Status of volume: gvol0
> Gluster process                                 TCP Port  RDMA Port  Online  Pid
> --------------------------------------------------------------------------------
> Brick server2:/nodirectwritedata/gluster/gvol0  49153     0          Y       22015
> Brick server1:/nodirectwritedata/gluster/gvol0  N/A       N/A        N       N/A
> Self-heal Daemon on localhost                   N/A       N/A        Y       22037
> Self-heal Daemon on server1                     N/A       N/A        Y       3320
>
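
As a rough way to spot this condition automatically, the sketch below scans status text in the layout quoted above and lists every process whose Online column is "N". It collapses whitespace first so that rows wrapped by a mail client still parse; it would not cope with a brick path that is itself split mid-word. The function name `offline_processes` is hypothetical, and the sample text is simply the status output from this message.

```python
import re

# One row: a process name, then TCP port, RDMA port, Online (Y/N) and Pid.
ROW_RE = re.compile(
    r"(?P<name>(?:Brick|Self-heal Daemon on)\s+\S+)\s+"
    r"(?P<tcp>\S+)\s+(?P<rdma>\S+)\s+(?P<online>[YN])\s+(?P<pid>\S+)"
)


def offline_processes(status_text):
    """Return the names of processes reported with Online = N."""
    # Collapse all whitespace so rows broken across lines are rejoined.
    flat = " ".join(status_text.split())
    return [m.group("name") for m in ROW_RE.finditer(flat)
            if m.group("online") == "N"]


status = """
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server2:/nodirectwritedata/gluster/gvol0
        49153     0          Y       22015
Brick server1:/nodirectwritedata/gluster/gvol0
        N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       22037
Self-heal Daemon on server1                 N/A       N/A        Y       3320
"""

print(offline_processes(status))
```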
> This happened during the day with no action on our part to cause it.
> However, glusterfsd is still running on server1. In
> nodirectwritedata-gluster-gvol0.log we see lines like this before the brick
> went offline:
>
> ... same as following lines back to the start of the log file...
> [2021-08-25 20:07:12.002764 +0000] E [MSGID: 113002]
> [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null)
> [Invalid argument]
> [2021-08-25 20:07:12.002820 +0000] E [MSGID: 115056]
> [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info
> [{frame=337803516}, {MKDIR_path=},
> {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=},
> {client=CTX_ID:e9b02d95-722b-49d1-b5a3-b3f1eca78ef4-GRAPH_ID:0-PID:3320-HOST:server1-PC_NAME:gvol0-client-1-RECON_NO:-0},
> {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
> [2021-08-25 20:08:54.003409 +0000] E [MSGID: 113002]
> [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null)
> [Invalid argument]
> [2021-08-25 20:08:54.003476 +0000] E [MSGID: 115056]
> [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info
> [{frame=337814045}, {MKDIR_path=},
> {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=},
> {client=CTX_ID:3b94ba5f-38d9-4277-9aa4-444ebe65f760-GRAPH_ID:0-PID:22037-HOST:server2-PC_NAME:gvol0-client-1-RECON_NO:-0},
> {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
>
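
One detail worth pulling out of these entries is which client sent the failing MKDIR. The client IDs carry PID 3320 on server1 and PID 22037 on server2, which are the Self-heal Daemon PIDs in the status output above, so the requests appear to come from the self-heal daemons (an inference from the logs, not something confirmed). A minimal sketch for tallying this across a log, assuming one entry per line and using a made-up function name:

```python
import re
from collections import Counter

# Extracts the PID and host embedded in a client ID such as
# CTX_ID:...-GRAPH_ID:0-PID:3320-HOST:server1-PC_NAME:gvol0-client-1-...
CLIENT_RE = re.compile(r"PID:(?P<pid>\d+)-HOST:(?P<host>.+?)-PC_NAME:")


def mkdir_failure_sources(lines):
    """Count failed MKDIR entries (MSGID 115056) per (host, pid) client."""
    counts = Counter()
    for line in lines:
        if "MSGID: 115056" not in line or "MKDIR" not in line:
            continue
        m = CLIENT_RE.search(line)
        if m:
            counts[(m.group("host"), int(m.group("pid")))] += 1
    return counts


# The two MKDIR entries quoted above, unwrapped and shortened to the
# fields this sketch reads.
sample = [
    "[2021-08-25 20:07:12.002820 +0000] E [MSGID: 115056] "
    "[server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info "
    "{client=CTX_ID:e9b02d95-722b-49d1-b5a3-b3f1eca78ef4-GRAPH_ID:0-PID:3320-"
    "HOST:server1-PC_NAME:gvol0-client-1-RECON_NO:-0}, {errno=22}",
    "[2021-08-25 20:08:54.003476 +0000] E [MSGID: 115056] "
    "[server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info "
    "{client=CTX_ID:3b94ba5f-38d9-4277-9aa4-444ebe65f760-GRAPH_ID:0-PID:22037-"
    "HOST:server2-PC_NAME:gvol0-client-1-RECON_NO:-0}, {errno=22}",
]

for (host, pid), n in mkdir_failure_sources(sample).items():
    print(host, pid, n)
```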
> When the brick went offline it started logging this instead:
>
> [2021-08-25 20:10:29.894516 +0000] W [dict.c:1532:dict_get_with_ref]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721)
> [0x7ff871924721]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39)
> [0x7ff87875d059]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d)
> [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL
> [Invalid argument]
> [2021-08-25 20:10:30.346692 +0000] W [dict.c:1532:dict_get_with_ref]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721)
> [0x7ff871924721]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39)
> [0x7ff87875d059]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d)
> [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL
> [Invalid argument]
> ... repeated...
>
> In glustershd.log we have logging as below. Would anyone have a suggestion
> of what could be wrong? GlusterFS is version 9.0 running on Ubuntu 18.04 (I
> notice the logging below mentions "Program-name=GlusterFS 4.x v1" which is
> strange).
> Thank you in advance!
>
> [2021-08-25 20:10:25.741098 +0000] W [socket.c:767:__socket_rwv]
> 0-gvol0-client-0: readv on 192.168.0.201:49152 failed (No data available)
> [2021-08-25 20:10:25.741151 +0000] I [MSGID: 114018]
> [client.c:2229:client_rpc_notify] 0-gvol0-client-0: disconnected from
> client, process will keep trying to connect glusterd until brick's port is
> available [{conn-name=gvol0-client-0}]
> [2021-08-25 20:10:28.741971 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 0-gvol0-client-0: changing port to 49153 (from 0)
> [2021-08-25 20:10:28.742543 +0000] I [MSGID: 114057]
> [client-handshake.c:1128:select_server_supported_programs]
> 0-gvol0-client-0: Using Program [{Program-name=GlusterFS 4.x v1},
> {Num=1298437}, {Version=400}]
> [2021-08-25 20:10:28.743059 +0000] I [MSGID: 114046]
> [client-handshake.c:857:client_setvolume_cbk] 0-gvol0-client-0: Connected,
> attached to remote volume [{conn-name=gvol0-client-0},
> {remote_subvol=/nodirectwritedata/gluster/gvol0}]
> [2021-08-25 20:10:28.746963 +0000] I [MSGID: 108026]
> [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0:
> performing data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775
> [2021-08-25 20:10:28.752371 +0000] I [MSGID: 108026]
> [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0:
> Completed data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775.
> sources=[1]  sinks=0
> [2021-08-25 20:10:28.754305 +0000] I [MSGID: 108026]
> [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0:
> performing data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2
> [2021-08-25 20:10:28.761193 +0000] I [MSGID: 108026]
> [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0:
> Completed data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2.
> sources=[1]  sinks=0
> ... repeated many times and then...
>  [2021-08-25 20:10:44.803924 +0000] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote
> operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
> [2021-08-25 20:10:44.803984 +0000] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-1: remote
> operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
> [2021-08-25 20:20:45.132601 +0000] E [MSGID: 114031]
> [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote
> operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
> ... repeated...
>
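
The spacing of the MKDIR failures may also be a clue. The small sketch below measures the gap between two of the timestamps quoted above; it comes out at almost exactly ten minutes, which might line up with a periodic self-heal crawl (the `cluster.heal-timeout` volume option, which I believe defaults to 600 seconds). That connection is an assumption based on only two data points, and `seconds_between` is a helper invented for this example.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(ts_a, ts_b):
    """Return the number of seconds from ts_a to ts_b (log timestamp strings)."""
    start = datetime.strptime(ts_a, LOG_TS_FORMAT)
    end = datetime.strptime(ts_b, LOG_TS_FORMAT)
    return (end - start).total_seconds()


# Timestamps of two client-0 MKDIR failures from glustershd.log above.
gap = seconds_between("2021-08-25 20:10:44.803924",
                      "2021-08-25 20:20:45.132601")
print(f"{gap:.1f} seconds between failures")
```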
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782