[Gluster-users] lingering <gfid:*> entries in volume heal, gluster 3.6.3
Ravishankar N
ravishankar at redhat.com
Fri Jul 15 11:38:35 UTC 2016
Can you check the getfattr output of a few of those 129 entries on all
bricks? You basically need to see whether there are non-zero afr xattrs
(trusted.afr.*) for the files in question, which would indicate a
pending heal.
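
As a rough sketch of that check: a <gfid:...> entry has no file name, but
each brick keeps a handle for it under .glusterfs/<first two hex
chars>/<next two>/<gfid>. The helper names below (gfid_to_path,
show_afr_xattrs, afr_is_pending) are made up for illustration and are not
part of gluster; the brick path and the GFID in the usage lines are taken
from this thread purely as examples.

```shell
# Hypothetical helper: map a GFID to its handle under a brick's .glusterfs dir.
gfid_to_path() {
    brick=$1
    gfid=$2
    a=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex characters
    b=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex characters
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$a" "$b" "$gfid"
}

# Hypothetical helper: dump all xattrs of that handle in hex.
# Needs root on the brick server, since trusted.* xattrs are hidden otherwise.
show_afr_xattrs() {
    getfattr -d -m . -e hex "$(gfid_to_path "$1" "$2")"
}

# Hypothetical helper: succeeds if a hex xattr value has any non-zero digit,
# i.e. the trusted.afr.* counters record a pending operation.
afr_is_pending() {
    case "${1#0x}" in
        *[!0]*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

Run on every brick server, for a few of the entries listed by heal info:

    show_afr_xattrs /data/brick/callrec 08713e43-7bcb-43f3-818a-7b062abd6e95

A trusted.afr.callrec-client-N value of 0x000000000000000000000000 means
nothing is pending against that brick; anything else is worth posting here.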
-Ravi
On 07/08/2016 03:12 PM, Kingsley wrote:
> Further to this, I've noticed something which might have been a bit of a
> red herring in my previous post.
>
> We have 3 volumes - gv0, voicemail and callrec. callrec is the only one
> showing self heal entries, yet all of the "No such file or directory"
> errors in glustershd.log appear to refer to gv0. gv0 has no self heal
> entries shown by "gluster volume heal gv0 info", and no split brain
> entries either.
>
> If I de-dupe those log entries, I just get these:
>
> [root@gluster1a-1 glusterfs]# grep gfid: glustershd.log | awk -F\] '{print $3}' | sort | uniq
> 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
> 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
> 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
> 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
> 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
> 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
> 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
> 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
> 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
>
>
> There doesn't seem to be anything obvious to me in glustershd.log about
> the callrec volume. On one of the bricks that stayed up:
>
> [root@gluster1a-1 glusterfs]# grep callrec glustershd.log
> [2016-07-08 08:54:03.424446] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'b9d3b1a2-3214-41ba-a1c9-9c7d4b18ff5d'
> [2016-07-08 08:54:03.429663] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:03.432198] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:03.434375] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:03.436521] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
> 1: volume callrec-client-0
> 5: option remote-subvolume /data/brick/callrec
> 11: volume callrec-client-1
> 15: option remote-subvolume /data/brick/callrec
> 21: volume callrec-client-2
> 25: option remote-subvolume /data/brick/callrec
> 31: volume callrec-client-3
> 35: option remote-subvolume /data/brick/callrec
> 41: volume callrec-replicate-0
> 50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
> 159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
> [2016-07-08 08:54:03.458708] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
> [2016-07-08 08:54:03.465684] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:03.465921] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:03.465927] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:03.465967] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-0' came back up; going online.
> [2016-07-08 08:54:03.466108] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
> [2016-07-08 08:54:04.266979] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
> [2016-07-08 08:54:04.732625] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
> [2016-07-08 08:54:04.738533] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:04.738911] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:04.738921] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:04.739181] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
> [2016-07-08 08:54:05.271388] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:05.271858] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:05.271879] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:05.272185] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
> [2016-07-08 08:54:06.302301] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
> [2016-07-08 08:54:06.305473] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:06.305915] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:06.305925] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:06.306307] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1
>
>
> And on the brick that went offline for a few days:
>
> [root@gluster2a-1 glusterfs]# grep callrec glustershd.log
> [2016-07-08 08:54:06.900964] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'e96ae8cd-f38f-4c2a-bb3b-baeb78f88f13'
> [2016-07-08 08:54:06.906449] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:06.908851] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:06.911045] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
> [2016-07-08 08:54:06.913528] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
> 1: volume callrec-client-0
> 5: option remote-subvolume /data/brick/callrec
> 11: volume callrec-client-1
> 15: option remote-subvolume /data/brick/callrec
> 21: volume callrec-client-2
> 25: option remote-subvolume /data/brick/callrec
> 31: volume callrec-client-3
> 35: option remote-subvolume /data/brick/callrec
> 41: volume callrec-replicate-0
> 50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
> 159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
> [2016-07-08 08:54:06.938769] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
> [2016-07-08 08:54:06.948204] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
> [2016-07-08 08:54:06.951625] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:06.951849] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:06.951858] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:06.951906] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-2' came back up; going online.
> [2016-07-08 08:54:06.951938] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
> [2016-07-08 08:54:07.152217] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
> [2016-07-08 08:54:07.167137] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:07.167474] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:07.167483] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:07.167664] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
> [2016-07-08 08:54:07.240249] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
> [2016-07-08 08:54:07.243156] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:07.243512] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:07.243520] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:07.243804] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
> [2016-07-08 08:54:07.400188] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-07-08 08:54:07.400574] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
> [2016-07-08 08:54:07.400583] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:07.400802] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1
>
> Cheers,
> Kingsley.
>
> On Fri, 2016-07-08 at 10:08 +0100, Kingsley wrote:
>> Hi,
>>
>> One of our bricks was offline for a few days when it didn't reboot after
>> a yum update (the gluster version wasn't changed). The volume heal info
>> is showing the same 129 entries, all of the format
>> <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> on the 3 bricks that
>> remained up, and no entries on the brick that was offline.
>>
>> glustershd.log on the brick that was offline has stuff like this in it:
>>
>> [2016-07-08 08:54:07.411486] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick/gv0'.
>> [2016-07-08 08:54:07.411493] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
>> [2016-07-08 08:54:07.411678] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
>> [2016-07-08 08:54:07.793661] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-3: Connected to gv0-client-3, attached to remote volume '/data/brick/gv0'.
>> [2016-07-08 08:54:07.793688] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-3: Server and Client lk-version numbers are not same, reopening the fds
>> [2016-07-08 08:54:07.794091] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-3: Server lk version = 1
>>
>> but glustershd.log on the other 3 bricks has many lines looking like
>> this:
>>
>> [2016-07-08 09:05:17.203017] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
>> [2016-07-08 09:05:17.203405] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
>> [2016-07-08 09:05:17.204035] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
>> [2016-07-08 09:05:17.204225] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
>> [2016-07-08 09:05:17.204651] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
>> [2016-07-08 09:05:17.204879] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
>> [2016-07-08 09:05:17.205042] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
>>
>> How do I fix this? I need to update the other bricks but am reluctant to
>> do so until the volume is in good shape first.
>>
>> We're running Gluster 3.6.3 on CentOS 7. Volume info:
>>
>> Volume Name: callrec
>> Type: Replicate
>> Volume ID: a39830b7-eddb-4061-b381-39411274131a
>> Status: Started
>> Number of Bricks: 1 x 4 = 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster1a-1:/data/brick/callrec
>> Brick2: gluster1b-1:/data/brick/callrec
>> Brick3: gluster2a-1:/data/brick/callrec
>> Brick4: gluster2b-1:/data/brick/callrec
>> Options Reconfigured:
>> performance.flush-behind: off
>>
>>