[Gluster-users] Self Heal fails...

Robert Krig robert at bitcaster.de
Fri Sep 16 18:48:35 UTC 2011


I'm using GlusterFS version 3.2.3, built from the sources on the
gluster.org website.

I think I've found a way. I've shut down my volume, detached the peers,
and basically recreated my storage volume from scratch.
This time I started the setup by probing the peer from the node that had
the up-to-date data in its underlying storage directory.

Then I created the volume again from scratch, this time entering
node2:/export first and then node1:/export.
Then I mounted the Gluster volume locally and am currently running the
find one-liner on it; a rough sketch of the commands follows below.
Judging from the logs, it seems to be rebuilding.
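
Written out, the sequence was roughly this. It's a sketch from memory
rather than a verbatim transcript: the volume name GLSTORAGE is the one
from my logs, the bricks are node1:/export and node2:/export, and I'm
assuming a plain replica 2 layout, so adjust names and paths for your
own setup.

  # On node2, the node holding the up-to-date data:
  gluster volume stop GLSTORAGE
  gluster volume delete GLSTORAGE
  gluster peer detach node1

  # Re-probe node1 from node2 and recreate the volume, node2's brick first:
  gluster peer probe node1
  gluster volume create GLSTORAGE replica 2 node2:/export node1:/export
  gluster volume start GLSTORAGE

  # Mount the volume locally and walk it to kick off self-heal:
  mount -t glusterfs node2:/GLSTORAGE /storage
  find /storage -noleaf -print0 | xargs --null stat >/dev/null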

I'm just wondering if there is perhaps a more elegant way to force a resync.
It would be nice if there were a feature or a command so that you could
say: OK node2, you are the main source; node1, listen to what node2 has
to say.
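
As far as I can tell, the information such a command would need is already
sitting on the bricks as extended attributes maintained by the replicate
translator. Something along these lines shows the changelog counters (the
trusted.afr.GLSTORAGE-client-* attribute names are inferred from my volume
name, so treat this as an illustration rather than a documented interface):

  # Run against the brick path (e.g. /export), not the fuse mount:
  getfattr -d -m trusted.afr -e hex /export/path/to/some/file

  # Non-zero trusted.afr.GLSTORAGE-client-* counters indicate pending
  # changes the other brick still needs; all zeros on both bricks means
  # the two copies are considered to be in sync.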



On 09/16/2011 08:31 PM, Burnash, James wrote:
> Hi Robert.
>
> Can you tell us what version you are running? That helps us nail down whether this is a known bug in a specific version.
>
> James Burnash
> Unix Engineer
> Knight Capital Group
>
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Robert Krig
> Sent: Friday, September 16, 2011 2:17 PM
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Self Heal fails...
>
>
> On 09/16/2011 06:36 PM, Robert Krig wrote:
>> Hi there. I'm new to GlusterFS. I'm currently evaluating it for
>> production usage.
>>
>> I have two storage servers that use JFS as the filesystem for the
>> underlying export.
>>
>> The setup is supposed to be replicated.
>>
>> I've been experimenting with various settings for benchmarking and 
>> such, as well as trying out different failure scenarios.
>>
>> Anyway, the export directory on node1 is out of sync with node2,
>> so I mounted the storage volume via the GlusterFS client on node1 in
>> another directory.
>>
>> The fuse mounted directory is /storage
>>
>> As per the manual, I tried the "find <gluster-mount> -noleaf -print0
>> | xargs --null stat >/dev/null" dance; however, the logs throw a bunch
>> of errors:
>> #########################################################################
>> [2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
>> [2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
>> [2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
>> [2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
>> [2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
>> [2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> #########################################################################
>>
>>
>> Is this normal? The directory in question already has 150GB of data,
>> so the find command is still running. Will it be OK once it finishes?
>> From what I understand from the manual, the files should be repaired as
>> the find process runs, or did I misinterpret that?
>>
>> If self-heal does fail, is there a failsafe method to ensure that both
>> nodes are in sync again?
>>
>>
>>
>
> Well, the find process has finished in the meantime, and as expected, it didn't fix anything.
>
> Here are the last few lines of the client mount log:
> #######################################################################################
> [2011-09-16 18:48:45.287954] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.288394] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.288921] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.289535] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.290063] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.290649] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.291126] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 20:14:52.289901] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_statfs_cbk+0x71) [0x7f46ff88b741]))) 0-xdr: XDR decoding failed
> [2011-09-16 20:14:52.289928] E [client3_1-fops.c:624:client3_1_statfs_cbk] 0-GLSTORAGE-client-0: error
> [2011-09-16 20:14:52.289939] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> #######################################################################################
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>



