[Gluster-users] Self Heal Issue GlusterFS 3.3.1

Bobby Jacob bobby.jacob at alshaya.com
Tue Dec 10 07:42:57 UTC 2013


Hi,

Thanks Joe, the split-brain files have been removed as you recommended. How can we deal with situations like this going forward, given that there is no document that covers resolving them? The remaining heal entries are shown below, followed by a sketch of the checks I plan to re-run.

[root@KWTOCUATGS001 83]# gluster volume heal glustervol info
Gathering Heal info on volume glustervol has been successful

Brick KWTOCUATGS001:/mnt/cloudbrick
Number of entries: 14
/Tommy Kolega
<gfid:10429dd5-180c-432e-aa4a-8b1624b86f4b>
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:3e3d77d6-2818-4766-ae3b-4f582118321b>
<gfid:8bd03482-025c-4c09-8704-60be9ddfdfd8>
<gfid:2685e11a-4eb9-4a92-883e-faa50edfa172>
<gfid:24d83cbd-e621-4330-b0c1-ae1f0fd2580d>
<gfid:197e50fa-bfc0-4651-acaa-1f3d2d73936f>
<gfid:3e094ee9-c9cf-4010-82f4-6d18c1ab9ca0>
<gfid:77783245-4e03-4baf-8cb4-928a57b266cb>
<gfid:70340eaa-7967-41d0-855f-36add745f16f>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
<gfid:b1651457-175a-43ec-b476-d91ae8b52b0b>
/Tommy Kolega/lucene_index

Brick KWTOCUATGS002:/mnt/cloudbrick
Number of entries: 15
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:0454d0d2-d432-4ac8-8476-02a8522e4a6a>
<gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6>
<gfid:00389876-700f-4351-b00e-1c57496eed89>
<gfid:0cd48d89-1dd2-47f6-9311-58224b19446e>
<gfid:081c6657-301a-42a4-9f95-6eeba6c67413>
<gfid:565f1358-449c-45e2-8535-93b5632c0d1e>
<gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e>
<gfid:25fd406f-63e0-4037-bb01-da282cbe4d76>
<gfid:a109c429-5885-499e-8711-09fdccd396f2>
<gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6>
/Tommy Kolega
/Tommy Kolega/lucene_index
<gfid:c49e9d76-e5d4-47dc-9cf1-3f858f6d07ea>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
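
To re-check the remaining entries after the cleanup, the plan is roughly the following (a sketch, assuming the standard GlusterFS 3.3 heal subcommands and our volume name glustervol):

    # trigger a full self-heal crawl, then re-check the pending entries
    gluster volume heal glustervol full
    gluster volume heal glustervol info

    # entries that keep failing, or are flagged as split-brain, show up here
    gluster volume heal glustervol info heal-failed
    gluster volume heal glustervol info split-brain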

Thanks & Regards,
Bobby Jacob

-----Original Message-----
From: Joe Julian [mailto:joe at julianfamily.org] 
Sent: Tuesday, December 10, 2013 7:59 AM
To: Bobby Jacob
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1

On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
> 
>  
> 
> I’m running GlusterFS 3.3.1 on CentOS 6.4. 
> 
> # gluster volume status
> 
>  
> 
> Status of volume: glustervol
> 
> Gluster process                                         Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick KWTOCUATGS001:/mnt/cloudbrick                     24009   Y       20031
> Brick KWTOCUATGS002:/mnt/cloudbrick                     24009   Y       1260
> NFS Server on localhost                                 38467   Y       43320
> Self-heal Daemon on localhost                           N/A     Y       43326
> NFS Server on KWTOCUATGS002                             38467   Y       5842
> Self-heal Daemon on KWTOCUATGS002                       N/A     Y       5848
> 
>  
> 
> The self-heal stops working: the application writes to only one brick
> and the data does not replicate to the other. When I check
> /var/log/glusterfs/glustershd.log I see the following:
> 
>  
> 
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive]
> 0-socket: failed to set keep idle on socket 8
> 
> [2013-12-03 05:42:32.033646] W
> [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd:
> Failed to set keep-alive: Operation not supported
> 
> [2013-12-03 05:42:32.790473] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), 
> Version (330)
> 
> [2013-12-03 05:42:32.790840] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1:
> Connected to 172.16.95.153:24009, attached to remote volume 
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.790884] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify]
> 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back 
> up; going online.
> 
> [2013-12-03 05:42:32.791161] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-1: Server lk version = 1
> 
> [2013-12-03 05:42:32.795103] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.798064] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.799278] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.800636] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.802223] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.803339] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804308] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804877] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), 
> Version (330)
> 
> [2013-12-03 05:42:32.807517] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0:
> Connected to 172.16.107.154:24009, attached to remote volume 
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.807562] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.810357] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-0: Server lk version = 1
> 
> [2013-12-03 05:42:32.827437] E
> [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
> 0-glustervol-replicate-0: Unable to self-heal contents of 
> '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
> Please delete the file from all but the preferred subvolume.

That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403 on each brick.

Try removing it from all but the preferred replica, as the log message says.
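
Concretely, something like this (a sketch; it assumes your brick path /mnt/cloudbrick from the volume status, and that you only remove the copy on the replica you want to discard):

    # the gfid maps to a path under .glusterfs on every brick:
    # first two hex characters / next two / full gfid
    brick=/mnt/cloudbrick
    ls -l $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

    # on the brick whose copy you are discarding:
    rm $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403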
> 
> [2013-12-03 05:42:39.205157] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of 
> '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
> Please fix the file on all backend volumes
> 
> [2013-12-03 05:42:39.215793] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of 
> '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
> Please fix the file on all backend volumes
> 
>  
If that doesn't allow it to heal, you may need to find which filename that gfid file is hardlinked to. Run "ls -li" on the gfid file at the path I showed above to get its inode number, then run "find $brick -inum $inode_number". Once you know which filenames it is linked with, remove all of the linked copies from all but one replica. The self-heal can then complete successfully.
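
In command form (a sketch; $brick stands for the brick path, e.g. /mnt/cloudbrick, and the inode number comes from the ls output):

    # 1. get the inode number of the gfid file
    ls -li $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

    # 2. list every filename hardlinked to that inode
    find $brick -inum <inode_number>

    # 3. on all but one replica, remove the gfid file and each path
    #    reported by find, then let the self-heal daemon finish the job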



