[Gluster-devel] [Gluster-users] Self Heal and dangling symlinks

Alexandre Fournier alexandre.fournier at ubisoft.com
Fri Nov 22 21:02:16 UTC 2013


OK, yes, I thought this could have an impact when we write to the gluster mount.

However, we still get strange errors while writing to the gluster mount.

We have the following problems:
- Writes to the gluster mount fail sporadically with an input/output error (apparently caused by a "filetype differs" condition)
- The log keeps growing and does not stop (70 GB)
- Dangling links that come back even after we remove them

I don't know whether these problems are linked, but our main issue is the write failure.  We can't identify the reason for it.
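
A minimal sketch (not part of the original exchange) of how the dangling links on a brick could be listed before cleaning them up. It assumes the brick's handle directory is /mnt/data/.glusterfs, as the brick log quoted further down suggests; the function name find_dangling_symlinks is hypothetical. It only reports links and deletes nothing.

```python
import os


def find_dangling_symlinks(root):
    """Return the sorted paths under root that are symlinks with a missing target."""
    dangling = []
    # followlinks=False: never descend through symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it,
            # so a link whose target is gone is a link that does not "exist".
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

Usage would be something like find_dangling_symlinks("/mnt/data/.glusterfs"), run on the node that owns the brick.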

Do you need more information about those problems?

Thanks for your help!

Alex

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
Sent: 21 novembre 2013 17:50
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] [Gluster-devel] Self Heal and dangling symlinks

That looks like you haven't had a split-brain since the 9th of October...
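
The newest entry in the listing quoted below can be found with a short script. This is only a sketch, assuming each line has the form "YYYY-MM-DD HH:MM:SS <entry>" as in the quoted output; latest_split_brain is a hypothetical helper name.

```python
from datetime import datetime


def latest_split_brain(lines):
    """Return the newest timestamp found in the listing, or None if there is none."""
    stamps = []
    for line in lines:
        # Drop any leading mail-quote markers ("> ") and whitespace.
        parts = line.lstrip("> ").split()
        if len(parts) < 2:
            continue
        try:
            stamps.append(datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            # Not a timestamped entry (e.g. "..." or "Count : 1023").
            continue
    return max(stamps) if stamps else None
```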

On 11/21/2013 02:43 PM, Alexandre Fournier wrote:
> I would also like to share the split-brain information we have:
>
> 2013-10-09 22:02:59 <gfid:83b39e48-f8eb-4149-b851-d3b97e18c4b6>
> 2013-10-09 21:52:59 <gfid:85378e3f-0dd1-4f8e-a7d5-70424b643fb9>
> 2013-10-09 21:52:59 <gfid:0a958cad-1615-4e1b-8e1a-9dc0356859d6>
> 2013-10-09 21:52:59 <gfid:54b02fde-69d2-4da2-8372-2a7af89a0ae1>
> 2013-10-09 21:52:59 <gfid:4702c3ab-a2bb-43e3-ae2c-ecb5b440f368>
> 2013-10-09 21:52:59 <gfid:8fe46824-a9f1-4095-b204-e9e137ae8643>
> ...
> Count : 1023
>
> We tried to clean all the dangling links, but they keep coming back and the split-brain is not resolved.
>
> It may be the root cause of the problem. How do we resolve this split-brain?
>
> -----Original Message-----
> From: Alexandre Fournier
> Sent: 21 novembre 2013 15:20
> To: 'Lalatendu Mohanty'; Pranith Kumar Karampuri
> Cc: gluster-users at gluster.org; gluster-devel at nongnu.org
> Subject: RE: [Gluster-devel] [Gluster-users] Self Heal and dangling 
> symlinks
>
> Ok here is the information :
>
> Stat :
>    File: `/aa/aa/aa/aa/aa/aa/aa
>    Size: 14364           Blocks: 32         IO Block: 4096   regular file
> Device: 822h/2082d      Inode: 3155137861  Links: 2
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2013-11-21 13:14:58.527765935 +0000
> Modify: 2013-11-13 13:19:13.736226050 +0000
> Change: 2013-11-13 13:19:13.736226050 +0000
>
>    File: `/aa/aa/aa/aa/aa/aa/aa
>    Size: 14364           Blocks: 32         IO Block: 4096   regular file
> Device: 822h/2082d      Inode: 3076494286  Links: 2
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2013-11-21 13:14:58.527674754 +0000
> Modify: 2013-11-13 13:19:13.736442464 +0000
> Change: 2013-11-13 13:19:13.736442464 +0000
>   Birth: -
>
> Attributes :
>
> # file: aa/aa/aa/aa/aa/aa/aa
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.gfid=0xb5b8c3ec9dd24609b56476651113d3fa
>
>
> # file: aa/aa/aa/aa/aa/aa/aa
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.gfid=0xb5b8c3ec9dd24609b56476651113d3fa
>
>
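
A sketch for reading the trusted.afr.* values above. It assumes, to the best of my understanding of this GlusterFS generation, that each value is three 32-bit big-endian counters for pending data, metadata and entry operations. The name decode_afr_changelog is hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters."""
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: data, metadata, entry; each an unsigned 32-bit big-endian int.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}
```

Under that assumption, the all-zero values shown for both copies mean that neither brick records pending operations against the other for this file.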
> -----Original Message-----
> From: Lalatendu Mohanty [mailto:lmohanty at redhat.com]
> Sent: 21 novembre 2013 14:05
> To: Alexandre Fournier; Pranith Kumar Karampuri
> Cc: gluster-users at gluster.org; gluster-devel at nongnu.org
> Subject: Re: [Gluster-devel] [Gluster-users] Self Heal and dangling 
> symlinks
>
> On 11/21/2013 07:54 PM, Alexandre Fournier wrote:
>> They are both regular files on the node and the replicas, and they have 
>> the same GFID.  I also ran the gluster volume heal gv0 split-brain 
>> command and the file is not in the list.  We do have an entire directory 
>> listed, though (1023 entries on a node).
>>
>> However, the file was already on the brick before we uploaded it, and I noticed that the write did not work because the last modification date does not match the upload time.
>>
>> We offer file uploads to the gluster mount through a web service.  This web service uploads the file to a temporary folder and then MOVEs the file onto the gluster mount.
>>
>> Could the move operation give strange behavior like this?
> Alexandre,
>
> No, it should not. Please let us know the answers to the questions Pranith and I asked, so we can understand the root cause of your problem.
>
>> Alexandre Fournier
>> Tools Programmer
>> Ubisoft Production Services
>>
>>
>> -----Original Message-----
>> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
>> Sent: 21 novembre 2013 00:47
>> To: Lalatendu Mohanty
>> Cc: Alexandre Fournier; gluster-users at gluster.org; 
>> gluster-devel at nongnu.org
>> Subject: Re: [Gluster-devel] [Gluster-users] Self Heal and dangling 
>> symlinks
>>
>> Alexandre,
>>      Seems like there is an entry split-brain (same file/dir name but on one brick it is a file and on the other it is a directory) according to the following log:
>>> [2013-11-18 18:18:43.052446] W
>>> [afr-common.c:1411:afr_conflicting_iattrs]
>>> 0-gv0-replicate-0: /aa/aa/aa/aa: filetype differs on subvolumes (0,
>>> 1)
>> Could you get us the output of "stat <brick-dir-path>/aa/aa/aa/aa/aa" and "getfattr -d -m. -e hex <brick-dir-path>/aa/aa/aa/aa/aa" on both the bricks.
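
The "filetype differs" warning concerns the file-type bits of the same entry on each brick. A minimal sketch of that comparison follows. It assumes both brick copies can be inspected from one host (otherwise run file_kind on each node and compare the results by hand). The names are hypothetical and this is not GlusterFS's own check.

```python
import os
import stat

_KINDS = {
    stat.S_IFREG: "regular file",
    stat.S_IFDIR: "directory",
    stat.S_IFLNK: "symlink",
}


def file_type_bits(path):
    """Return only the file-type portion of st_mode, without following symlinks."""
    return stat.S_IFMT(os.lstat(path).st_mode)


def file_kind(path):
    """Human-readable name for the entry's type."""
    return _KINDS.get(file_type_bits(path), "other")


def filetype_differs(path_a, path_b):
    """True when the two entries are of different types (e.g. file vs directory)."""
    return file_type_bits(path_a) != file_type_bits(path_b)
```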
>>
>> Pranith
>> ----- Original Message -----
>>> From: "Lalatendu Mohanty" <lmohanty at redhat.com>
>>> To: "Alexandre Fournier" <alexandre.fournier at ubisoft.com>, 
>>> gluster-users at gluster.org, gluster-devel at nongnu.org
>>> Sent: Thursday, November 21, 2013 1:28:01 AM
>>> Subject: Re: [Gluster-devel] [Gluster-users] Self Heal and dangling 
>>> symlinks
>>>
>>> On 11/19/2013 10:49 PM, Alexandre Fournier wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> We are experiencing strange behavior when writing files to the 
>>> Gluster mount point. On some occasions, when writing to the Gluster 
>>> mount, we get an Open Stream error. We looked at the gluster logs 
>>> and found the following faulty entries:
>>>
>>>
>>>
>>> [From /var/log/glusterfs/mnt-gv0.log]
>>>
>>>
>>>
>>> [2013-11-18 18:18:43.052446] W
>>> [afr-common.c:1411:afr_conflicting_iattrs]
>>> 0-gv0-replicate-0: /aa/aa/aa/aa: filetype differs on subvolumes (0,
>>> 1)
>>>
>>> [2013-11-18 18:18:43.052468] E
>>> [afr-self-heal-common.c:1409:afr_sh_common_lookup_cbk] 0-gv0-replicate-0:
>>> Conflicting entries for /aa/aa/aa/aa
>>>
>>> [2013-11-18 18:18:43.052757] E
>>> [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk]
>>> 0-gv0-replicate-0: background meta-data data entry missing-entry 
>>> gfid self-heal
>>>
>>> failed on /aa/aa/aa/aa/aa
>>>
>>> [2013-11-18 18:18:43.052780] W [fuse-bridge.c:292:fuse_entry_cbk]
>>> 0-glusterfs-fuse: 439382194: LOOKUP() /aa/aa/aa/aa/aa => -1 
>>> (Input/output
>>> error)
>>>
>>>
>>>
>>> We looked at the log file etc-glusterfs-glusterd.vol.log but 
>>> found nothing related to this problem. We then looked at the log 
>>> /var/log/glusterfs/bricks/mnt-data.log and found 70 GB of 
>>> entries of the same type:
>>>
>>>
>>>
>>> [2013-11-19 17:13:32.269757] W
>>> [posix-handle.c:538:posix_handle_soft]
>>> 0-gv0-posix: symlink
>>> ../../ab/fe/abfeb61c-501d-4417-b8fb-0accdd57146f/cf -> 
>>> /mnt/data/.glusterfs/ab/fe/abfeb61c-501d-4417-b8fb-0accdd57146f/cf
>>> failed (No such file or directory)
>>>
>>> [2013-11-19 17:13:32.269978] W
>>> [posix-handle.c:538:posix_handle_soft]
>>> 0-gv0-posix: symlink
>>> ../../c7/8b/c78be78f-cc95-47b2-a27f-4217f1759b67/d2 ->
>>> /mnt/data/.glusterfs/c7/8b/c78be78f-cc95-47b2-a27f-4217f1759b67/d2
>>> failed (No such file or directory)
>>>
>>> [2013-11-19 17:13:32.270190] W
>>> [posix-handle.c:538:posix_handle_soft]
>>> 0-gv0-posix: symlink
>>> ../../5a/8f/5a8fa43c-4ccc-4d88-9122-a96bc8ffaebc/f2 ->
>>> /mnt/data/.glusterfs/5a/8f/5a8fa43c-4ccc-4d88-9122-a96bc8ffaebc/f2
>>> failed (No such file or directory)
>>>
>>>
>>>
>>> This looks like a bug, unless there is something wrong with the 
>>> set-up. I have copied gluster-devel on this thread as I think they might help.
>>>
>>> Just curious: do all your gluster nodes have the same time (i.e. are they NTP-synced)?
>>>
>>>
>>> And it does not stop logging. It seems that self-heal does not 
>>> work properly when there are broken symlinks in the gluster volume. It 
>>> is also worth saying that this log is only produced on a single node, 
>>> although the writes fail on several nodes. Also, we tried to clean the 
>>> symlinks manually but they always come back.
>>>
>>>
>>>
>>> Is it possible to recover from broken symlinks?
>>>
>>>
>>>
>>> Configuration :
>>>
>>> Gluster Version : 3.3.2
>>>
>>> Cluster setup : 4 X 2
>>>
>>> OS : Ubuntu
>>>
>>> On Fuse
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Alexandre
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list Gluster-users at gluster.org 
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at nongnu.org
>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>