[Gluster-users] Files present on the backend but have become invisible from clients

Burnash, James jburnash at knight.com
Thu May 19 19:42:16 UTC 2011


"Good ones" in what way? 

Permissions on the backend storage are here:

http://pastebin.com/EiMvbgdh
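
In case it helps with the comparison, something along these lines (a rough
sketch -- plain ssh instead of our loop_check wrapper, brick paths as in the
volume config quoted below) dumps the mode bits and the trusted.* xattrs for
the same directory on every brick, so a directory with different permissions
or missing attributes stands out:

for h in jc1letgfs14 jc1letgfs15 jc1letgfs17 jc1letgfs18; do
    echo "== $h =="
    # stat prints the octal mode and owner; getfattr dumps the trusted.* xattrs hex-encoded
    ssh "$h" 'stat -c "%a %U:%G %n" /export/read-only/g*/online_archive/2011/01
              getfattr -d -m trusted -e hex /export/read-only/g*/online_archive/2011/01'
done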

-----Original Message-----
From: Mohit Anchlia [mailto:mohitanchlia at gmail.com] 
Sent: Thursday, May 19, 2011 3:09 PM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients

It looks like a bug. You are missing xattrs. Can you confirm if all dirs that have "0sAAAAAAAAAAAAAAAA" in your pastebin are good ones?
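
For example, a quick scan like this (untested, path taken from your getfattr
output) run on one of the bricks would show which directories are missing
trusted.gfid altogether versus ones that only carry the zeroed afr changelogs:

find /export/read-only/g01/online_archive/2011/01 -maxdepth 1 -type d |
while read d; do
    # getfattr -n exits non-zero when the named xattr is absent
    if getfattr -n trusted.gfid -e hex "$d" >/dev/null 2>&1; then
        echo "has trusted.gfid:     $d"
    else
        echo "missing trusted.gfid: $d"
    fi
done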

On Thu, May 19, 2011 at 11:51 AM, Burnash, James <jburnash at knight.com> wrote:
> Hi Mohit.
>
> Answers inline below:
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Thursday, May 19, 2011 1:17 PM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Files present on the backend but have 
> become invisible from clients
>
> Can you post the output of  getfattr -dm - <file|dir> for all parent dirs.
>        http://pastebin.com/EVfRsSrD
>
>  and for one of the files from the server?
>
> # getfattr -dm - /export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
> getfattr: Removing leading '/' from absolute path names
> # file: export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
> trusted.afr.pfs-ro1-client-0=0sAAAAAAAAAAAAAAAA
> trusted.afr.pfs-ro1-client-1=0sAAAAAAAAAAAAAAAA
> trusted.gfid=0sjyq/BEwuRhaVbF7qdo0lqA==
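>
> For what it's worth, the "0s" prefix in getfattr's output just means the
> value is base64-encoded. Decoding the afr changelog values above shows they
> are twelve zero bytes -- which, if I read the afr changelog right, is just
> the pending counters all at zero, i.e. nothing pending on that file:
>
> # 16 base64 characters decode to 12 raw bytes, all zero
> echo AAAAAAAAAAAAAAAA | base64 -d | od -An -tx1
>  00 00 00 00 00 00 00 00 00 00 00 00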
>
> Thank you sir!
>
> James
>
>
> On Thu, May 19, 2011 at 8:15 AM, Burnash, James <jburnash at knight.com> wrote:
>> Hello folks. A new conundrum to make sure that my life with GlusterFS 
>> doesn't become boring :-)
>>
>> Configuration at end of this message:
>>
>> On client - directory appears to be empty:
>> # ls -l /pfs2/online_archive/2011/01
>> total 0
>>
>> fgrep -C 2 inode /var/log/glusterfs/pfs2.log | tail -10
>> [2011-05-18 14:40:11.665045] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> [2011-05-18 14:43:47.810045] E [rpc-clnt.c:199:call_bail] 0-pfs-ro1-client-1: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x130824x sent = 2011-05-18 14:13:45.978987. timeout = 1800
>> [2011-05-18 14:53:12.311323] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>> [2011-05-18 15:00:32.240373] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> [2011-05-18 15:10:12.282848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> --
>> [2011-05-19 10:10:25.967246] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> [2011-05-19 10:20:18.551953] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> [2011-05-19 10:29:34.834256] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>> [2011-05-19 10:30:06.898152] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>> [2011-05-19 10:32:05.258799] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>
>>
>> On server - directory is populated:
>> loop_check ' ls -l /export/read-only/g*/online_archive/2011/01' jc1letgfs{14,15,17,18} | less
>> jc1letgfs14
>> /export/read-only/g01/online_archive/2011/01:
>> total 80
>> drwxrwxrwt 3    403 1009 4096 May  4 10:35 03
>> drwxrwxrwt 3 107421 1009 4096 May  7 12:18 04
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:35 05
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 06
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 07
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:41 10
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:37 11
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 12
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 13
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:44 14
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:46 18
>> drwxrwxrwt 3 107421 1009 4096 Apr 14 14:11 19
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 20
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 21
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:45 24
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:47 25
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:52 26
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 27
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:50 28
>> drwxrwxrwt 3 107421 1009 4096 May  4 10:56 31
>>
>> (and shows on every brick the same)
>>
>> And from the server logs:
>> root@jc1letgfs17:/var/log/glusterfs# fgrep '2011-05-19 10:39:30' bricks/export-read-only-g*.log
>> [2011-05-19 10:39:30.306661] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>> [2011-05-19 10:39:30.307754] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>> [2011-05-19 10:39:30.308230] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>> [2011-05-19 10:39:30.322342] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>> [2011-05-19 10:39:30.421298] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>
>> The only two things that jump out so far are:
>> 1. The permissions on the directories under /export/read-only/g01/online_archive/2011/01 are 7777, whereas the directories under /export/read-only/g01/online_archive/2010/01 are just 755.
>> 2. The lstat "No data available" errors only seem to appear on the problem directories (see the quick check below).
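>>
>> ENODATA ("No data available") is the errno getxattr returns when the
>> requested extended attribute is absent, so my guess is the lookup is
>> tripping over a missing trusted.gfid on those directories rather than a
>> missing directory. A rough check like this (plain ssh rather than our
>> loop_check wrapper) against the failing path from the brick logs would
>> confirm whether the gfid xattr is present on each brick:
>>
>> for h in jc1letgfs14 jc1letgfs15 jc1letgfs17 jc1letgfs18; do
>>     echo "== $h =="
>>     # if trusted.gfid is absent, getfattr reports an error instead of the hex value
>>     ssh "$h" 'getfattr -n trusted.gfid -e hex /export/read-only/g*/online_archive/2011/01/21'
>> done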
>>
>>  Any hints or suggestions would be greatly appreciated. Thanks, James
>>
>>
>> Config:
>> All on Gluster 3.1.3
>> Servers:
>> 4 CentOS 5.5 servers (ProLiant DL370 G6, Intel Xeon 3200 MHz), each with:
>> Single P812 Smart Array Controller
>> Single MDS600 with 70 2TB SATA drives configured as RAID 50
>> 48 GB RAM
>>
>> Clients:
>> 185 CentOS 5.2 clients (mostly DL360 G6).
>> /pfs2 is the mount point for a Distributed-Replicate volume spanning the 4 servers.
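>>
>> For reference, the clients use the native FUSE client, mounted along the lines of:
>>
>> # the named server is only contacted to fetch the volfile at mount time
>> mount -t glusterfs jc1letgfs14-pfs1:/pfs-ro1 /pfs2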
>>
>> Volume Name: pfs-ro1
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 20 x 2 = 40
>> Transport-type: tcp
>> Bricks:
>> Brick1: jc1letgfs17-pfs1:/export/read-only/g01
>> Brick2: jc1letgfs18-pfs1:/export/read-only/g01
>> Brick3: jc1letgfs17-pfs1:/export/read-only/g02
>> Brick4: jc1letgfs18-pfs1:/export/read-only/g02
>> Brick5: jc1letgfs17-pfs1:/export/read-only/g03
>> Brick6: jc1letgfs18-pfs1:/export/read-only/g03
>> Brick7: jc1letgfs17-pfs1:/export/read-only/g04
>> Brick8: jc1letgfs18-pfs1:/export/read-only/g04
>> Brick9: jc1letgfs17-pfs1:/export/read-only/g05
>> Brick10: jc1letgfs18-pfs1:/export/read-only/g05
>> Brick11: jc1letgfs17-pfs1:/export/read-only/g06
>> Brick12: jc1letgfs18-pfs1:/export/read-only/g06
>> Brick13: jc1letgfs17-pfs1:/export/read-only/g07
>> Brick14: jc1letgfs18-pfs1:/export/read-only/g07
>> Brick15: jc1letgfs17-pfs1:/export/read-only/g08
>> Brick16: jc1letgfs18-pfs1:/export/read-only/g08
>> Brick17: jc1letgfs17-pfs1:/export/read-only/g09
>> Brick18: jc1letgfs18-pfs1:/export/read-only/g09
>> Brick19: jc1letgfs17-pfs1:/export/read-only/g10
>> Brick20: jc1letgfs18-pfs1:/export/read-only/g10
>> Brick21: jc1letgfs14-pfs1:/export/read-only/g01
>> Brick22: jc1letgfs15-pfs1:/export/read-only/g01
>> Brick23: jc1letgfs14-pfs1:/export/read-only/g02
>> Brick24: jc1letgfs15-pfs1:/export/read-only/g02
>> Brick25: jc1letgfs14-pfs1:/export/read-only/g03
>> Brick26: jc1letgfs15-pfs1:/export/read-only/g03
>> Brick27: jc1letgfs14-pfs1:/export/read-only/g04
>> Brick28: jc1letgfs15-pfs1:/export/read-only/g04
>> Brick29: jc1letgfs14-pfs1:/export/read-only/g05
>> Brick30: jc1letgfs15-pfs1:/export/read-only/g05
>> Brick31: jc1letgfs14-pfs1:/export/read-only/g06
>> Brick32: jc1letgfs15-pfs1:/export/read-only/g06
>> Brick33: jc1letgfs14-pfs1:/export/read-only/g07
>> Brick34: jc1letgfs15-pfs1:/export/read-only/g07
>> Brick35: jc1letgfs14-pfs1:/export/read-only/g08
>> Brick36: jc1letgfs15-pfs1:/export/read-only/g08
>> Brick37: jc1letgfs14-pfs1:/export/read-only/g09
>> Brick38: jc1letgfs15-pfs1:/export/read-only/g09
>> Brick39: jc1letgfs14-pfs1:/export/read-only/g10
>> Brick40: jc1letgfs15-pfs1:/export/read-only/g10
>> Options Reconfigured:
>> diagnostics.brick-log-level: ERROR
>> cluster.metadata-change-log: on
>> diagnostics.client-log-level: ERROR
>> performance.stat-prefetch: on
>> performance.cache-size: 2GB
>> network.ping-timeout: 10
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>


