[Gluster-users] Files present on the backend but have become invisible from clients
Burnash, James
jburnash at knight.com
Thu May 19 19:47:00 UTC 2011
From the client, I can't see files in any directories under the path of /pfs2/online_archive/2011/*.
root@jc1lnxsamm100:/pfs2/test# ls -l /pfs2/online_archive/2011
total 212
drwxr-xr-x 22 statarb arb 4096 Jan 31 09:18 01
drwxr-xr-x 21 dataops arb 77824 Feb 28 09:18 02
drwxr-xr-x 25 dataops arb 4096 Mar 31 18:15 03
drwxr-xr-x 22 dataops arb 77824 May 4 11:42 04
drwxr-xr-x 15 dataops arb 4096 May 18 21:10 05
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 06
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 07
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 08
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 09
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 10
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 11
drwxr-xr-x 2 dataops arb 114 Dec 30 10:10 12
root@jc1lnxsamm100:/pfs2/test# ls -l /pfs2/online_archive/2011/*
/pfs2/online_archive/2011/01:
total 0
/pfs2/online_archive/2011/02:
total 0
/pfs2/online_archive/2011/03:
total 0
/pfs2/online_archive/2011/04:
total 0
/pfs2/online_archive/2011/05:
total 0
/pfs2/online_archive/2011/06:
total 0
/pfs2/online_archive/2011/07:
total 0
/pfs2/online_archive/2011/08:
total 0
/pfs2/online_archive/2011/09:
total 0
/pfs2/online_archive/2011/10:
total 0
/pfs2/online_archive/2011/11:
total 0
/pfs2/online_archive/2011/12:
total 0
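
For reference, here is a quick way to check from the servers whether those directories still carry their GlusterFS xattrs (a rough sketch, not actual output; g01 is just one example brick, so this would need to be repeated per brick on each server):

# Flag any 2011 subdirectory on this brick that is missing its trusted.gfid xattr.
for d in /export/read-only/g01/online_archive/2011/*/; do
    getfattr -n trusted.gfid "$d" > /dev/null 2>&1 || echo "no trusted.gfid: $d"
done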
-----Original Message-----
From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
Sent: Thursday, May 19, 2011 3:44 PM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients
As in, do you see all the files in those dirs, unlike the others?
On Thu, May 19, 2011 at 12:42 PM, Burnash, James <jburnash at knight.com> wrote:
> "Good ones" in what way?
>
> Permissions on the backend storage are here:
>
> http://pastebin.com/EiMvbgdh
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Thursday, May 19, 2011 3:09 PM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Files present on the backend but have
> become invisible from clients
>
> It looks like a bug. You are missing xattrs. Can you confirm if all dirs that have "0sAAAAAAAAAAAAAAAA" in your pastebin are good ones?
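>
> To line the good and bad ones up, something like this rough sketch could dump the afr xattrs for every directory (the brick path and xattr names here are taken from your pastebin and may differ per replica pair):
>
> # Sketch: print each directory with its afr changelog xattrs for comparison.
> for d in /export/read-only/g01/online_archive/2011/01/*/; do
>     echo "== $d"
>     getfattr -dm 'trusted.afr' "$d" 2>/dev/null
> done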
>
> On Thu, May 19, 2011 at 11:51 AM, Burnash, James <jburnash at knight.com> wrote:
>> Hi Mohit.
>>
>> Answers inline below:
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
>> Sent: Thursday, May 19, 2011 1:17 PM
>> To: Burnash, James
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Files present on the backend but have
>> become invisible from clients
>>
>> Can you post the output of getfattr -dm - <file|dir> for all parent dirs.
>> http://pastebin.com/EVfRsSrD
>>
>> and for one of the files from the server?
>>
>> # getfattr -dm - /export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> getfattr: Removing leading '/' from absolute path names
>> # file: export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> trusted.afr.pfs-ro1-client-0=0sAAAAAAAAAAAAAAAA
>> trusted.afr.pfs-ro1-client-1=0sAAAAAAAAAAAAAAAA
>> trusted.gfid=0sjyq/BEwuRhaVbF7qdo0lqA==
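>>
>> (Side note: the "0s" prefix marks a base64-encoded value, so 0sAAAAAAAAAAAAAAAA should decode to twelve zero bytes, i.e. an all-zero afr changelog. A quick sketch to check:)
>>
>> # Decode the xattr value; all zeroes means no pending afr operations recorded.
>> echo 'AAAAAAAAAAAAAAAA' | base64 -d | od -An -tx1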
>>
>> Thank you sir!
>>
>> James
>>
>>
>> On Thu, May 19, 2011 at 8:15 AM, Burnash, James <jburnash at knight.com> wrote:
>>> Hello folks. A new conundrum to make sure that my life with
>>> GlusterFS doesn't become boring :-)
>>>
>>> Configuration at end of this message:
>>>
>>> On client - directory appears to be empty:
>>> # ls -l /pfs2/online_archive/2011/01
>>> total 0
>>>
>>> fgrep -C 2 inode /var/log/glusterfs/pfs2.log | tail -10
>>> [2011-05-18 14:40:11.665045] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 14:43:47.810045] E [rpc-clnt.c:199:call_bail] 0-pfs-ro1-client-1: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x130824x sent = 2011-05-18 14:13:45.978987. timeout = 1800
>>> [2011-05-18 14:53:12.311323] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-18 15:00:32.240373] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 15:10:12.282848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> --
>>> [2011-05-19 10:10:25.967246] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:20:18.551953] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:29:34.834256] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-19 10:30:06.898152] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:32:05.258799] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>>
>>>
>>> On server - directory is populated:
>>> loop_check 'ls -l /export/read-only/g*/online_archive/2011/01' jc1letgfs{14,15,17,18} | less
>>> jc1letgfs14
>>> /export/read-only/g01/online_archive/2011/01:
>>> total 80
>>> drwxrwxrwt 3    403 1009 4096 May  4 10:35 03
>>> drwxrwxrwt 3 107421 1009 4096 May  7 12:18 04
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:35 05
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 06
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 07
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:41 10
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:37 11
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 12
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 13
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:44 14
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:46 18
>>> drwxrwxrwt 3 107421 1009 4096 Apr 14 14:11 19
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 20
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 21
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:45 24
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:47 25
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:52 26
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 27
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:50 28
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:56 31
>>>
>>> (and every brick shows the same)
>>>
>>> And from the server logs:
>>> root@jc1letgfs17:/var/log/glusterfs# fgrep '2011-05-19 10:39:30' bricks/export-read-only-g*.log
>>> [2011-05-19 10:39:30.306661] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.307754] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.308230] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.322342] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>> [2011-05-19 10:39:30.421298] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
>>>
>>> The only two things that jump out so far are:
>>> the permissions on the directories under /export/read-only/g01/online_archive/2011/01 are 1777 (drwxrwxrwt, per the listing above), whereas the directories under /export/read-only/g01/online_archive/2010/01 are just 755.
>>> The lstat "No data available" errors only seem to appear on the problem directories.
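>>>
>>> To compare the modes side by side, something like this sketch would do (GNU find's -printf is assumed, and g01 is just one example brick):
>>>
>>> # Print each immediate subdirectory with its octal mode, for both years.
>>> find /export/read-only/g01/online_archive/2010/01 \
>>>      /export/read-only/g01/online_archive/2011/01 \
>>>      -maxdepth 1 -type d -printf '%m %p\n'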
>>>
>>> Any hints or suggestions would be greatly appreciated. Thanks,
>>> James
>>>
>>>
>>> Config:
>>> All on Gluster 3.1.3
>>> Servers:
>>> 4 x CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz), each with:
>>> Single P812 Smart Array Controller
>>> Single MDS600 with 70 2TB SATA drives configured as RAID 50
>>> 48 GB RAM
>>>
>>> Clients:
>>> 185 CentOS 5.2 (mostly DL360 G6).
>>> /pfs2 is the mount point for a Distributed-Replicate volume across 4 servers.
>>>
>>> Volume Name: pfs-ro1
>>> Type: Distributed-Replicate
>>> Status: Started
>>> Number of Bricks: 20 x 2 = 40
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: jc1letgfs17-pfs1:/export/read-only/g01
>>> Brick2: jc1letgfs18-pfs1:/export/read-only/g01
>>> Brick3: jc1letgfs17-pfs1:/export/read-only/g02
>>> Brick4: jc1letgfs18-pfs1:/export/read-only/g02
>>> Brick5: jc1letgfs17-pfs1:/export/read-only/g03
>>> Brick6: jc1letgfs18-pfs1:/export/read-only/g03
>>> Brick7: jc1letgfs17-pfs1:/export/read-only/g04
>>> Brick8: jc1letgfs18-pfs1:/export/read-only/g04
>>> Brick9: jc1letgfs17-pfs1:/export/read-only/g05
>>> Brick10: jc1letgfs18-pfs1:/export/read-only/g05
>>> Brick11: jc1letgfs17-pfs1:/export/read-only/g06
>>> Brick12: jc1letgfs18-pfs1:/export/read-only/g06
>>> Brick13: jc1letgfs17-pfs1:/export/read-only/g07
>>> Brick14: jc1letgfs18-pfs1:/export/read-only/g07
>>> Brick15: jc1letgfs17-pfs1:/export/read-only/g08
>>> Brick16: jc1letgfs18-pfs1:/export/read-only/g08
>>> Brick17: jc1letgfs17-pfs1:/export/read-only/g09
>>> Brick18: jc1letgfs18-pfs1:/export/read-only/g09
>>> Brick19: jc1letgfs17-pfs1:/export/read-only/g10
>>> Brick20: jc1letgfs18-pfs1:/export/read-only/g10
>>> Brick21: jc1letgfs14-pfs1:/export/read-only/g01
>>> Brick22: jc1letgfs15-pfs1:/export/read-only/g01
>>> Brick23: jc1letgfs14-pfs1:/export/read-only/g02
>>> Brick24: jc1letgfs15-pfs1:/export/read-only/g02
>>> Brick25: jc1letgfs14-pfs1:/export/read-only/g03
>>> Brick26: jc1letgfs15-pfs1:/export/read-only/g03
>>> Brick27: jc1letgfs14-pfs1:/export/read-only/g04
>>> Brick28: jc1letgfs15-pfs1:/export/read-only/g04
>>> Brick29: jc1letgfs14-pfs1:/export/read-only/g05
>>> Brick30: jc1letgfs15-pfs1:/export/read-only/g05
>>> Brick31: jc1letgfs14-pfs1:/export/read-only/g06
>>> Brick32: jc1letgfs15-pfs1:/export/read-only/g06
>>> Brick33: jc1letgfs14-pfs1:/export/read-only/g07
>>> Brick34: jc1letgfs15-pfs1:/export/read-only/g07
>>> Brick35: jc1letgfs14-pfs1:/export/read-only/g08
>>> Brick36: jc1letgfs15-pfs1:/export/read-only/g08
>>> Brick37: jc1letgfs14-pfs1:/export/read-only/g09
>>> Brick38: jc1letgfs15-pfs1:/export/read-only/g09
>>> Brick39: jc1letgfs14-pfs1:/export/read-only/g10
>>> Brick40: jc1letgfs15-pfs1:/export/read-only/g10
>>> Options Reconfigured:
>>> diagnostics.brick-log-level: ERROR
>>> cluster.metadata-change-log: on
>>> diagnostics.client-log-level: ERROR
>>> performance.stat-prefetch: on
>>> performance.cache-size: 2GB
>>> network.ping-timeout: 10
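>>>
>>> One avenue I'm considering: a full recursive stat from a client mount to force lookups, which, as I understand it, is what triggers self-heal on this version. A sketch, untested here:
>>>
>>> # From a client: stat everything under the affected tree to force lookups/self-heal.
>>> find /pfs2/online_archive/2011 -noleaf -print0 | xargs --null stat > /dev/null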
>>>
>>>
>>>
>>
>