[Gluster-users] Hundreds of duplicate files
tbenzvi at 3vgeomatics.com
Mon Dec 22 03:49:52 UTC 2014
Actually, we are using XFS for the bricks. Unfortunately, we still haven't made any progress on this issue.
--------- Original Message ---------
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: "Anders Blomdell" <anders.blomdell at control.lth.se>
Date: 12/21/14 7:42 pm
To: tbenzvi at 3vgeomatics.com, gluster-users at gluster.org
On 21 December 2014 06:37:44 CET, tbenzvi at 3vgeomatics.com wrote:
>Hi Joe,
>
>Thanks for the reply. That worked; I probably forgot to do this as root
>last time. Yet, the files still show up twice in a directory listing on
>the mounted volume. And it seems to be random whether reading the file
>will succeed or not. I've tried with several files and it sometimes
>works and sometimes fails; I assume this depends on whether it locates
>the actual file on the brick or the link file. Let me know if you have
>any idea what's going on.
Does the brick filesystem happen to be ext4? I have had a similar problem with 3.6.x and
ext4 (64-bit offset problem).
>
>Output of the command:
>
>$ getfattr -m . -d -e hex /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
>getfattr: Removing leading '/' from absolute path names
># file: data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
>system.posix_acl_access=0x0200000001000600ffffffff04000600ffffffff10000600ffffffff20000400ffffffff
>trusted.SGI_ACL_FILE=0x0000000400000001ffffffff0006000000000004ffffffff0006000000000010ffffffff0006000000000020ffffffff00040000
>trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
>trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200
>
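For reference, the trusted.glusterfs.dht.linkto value above is hex-encoded text ending in a NUL byte. Below is a minimal Python sketch for decoding it. The helper name decode_xattr_hex is made up for illustration and is not part of any Gluster tooling.

```python
def decode_xattr_hex(value: str) -> str:
    """Decode a value printed by 'getfattr -e hex' (e.g. '0x7361...') into text."""
    hex_digits = value[2:] if value.startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    # The value in this thread ends with a 00 byte; strip trailing NULs before decoding.
    return raw.rstrip(b"\x00").decode("ascii")


linkto = "0x7361666172692d636c69656e742d3200"
print(decode_xattr_hex(linkto))  # safari-client-2
```

So this link file points at the subvolume named safari-client-2. Which brick that name corresponds to depends on the generated client volfile, so it is worth checking there rather than assuming.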
>
>Cheers,
>Tom
>
>--------- Original Message ---------
>Subject: Re: [Gluster-users] Hundreds of duplicate files
>From: "Joe Julian" <joe at julianfamily.org>
>Date: 12/20/14 8:53 pm
>To: gluster-users at gluster.org
>
>Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do
>that as root.
>
> On 12/20/2014 06:02 PM, tbenzvi at 3vgeomatics.com wrote:
> Hi everyone,
>
>We have a distributed Gluster volume on five bricks over two servers
>(first server running gluster 3.4.2, second server running gluster
>3.5.1, both running Fedora 20).
>Starting last week, doing a file listing on the mounted volume shows
>many files with the same name appearing twice (and they are listed with
>the same inode). Doing a search for these files, I have found 290,000
>of them!!
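One way to count such duplicates is to walk the mounted volume and flag any name that shows up more than once in a single directory listing. This is only a rough Python sketch: the function names are hypothetical, and /sar is the mount point taken from the df output quoted further down.

```python
import os
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once in one directory listing."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


def scan(root):
    """Yield (directory, duplicated names) for each directory under root that has any."""
    for dirpath, dirnames, filenames in os.walk(root):
        dups = duplicate_names(filenames)
        if dups:
            yield dirpath, dups


if __name__ == "__main__":
    # /sar is the client mount point of the volume in this thread.
    for directory, dups in scan("/sar"):
        for name in dups:
            print(os.path.join(directory, name))
```

It only reads directory listings, so it should be safe to run on a production mount, though a full walk of a volume this size will take a while.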
>
>If I do a listing of these files on the bricks themselves, it looks
>like most are link files (du will show the file on the first server as
>0 bytes, and the sticky bit set). The file is fine on the second
>server. Unfortunately, running "getfattr -m - -e hex -d" on the file
>shows NO gluster-related attributes and I believe this is why both
>files appear in the listing. The files cannot be read by any programs
>as it is trying to read the link file. I assume the metadata became
>corrupted. This is a production server so we really need to know:
>
>1. How did this happen, and how can we prevent it going forward? There
>was a server crash a week ago and I believe that was the cause.
>2. How can we heal the Gluster volume/bricks and link files? If there
>is some straightforward way of restoring the link file pointer I can
>write a script to do it; obviously doing this manually will be
>impossible.
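Before scripting any repair, it may help to get a read-only inventory of the candidate link files on each brick. The Python sketch below assumes Linux, Python 3 and root privileges (trusted.* attributes are not readable otherwise). It reports every regular file with the sticky bit set, along with its size and any trusted.glusterfs.dht.linkto value. The function names are hypothetical, and it does not modify anything.

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def has_sticky_bit(mode):
    """True for a regular file whose mode has the sticky bit (the 'T' in ls output)."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISVTX)


def read_linkto(path):
    """Return the linkto target as text, or None if the xattr is absent or unreadable."""
    try:
        raw = os.getxattr(path, LINKTO_XATTR, follow_symlinks=False)
    except OSError:
        return None
    return raw.rstrip(b"\x00").decode("ascii", "replace")


def report(brick_root):
    """Yield (path, size, linkto) for every sticky-bit regular file under brick_root."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into directories named .glusterfs (Gluster's internal
        # bookkeeping directory on the brick).
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if has_sticky_bit(st.st_mode):
                yield path, st.st_size, read_linkto(path)


if __name__ == "__main__":
    # Brick path taken from the volume info in this thread.
    for path, size, linkto in report("/data/glusterfs/safari/brick00/brick"):
        print(size, linkto, path, sep="\t")
```

Printing the size alongside the xattr separates zero-byte files that carry a linkto value from sticky-bit files that hold real data or have no linkto attribute at all. Both of those odd cases appear in this thread.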
>
>Thanks very much for any and all help - much appreciated!
>
>Regards,
>Tom
>
>
>On Wed, Dec 17, 2014 at 4:07 AM, <tbenzvi at 3vgeomatics.com> wrote:
> > Hi everyone, we have noticed some extremely odd behaviour with our
> > distributed Gluster volume where duplicate files (same name, same or
> > different content) are being created and stored on multiple bricks. The
> > only consistent clue is that one of the duplicate files has the sticky
> > bit set. I am hoping someone will be able to shed some light on why this
> > is happening and how we can restore the volume as there appear to be
> > hundreds of such files. I will try to provide as much pertinent
> > information as I can.
> >
> > We have a 130TB Gluster volume consisting of two 20TB bricks on server1,
> > and three 40TB bricks on a server2 which were added at a later date (and
> > rebalancing was done). The volume is mounted on server1, and accessed
> > only through this server but by many users. Both servers went down due
> > to power loss several days ago after which this problem was first
> > noticed. We ran a rebalance command on the volumes, this has not fixed
> > the problem.
> >
> >
> > Gluster volume info:
> > Volume Name: safari
> > Type: Distribute
> > Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
> > Status: Started
> > Number of Bricks: 5
> > Transport-type: tcp
> > Bricks:
> > Brick1: server1:/data/glusterfs/safari/brick00/brick
> > Brick2: server1:/data/glusterfs/safari/brick01/brick
> > Brick3: server2:/data/glusterfs/safari/brick02/brick
> > Brick4: server2:/data/glusterfs/safari/brick03/brick
> > Brick5: server2:/data/glusterfs/safari/brick04/brick
> >
> >
> > Size information:
> > /dev/sdc 37T 16T 22T 42% /data/glusterfs/safari/brick02
> > /dev/sdd 37T 16T 22T 42% /data/glusterfs/safari/brick03
> > /dev/sde 37T 17T 21T 45% /data/glusterfs/safari/brick04
> > /dev/md126 11T 7.7T 2.8T 74% /data/glusterfs/safari/brick00
> > /dev/md124 11T 8.0T 2.5T 77% /data/glusterfs/safari/brick01
> > server2:/safari 130T 63T 68T 48% /sar
> >
> >
> > Example 1:
> > -Two files with the same name exist in one directory
> > -They have different contents and attributes
> > -A file listing on the mounted volume shows the same inode
> > -The newer file has sticky bit set
> > -Neither file is corrupted, they can both be viewed by using the absolute
> > path (on the bricks)
> >
> > File listing on the mounted volume
> > 13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
> > 13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab
> >
> > Listing of the files on the bricks:
> > 8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42 /data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
> > 51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013 /data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
> >
> >
> > Example 2:
> > -Two files with the same name exist in one directory
> > -They have the same content and attributes
> > -No sticky bit is set when looking at file listing on the mounted volume
> > -Sticky bit is set for one file when looking at file listing on the bricks
> > -Files are corrupted
> >
> > File listing on the mounted volume:
> > 13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras
> > 13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras
> >
> > Listing of the files on the bricks:
> > 17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11 /data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
> > 57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013 /data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
> >
> >
> > Additionally, only some files in this directory are duplicated. The
> > duplicated files are corrupted (can not be viewed as Raster images: the
> > original file type)
> > The files which are not duplicated are not corrupted.
> >
> > File command: (notice duplicate and singleton files)
> > ifg_lr/20091021_20100218.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
> > ifg_lr/20091021_20101016.diff.phi.ras: data
> > ifg_lr/20091021_20101016.diff.phi.ras: data
> > ifg_lr/20091021_20101109.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
> > ifg_lr/20091021_20101203.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
> > ifg_lr/20091021_20101227.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
> > ifg_lr/20091021_20110120.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
> > ifg_lr/20091021_20110213.diff.phi.ras: data
> > ifg_lr/20091021_20110213.diff.phi.ras: data
> > ifg_lr/20091021_20110309.diff.phi.ras: data
> > ifg_lr/20091021_20110309.diff.phi.ras: sticky data
> > ifg_lr/20091021_20110402.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users