replicate/distribute oddities in 2.0.0 Was Re: [Gluster-devel] rc8

Liam Slusser lslusser at gmail.com
Tue May 12 01:29:27 UTC 2009


Even after manually fixing (adding or removing) the extended attributes, I was
never able to get Gluster to see the missing files.  So I ended up writing a
quick program that searched each raw brick's filesystem, checked whether each
file existed in the Gluster cluster, and tagged the file if it didn't.  Once
that job was done I shut down Gluster, moved all the missing files off the raw
bricks into temp storage, restarted Gluster, and copied all the files back
into each directory.  That fixed the missing-file problem.
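
A scan of that kind might look roughly like the following Python sketch. It is
an illustration only, not the program that was actually used: the function
name find_missing is hypothetical, and "tagging" is reduced to yielding the
relative path of each file that is present on the brick but not visible
through the mount.

```python
import os


def find_missing(brick_root, mount_root):
    """Yield paths (relative to brick_root) of files that exist on the raw
    brick but cannot be seen through the Gluster mount."""
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        rel_dir = os.path.relpath(dirpath, brick_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            # lexists() does an lstat() through the mount, so a dangling
            # symlink still counts as present.
            if not os.path.lexists(os.path.join(mount_root, rel_path)):
                yield rel_path
```

Going by the paths quoted further down in this thread, a call would look
something like find_missing("/intstore/intstore01c/gcdata", "/intstore"); the
mount point is inferred from the client listing and may differ on other
setups.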

I'd still like to find out why Gluster would ignore certain files without the
correct attributes.  Even removing all the file attributes wouldn't fix the
problem.  I also tried manually copying a file into a brick, and Gluster still
wouldn't find it.  It would be nice to be able to manually copy files into a
brick, then set an extended attribute flag which would cause Gluster to see
the new file(s) and copy them to all bricks after an ls -alR was done.  Or
even better, just do it automatically when new files without attributes are
found in a brick.
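
For triage, the attribute names on a brick file can be sorted into the two
styles shown in the listings quoted below. The following is a minimal Python
sketch under stated assumptions: treating glusterfs.createtime and
glusterfs.version as 1.3.12-era markers and afr.* as 2.0.0 markers is drawn
only from those listings, and the function names are hypothetical.

```python
import os

# Assumption: marker names are taken only from the listings in this thread.
LEGACY_NAMES = {"glusterfs.createtime", "glusterfs.version"}  # 1.3.12-era files
REPLICATE_PREFIX = "afr."                                      # 2.0.0 files
NAMESPACES = ("trusted.", "user.", "security.", "system.")


def strip_namespace(name):
    """Drop a leading xattr namespace so names match the form in the listings."""
    for ns in NAMESPACES:
        if name.startswith(ns):
            return name[len(ns):]
    return name


def classify(names):
    """Return 'legacy', 'replicate', 'mixed' or 'none' for a list of xattr names."""
    bare = {strip_namespace(n) for n in names}
    legacy = bool(bare & LEGACY_NAMES)
    replicate = any(n.startswith(REPLICATE_PREFIX) for n in bare)
    if legacy and replicate:
        return "mixed"
    if legacy:
        return "legacy"
    if replicate:
        return "replicate"
    return "none"


def classify_file(path):
    """Classify one file on a raw brick (Linux only; some namespaces need root)."""
    return classify(os.listxattr(path, follow_symlinks=False))
```

Running classify_file over the files reported missing would show whether they
all carry only the older attribute set, which is the pattern suspected here.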

thanks,
liam


On Wed, May 6, 2009 at 4:13 PM, Liam Slusser <lslusser at gmail.com> wrote:

>
> To answer part of my own question: it looks like those files were copied
> using gluster 1.3.12, which is why they have different extended attributes.
> gluster 1.3.12 has
>
> Attribute "glusterfs.createtime" has a 10 byte value for file
> Attribute "glusterfs.version" has a 1 byte value for file
> Attribute "glusterfs.dht" has a 16 byte value for file
>
> while gluster 2.0.0 has
>
> Attribute "afr.brick2b" has a 12 byte value for file
> Attribute "afr.brick1b" has a 12 byte value for file
>
> I've been unsuccessful at fixing the attributes.  Can anybody point me in
> the right direction?
>
> thanks,
> liam
>
> On Wed, May 6, 2009 at 12:48 PM, Liam Slusser <lslusser at gmail.com> wrote:
>
>> Big thanks to the devel group for fixing all the memory leak issues with
>> the earlier RC releases.  2.0.0 has been great so far without any memory
>> issues whatsoever.
>>
>> I am seeing some oddities with the replicate/distribute translators,
>> however.  Each gluster server has three partitions exporting three
>> bricks, and we have two servers.  The gluster clients replicate each brick
>> between the two servers, and then I have a distribute translator over all
>> the replicated bricks - basically gluster raid10.
>>
>> A handful of files that were copied into the gluster volume have since
>> disappeared from it, even though the physical files still exist on both
>> bricks.
>>
>> (from a client)
>>
>> [root@client1 049891002526]# pwd
>> /intstore/data/tracks/tmg/2008_02_05/049891002526
>> [root@client1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
>> ls: 049891002526_01_09.wma.sigKey01.k: No such file or directory
>> [root@client1 049891002526]# head 049891002526_01_09.wma.sigKey01.k
>> head: cannot open `049891002526_01_09.wma.sigKey01.k' for reading: No such file or directory
>> [root@client1 049891002526]#
>>
>>
>> (from a server brick)
>>
>> [root@server1 049891002526]# pwd
>> /intstore/intstore01c/gcdata/data/tracks/tmg/2008_02_05/049891002526
>> [root@server1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
>> -rw-rw-rw- 1 10015 root 19377712 Feb  6  2008 049891002526_01_09.wma.sigKey01.k
>> [root@server1 049891002526]# attr -l 049891002526_01_09.wma.sigKey01.k
>> Attribute "glusterfs.createtime" has a 10 byte value for 049891002526_01_09.wma.sigKey01.k
>> Attribute "glusterfs.version" has a 1 byte value for 049891002526_01_09.wma.sigKey01.k
>> Attribute "selinux" has a 24 byte value for 049891002526_01_09.wma.sigKey01.k
>> [root@server1 049891002526]# attr -l .
>> Attribute "glusterfs.createtime" has a 10 byte value for .
>> Attribute "glusterfs.version" has a 1 byte value for .
>> Attribute "glusterfs.dht" has a 16 byte value for .
>> Attribute "selinux" has a 24 byte value for .
>>
>>
>> Nothing shows up in either the client or server logs.  I've tried all the
>> normal replication checks and self-heal triggers such as ls -alR.  If I copy
>> the file back from one of the bricks into the volume it shows up again, but
>> it has only a 1/3 chance of being written to the file's original location,
>> so I can end up with two identical files on two different bricks.
>>
>> This volume has over 40 million files and directories, so it can be very
>> tedious to find anomalies.  I wrote a quick perl script to search 1/25 of
>> our total files in the volume for missing files and md5 checksum
>> differences.  As of now it's about 15% (138,500 files) complete and has
>> found ~7000 missing files and 0 md5 checksum differences.
>>
>> How could I debug this?  I'd imagine it has something to do with the
>> extended attributes on either the file or the parent directory... but as
>> far as I can tell they all look fine.
>>
>> thanks,
>> liam
>>
>> client glusterfs.vol:
>>
>> volume brick1a
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1a
>> end-volume
>>
>> volume brick1b
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1b
>> end-volume
>>
>> volume brick1c
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server1
>>   option remote-subvolume brick1c
>> end-volume
>>
>> volume brick2a
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2a
>> end-volume
>>
>> volume brick2b
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2b
>> end-volume
>>
>> volume brick2c
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host server2
>>   option remote-subvolume brick2c
>> end-volume
>>
>> volume bricks1
>>   type cluster/replicate
>>   subvolumes brick1a brick2a
>> end-volume
>>
>> volume bricks2
>>   type cluster/replicate
>>   subvolumes brick1b brick2b
>> end-volume
>>
>> volume bricks3
>>   type cluster/replicate
>>   subvolumes brick1c brick2c
>> end-volume
>>
>> volume distribute
>>   type cluster/distribute
>>   subvolumes bricks1 bricks2 bricks3
>> end-volume
>>
>> volume writebehind
>>   type performance/write-behind
>>   option block-size 1MB
>>   option cache-size 64MB
>>   option flush-behind on
>>   subvolumes distribute
>> end-volume
>>
>> volume cache
>>   type performance/io-cache
>>   option cache-size 2048MB
>>   subvolumes writebehind
>> end-volume
>>
>> server glusterfsd.vol:
>>
>> volume intstore01a
>>   type storage/posix
>>   option directory /intstore/intstore01a/gcdata
>> end-volume
>>
>> volume intstore01b
>>   type storage/posix
>>   option directory /intstore/intstore01b/gcdata
>> end-volume
>>
>> volume intstore01c
>>   type storage/posix
>>   option directory /intstore/intstore01c/gcdata
>> end-volume
>>
>> volume locksa
>>   type features/posix-locks
>>   option mandatory-locks on
>>   subvolumes intstore01a
>> end-volume
>>
>> volume locksb
>>   type features/posix-locks
>>   option mandatory-locks on
>>   subvolumes intstore01b
>> end-volume
>>
>> volume locksc
>>   type features/posix-locks
>>   option mandatory-locks on
>>   subvolumes intstore01c
>> end-volume
>>
>> volume brick1a
>>   type performance/io-threads
>>   option thread-count 32
>>   subvolumes locksa
>> end-volume
>>
>> volume brick1b
>>   type performance/io-threads
>>   option thread-count 32
>>   subvolumes locksb
>> end-volume
>>
>> volume brick1c
>>   type performance/io-threads
>>   option thread-count 32
>>   subvolumes locksc
>> end-volume
>>
>> volume server
>>   type protocol/server
>>   option transport-type tcp
>>   option auth.addr.brick1a.allow 192.168.12.*
>>   option auth.addr.brick1b.allow 192.168.12.*
>>   option auth.addr.brick1c.allow 192.168.12.*
>>   subvolumes brick1a brick1b brick1c
>> end-volume
>>
>>
>> On Wed, Apr 22, 2009 at 5:43 PM, Liam Slusser <lslusser at gmail.com> wrote:
>>
>>>
>>> Avati,
>>> Big thanks.  Looks like that did the trick.  I'll report back in the
>>> morning if anything has changed, but it's looking MUCH better.  Thanks again!
>>>
>>> liam
>>>
>>> On Wed, Apr 22, 2009 at 2:32 PM, Anand Avati <avati at gluster.com> wrote:
>>>
>>>> Liam,
>>>>  An fd leak and a lock structure leak have been fixed in the git
>>>> repository, which explains a leak in the first subvolume's server.
>>>> Please pull the latest patches and let us know if they do not fix
>>>> your issues. Thanks!
>>>>
>>>> Avati
>>>>
>>>> On Tue, Apr 21, 2009 at 3:41 PM, Liam Slusser <lslusser at gmail.com> wrote:
>>>> > There is still a memory leak with rc8 on my setup.  The first server
>>>> > in a cluster of two servers starts out using 18M and just slowly
>>>> > increases.  After 30mins it has doubled in size to over 30M and just
>>>> > keeps growing - the more memory it uses the worse the performance.
>>>> > Funny that the second server in my cluster using the same
>>>> > configuration file has no such memory problem.
>>>> > My glusterfsd.vol has no performance translators, just 3 storage/posix
>>>> > -> 3 features/posix-locks -> protocol/server.
>>>> > thanks,
>>>> > liam
>>>> > On Mon, Apr 20, 2009 at 2:01 PM, Gordan Bobic <gordan at bobich.net> wrote:
>>>> >>
>>>> >> Gordan Bobic wrote:
>>>> >>>
>>>> >>> First-access failing bug still seems to be present.
>>>> >>> But other than that, it seems to be distinctly better than rc4. :)
>>>> >>> Good work! :)
>>>> >>
>>>> >> And that massive memory leak is gone, too! The process hasn't grown
>>>> >> by a KB after a kernel compile! :D
>>>> >>
>>>> >> s/Good work/Awesome work/
>>>> >>
>>>> >> :)
>>>> >>
>>>> >>
>>>> >> Gordan
>>>> >>
>>>> >>
>>>> >> _______________________________________________
>>>> >> Gluster-devel mailing list
>>>> >> Gluster-devel at nongnu.org
>>>> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>