replicate/distribute oddities in 2.0.0 (Was: Re: [Gluster-devel] rc8)

Liam Slusser lslusser at gmail.com
Wed May 6 23:13:11 UTC 2009


To partially answer my own question: it looks like those files were copied in
with gluster 1.3.12, which is why they have different extended attributes.

gluster 1.3.12:

Attribute "glusterfs.createtime" has a 10 byte value for file
Attribute "glusterfs.version" has a 1 byte value for file
Attribute "glusterfs.dht" has a 16 byte value for file

while gluster 2.0.0 has:

Attribute "afr.brick2b" has a 12 byte value for file
Attribute "afr.brick1b" has a 12 byte value for file

I've been unsuccessful at fixing the attributes so far; can anybody point me
in the right direction?
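
Something like the following is what I had in mind (a sketch only: the
attribute names and the "trusted." namespace prefix are assumptions that may
need adjusting, and I have not confirmed that removing the old 1.3.12 keys is
actually safe):

# Sketch: dump every extended attribute (hex-encoded) on one of the affected
# files and on its parent directory, to compare against a known-good file
# created under 2.0.0.
getfattr -d -m . -e hex 049891002526_01_09.wma.sigKey01.k
getfattr -d -m . -e hex .

# If the stale 1.3.12 keys really do need to go, something like this would
# remove one of them (attribute name/namespace is an assumption):
# setfattr -x trusted.glusterfs.createtime 049891002526_01_09.wma.sigKey01.k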

thanks,
liam

On Wed, May 6, 2009 at 12:48 PM, Liam Slusser <lslusser at gmail.com> wrote:

>  Big thanks to the devel group for fixing all the memory leak issues with
> the earlier RC releases.  2.0.0 has been great so far without any memory
> issues whatsoever.
> I am seeing some oddities with the replicate/distribute translators,
> however.  I have three partitions on each gluster server, each exported as
> a brick, and we have two servers.  The gluster clients replicate each brick
> between the two servers, and a distribute translator then spans the
> replicated pairs - basically a gluster RAID 10.
>
> There are a handful of files that were copied into the gluster volume but
> have since disappeared from the mount, even though the physical files still
> exist on both bricks.
>
> (from a client)
>
> [root at client1 049891002526]# pwd
> /intstore/data/tracks/tmg/2008_02_05/049891002526
> [root at client1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
> ls: 049891002526_01_09.wma.sigKey01.k: No such file or directory
> [root at client1 049891002526]# head 049891002526_01_09.wma.sigKey01.k
> head: cannot open `049891002526_01_09.wma.sigKey01.k' for reading: No such
> file or directory
> [root at client1 049891002526]#
>
>
> (from a server brick)
>
> [root at server1 049891002526]# pwd
> /intstore/intstore01c/gcdata/data/tracks/tmg/2008_02_05/049891002526
> [root at server1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
> -rw-rw-rw- 1 10015 root 19377712 Feb  6  2008
> 049891002526_01_09.wma.sigKey01.k
> [root at server1 049891002526]# attr -l 049891002526_01_09.wma.sigKey01.k
> Attribute "glusterfs.createtime" has a 10 byte value for
> 049891002526_01_09.wma.sigKey01.k
> Attribute "glusterfs.version" has a 1 byte value for
> 049891002526_01_09.wma.sigKey01.k
> Attribute "selinux" has a 24 byte value for
> 049891002526_01_09.wma.sigKey01.k
> [root at server1 049891002526]# attr -l .
> Attribute "glusterfs.createtime" has a 10 byte value for .
> Attribute "glusterfs.version" has a 1 byte value for .
> Attribute "glusterfs.dht" has a 16 byte value for .
> Attribute "selinux" has a 24 byte value for .
>
>
> Nothing shows up in either the client or the server logs.  I've tried all
> the normal replication checks and self-heal triggers such as ls -alR.  If I
> copy the file back from one of the bricks into the volume it shows up
> again, but it only has a 1/3 chance of being written to the file's original
> location, so I can end up with two identical files on two different bricks.
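>
> For completeness, a sketch of a more aggressive per-file walk from the
> client mount (the mount path below is just an example from this setup, and
> I'm assuming that looking up and reading each file through the client is
> what gives replicate the chance to self-heal it):
>
> # Sketch only: read the first byte of every file via the client mount so
> # each one is looked up and opened through the replicate translator.
> find /intstore/data/tracks -type f -exec head -c1 '{}' \; > /dev/null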
>
> This volume has over 40 million files and directories, so finding anomalies
> can be very tedious.  I wrote a quick perl script that checks 1/25 of the
> files in the volume for missing files and md5 checksum differences; as of
> now it is about 15% complete (138,500 files) and has found ~7000 missing
> files and 0 md5 checksum differences.
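>
> (Not the actual perl script, but the gist of the check sketched in shell;
> the brick and mount paths are examples from this setup, and sampling every
> 25th file is only illustrative:)
>
> # Sketch: walk a sample of files on one brick, flag any that are missing
> # from the client mount, and compare md5sums for the ones that are present.
> BRICK=/intstore/intstore01c/gcdata/data/tracks
> MOUNT=/intstore/data/tracks
> find "$BRICK" -type f | awk 'NR % 25 == 0' | while read -r f; do
>     rel=${f#"$BRICK"/}
>     if [ ! -e "$MOUNT/$rel" ]; then
>         echo "MISSING $rel"
>     elif [ "$(md5sum < "$f")" != "$(md5sum < "$MOUNT/$rel")" ]; then
>         echo "CHECKSUM MISMATCH $rel"
>     fi
> done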
>
> How could I debug this?  I'd imagine it has something to do with the
> extended attributes on either the file or its parent directory... but as
> far as I can tell those all look fine.
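>
> (For comparison's sake, a sketch of dumping the parent directory's
> attributes on every brick; the hostnames and brick paths are taken from the
> configs below, but the ssh step and the attribute namespaces are
> assumptions:)
>
> # Sketch: dump the parent directory's extended attributes on each brick so
> # the replicate (afr.*) and distribute (glusterfs.dht) values can be
> # compared side by side.
> for h in server1 server2; do
>   for b in intstore01a intstore01b intstore01c; do
>     echo "== $h:$b =="
>     ssh "$h" getfattr -d -m . -e hex \
>       "/intstore/$b/gcdata/data/tracks/tmg/2008_02_05/049891002526"
>   done
> done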
>
> thanks,
> liam
>
> client glusterfs.vol:
>
> volume brick1a
>   type protocol/client
>   option transport-type tcp
>   option remote-host server1
>   option remote-subvolume brick1a
> end-volume
>
> volume brick1b
>   type protocol/client
>   option transport-type tcp
>   option remote-host server1
>   option remote-subvolume brick1b
> end-volume
>
> volume brick1c
>   type protocol/client
>   option transport-type tcp
>   option remote-host server1
>   option remote-subvolume brick1c
> end-volume
>
> volume brick2a
>   type protocol/client
>   option transport-type tcp
>   option remote-host server2
>   option remote-subvolume brick2a
> end-volume
>
> volume brick2b
>   type protocol/client
>   option transport-type tcp
>   option remote-host server2
>   option remote-subvolume brick2b
> end-volume
>
> volume brick2c
>   type protocol/client
>   option transport-type tcp
>   option remote-host server2
>   option remote-subvolume brick2c
> end-volume
>
> volume bricks1
>   type cluster/replicate
>   subvolumes brick1a brick2a
> end-volume
>
> volume bricks2
>   type cluster/replicate
>   subvolumes brick1b brick2b
> end-volume
>
> volume bricks3
>   type cluster/replicate
>   subvolumes brick1c brick2c
> end-volume
>
> volume distribute
>   type cluster/distribute
>   subvolumes bricks1 bricks2 bricks3
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option block-size 1MB
>   option cache-size 64MB
>   option flush-behind on
>   subvolumes distribute
> end-volume
>
> volume cache
>   type performance/io-cache
>   option cache-size 2048MB
>   subvolumes writebehind
> end-volume
>
> server glusterfsd.vol:
>
> volume intstore01a
>   type storage/posix
>   option directory /intstore/intstore01a/gcdata
> end-volume
>
> volume intstore01b
>   type storage/posix
>   option directory /intstore/intstore01b/gcdata
> end-volume
>
> volume intstore01c
>   type storage/posix
>   option directory /intstore/intstore01c/gcdata
> end-volume
>
> volume locksa
>   type features/posix-locks
>   option mandatory-locks on
>   subvolumes intstore01a
> end-volume
>
> volume locksb
>   type features/posix-locks
>   option mandatory-locks on
>   subvolumes intstore01b
> end-volume
>
> volume locksc
>   type features/posix-locks
>   option mandatory-locks on
>   subvolumes intstore01c
> end-volume
>
> volume brick1a
>   type performance/io-threads
>   option thread-count 32
>   subvolumes locksa
> end-volume
>
> volume brick1b
>   type performance/io-threads
>   option thread-count 32
>   subvolumes locksb
> end-volume
>
> volume brick1c
>   type performance/io-threads
>   option thread-count 32
>   subvolumes locksc
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick1a.allow 192.168.12.*
>   option auth.addr.brick1b.allow 192.168.12.*
>   option auth.addr.brick1c.allow 192.168.12.*
>   subvolumes brick1a brick1b brick1c
> end-volume
>
>
> On Wed, Apr 22, 2009 at 5:43 PM, Liam Slusser <lslusser at gmail.com> wrote:
>
>>
>> Avati,
>> Big thanks.  Looks like that did the trick.  I'll report back in the
>> morning if anything has changed, but it's looking MUCH better.  Thanks again!
>>
>> liam
>>
>> On Wed, Apr 22, 2009 at 2:32 PM, Anand Avati <avati at gluster.com> wrote:
>>
>>> Liam,
>>>  An fd leak and a lock structure leak have been fixed in the git
>>> repository, which explains a leak in the first subvolume's server.
>>> Please pull the latest patches and let us know if they do not fix
>>> your issues. Thanks!
>>>
>>> Avati
>>>
>>> On Tue, Apr 21, 2009 at 3:41 PM, Liam Slusser <lslusser at gmail.com>
>>> wrote:
>>> > There is still a memory leak with rc8 on my setup.  The first server in
>>> > a cluster of two servers starts out using 18M and just slowly increases.
>>> > After 30 minutes it has doubled in size to over 30M and just keeps
>>> > growing - the more memory it uses, the worse the performance.  Funny
>>> > that the second server in my cluster, using the same configuration
>>> > file, has no such memory problem.
>>> > My glusterfsd.vol has no performance translators, just 3 storage/posix
>>> > -> 3 features/posix-locks -> protocol/server.
>>> > thanks,
>>> > liam
>>> > On Mon, Apr 20, 2009 at 2:01 PM, Gordan Bobic <gordan at bobich.net>
>>> wrote:
>>> >>
>>> >> Gordan Bobic wrote:
>>> >>>
>>> >>> First-access failing bug still seems to be present.
>>> >>> But other than that, it seems to be distinctly better than rc4. :)
>>> >>> Good work! :)
>>> >>
>>> >> And that massive memory leak is gone, too! The process hasn't grown by
>>> >> a KB after a kernel compile! :D
>>> >>
>>> >> s/Good work/Awesome work/
>>> >>
>>> >> :)
>>> >>
>>> >>
>>> >> Gordan