[Gluster-devel] rc8

Anand Avati avati at gluster.com
Wed Apr 22 21:29:01 UTC 2009


Ender,
  A bug fix went into git today which addresses a similar issue: a case
where only a subset of the files would be recreated during self-heal if
there are a lot of files (~1000 or more) and the node which was down was
the first subvolume in the list. Please pull the latest patches and see
if that solves your case. Thank you for your patience!
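If it helps, updating should just be a pull and rebuild; roughly the
following, assuming an existing git checkout and the usual autotools
build (adjust paths and install privileges for your setup):

  cd glusterfs                  # your existing source checkout
  git pull                      # picks up today's fix
  ./autogen.sh && ./configure && make && make install
  # then restart glusterfsd/glusterfs on all nodes and re-run the test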

Avati

On Thu, Apr 23, 2009 at 2:29 AM, ender <ender at enderzone.com> wrote:
> I was just wondering if the self-heal bug is planned to be fixed, or if the
> developers are just ignoring it in hopes it will go away. Every time I ask
> someone privately whether they can reproduce the problem on their own end,
> they go silent (which leads me to believe that they in fact can reproduce it).
>
> The setup is very simple: AFR, with as many subvolumes as you want. The first
> listed subvolume will always break the self-heal; node2 and node3 always heal
> fine. Swap the IP address of the first listed subvolume and you will swap the
> box which breaks the self-heal. I have been able to repeat this bug every day
> with the newest git for the last month.
> Please let us know if this is not considered a bug, or at least acknowledge it
> in some fashion. Thank you.
> Same configs on all nodes (full volfiles are pasted at the end of this mail):
> all nodes: killall glusterfsd; killall glusterfs;
> all nodes: rm -rf /tank/*
> all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# cp -R gluster /gtank/gluster1
> * Simulating a hardware failure:
> node1:~# killall glusterfsd ; killall glusterfs;
> node1:~# killall glusterfsd ; killall glusterfs;
> glusterfsd: no process killed
> glusterfs: no process killed
> node1:~# rm -rf /tank/*
> * Data never stops changing just because we have a failed node:
> node3:~# cp -R gluster /gtank/gluster2
> all nodes but node1:~# ls -lR /gtank/ | wc -l
> 2780
> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
> 1387
> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
> 1387
> * Adding the hardware back into the network after replacing the bad hard drive(s):
> node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# ls -lR /gtank/ | wc -l
> 1664
> node3:~# ls -lR /gtank/gluster1 | wc -l
> 271
> node3:~# ls -lR /gtank/gluster2 | wc -l
> 1387
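> (Side note for anyone else reproducing this: the replicate changelog can be
> inspected as extended attributes on the backend export (/tank), not on the
> mount. Something like the following, where the path is just an example file
> from this test:
> node2:~# getfattr -d -m . -e hex /tank/gluster1/somefile
> should list the trusted.afr.* entries that AFR uses to decide what to heal.)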
>
>
> ### Export volume "brick" with the contents of "/tank" directory.
> volume posix
>  type storage/posix                    # POSIX FS translator
>  option directory /tank                # Export this directory
> end-volume
>
> volume locks
>  type features/locks
>  subvolumes posix
> end-volume
>
> volume brick
>  type performance/io-threads
>  subvolumes locks
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
>  type protocol/server
>  option transport-type tcp
>  subvolumes brick
>  option auth.addr.brick.allow *        # Allow access to "brick" volume
>  option client-volume-filename /usr/local/etc/glusterfs/glusterfs.vol
> end-volume
>
>
> #
> #mirror block0
> #
> volume node1
>  type protocol/client
>  option transport-type tcp
>  option remote-host node1.ip          # IP address of the remote brick
> # option transport-timeout 30         # seconds to wait for a reply from server for each request
>  option remote-subvolume brick        # name of the remote volume
> end-volume
> volume node2
>  type protocol/client
>  option transport-type tcp
>  option remote-host node2.ip          # IP address of the remote brick
> # option transport-timeout 30         # seconds to wait for a reply from server for each request
>  option remote-subvolume brick        # name of the remote volume
> end-volume
> volume node3
>  type protocol/client
>  option transport-type tcp
>  option remote-host node3.ip          # IP address of the remote brick
> # option transport-timeout 30         # seconds to wait for a reply from server for each request
>  option remote-subvolume brick        # name of the remote volume
> end-volume
>
> volume mirrorblock0
>  type cluster/replicate
>  subvolumes node1 node2 node3
>  option metadata-self-heal yes
> end-volume
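> (To put the "swap the first listed subvolume" point above in config terms:
> reordering the subvolumes line here, e.g.
>  subvolumes node2 node1 node3
> should have the same effect, making node2 the box whose self-heal breaks.)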
>
>
>
>
> Gordan Bobic wrote:
>>
>> The first-access failure bug still seems to be present.
>> But other than that, it seems to be distinctly better than rc4. :)
>> Good work! :)
>>
>> Gordan
>>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>




