[Gluster-devel] rc8

Anand Avati avati at gluster.com
Thu Apr 23 05:24:19 UTC 2009


Ender,
  Please try the latest git. We did find an issue with subdirs getting
skipped while syncing.
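
For reference, updating an existing source checkout and rebuilding is roughly
as follows (assuming the usual autotools build and the same install prefix as
before; adjust the paths to your own setup):

  cd glusterfs              # your existing clone of the source tree
  git pull
  ./autogen.sh
  ./configure --prefix=/usr/local
  make && make install

Replicate repairs entries as they are looked up, so running a full
'ls -lR /gtank' from a good client after node1 rejoins (as you already do)
is what drives the self-heal.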

Avati

On Thu, Apr 23, 2009 at 3:24 AM, ender <ender at enderzone.com> wrote:
> Closer, but still no cigar..
>
> all nodes: killall glusterfsd; killall glusterfs;
> all nodes: rm -rf /tank/*
> all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# cp -R gluster /gtank/gluster1
> *simulating a hardware failure
> node1:~# killall glusterfsd ; killall glusterfs;
> node1:~# killall glusterfsd ; killall glusterfs;
> glusterfsd: no process killed
> glusterfs: no process killed
> node1:~# rm -rf /tank/*
> *data never stops changing, just because we have a failed node
> node3:~# cp -R gluster /gtank/gluster2
> all nodes but node1:~# ls -lR /gtank/ | wc -l
> 2782
> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
> 1393
> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
> 1393
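> *for comparison, the same counts can also be taken directly against the
> exported backend directory, which shows what each brick actually holds
> independent of the mount:
> all nodes but node1: ls -lR /tank | wc -l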
> *Adding hardware back into the network after replacing the bad hard drive(s)
> node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# ls -lR /gtank/ | wc -l
> 1802
> node3:~# ls -lR /gtank/gluster1 | wc -l
> 413
> node3:~# ls -lR /gtank/gluster2 | wc -l
> 1393
>
> Are you aware that taking the broken node1 out fixes the gluster system
> again?
> node1:~# killall glusterfsd ; killall glusterfs;
> node1:~# killall glusterfsd ; killall glusterfs;
> glusterfsd: no process killed
> glusterfs: no process killed
> all nodes but node1:~# ls -lR /gtank/ | wc -l
> 2782
> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
> 1393
> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
> 1393
>
> Add it back in
> node3:~# ls -lR /gtank/gluster1 | wc -l
> 413
>
> And it's broken again.
>
>
> Thank you for working on gluster, and for the response!
> Anand Avati wrote:
>>
>> Ender,
>> There was a bug fix which went into git today that fixes a similar bug: a
>> case where only a subset of the files would be recreated if there are a
>> lot of files (~1000 or more) and the node which was down was the first
>> subvolume in the list. Please pull the latest patches and see if it
>> solves your case. Thank you for your patience!
>>
>> Avati
>>
>> On Thu, Apr 23, 2009 at 2:29 AM, ender <ender at enderzone.com> wrote:
>>>
>>> I was just wondering if the self-heal bug is planned to be fixed, or if
>>> the developers are just ignoring it in hopes it will go away? Every time
>>> I ask someone privately whether they can reproduce the problem on their
>>> own end, they go silent (which leads me to believe that they in fact can
>>> reproduce it).
>>>
>>> Very simple: AFR, with as many subvolumes as you want. The first listed
>>> subvolume always breaks the self-heal; node2 and node3 always heal fine.
>>> Swap the IP address of the first listed subvolume and you will swap the
>>> box which breaks the self-heal. I have been able to repeat this bug every
>>> day with the newest git for the last month.
>>> Please let us know if this is not considered a bug, or acknowledge it in
>>> some fashion. Thank you.
>>>
>>> Same configs:
>>> all nodes: killall glusterfsd; killall glusterfs;
>>> all nodes: rm -rf /tank/*
>>> all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
>>> all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
>>> node3:~# cp -R gluster /gtank/gluster1
>>> *simulating a hardware failure
>>> node1:~# killall glusterfsd ; killall glusterfs;
>>> node1:~# killall glusterfsd ; killall glusterfs;
>>> glusterfsd: no process killed
>>> glusterfs: no process killed
>>> node1:~# rm -rf /tank/*
>>> *data never stops changing, just because we have a failed node
>>> node3:~# cp -R gluster /gtank/gluster2
>>> all nodes but node1:~# ls -lR /gtank/ | wc -l
>>> 2780
>>> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
>>> 1387
>>> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
>>> 1387
>>> *Adding hardware back into the network after replacing the bad hard drive(s)
>>> node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
>>> node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
>>> node3:~# ls -lR /gtank/ | wc -l
>>> 1664
>>> node3:~# ls -lR /gtank/gluster1 | wc -l
>>> 271
>>> node3:~# ls -lR /gtank/gluster2 | wc -l
>>> 1387
>>>
>>>
>>> ### Export volume "brick" with the contents of "/tank" directory.
>>> volume posix
>>> type storage/posix                     # POSIX FS translator
>>> option directory /tank                 # Export this directory
>>> end-volume
>>>
>>> volume locks
>>>  type features/locks
>>>  subvolumes posix
>>> end-volume
>>>
>>> volume brick
>>> type performance/io-threads
>>> subvolumes locks
>>> end-volume
>>>
>>> ### Add network serving capability to above brick.
>>> volume server
>>>  type protocol/server
>>>  option transport-type tcp
>>>  subvolumes brick
>>>  option auth.addr.brick.allow *        # Allow access to "brick" volume
>>>  option client-volume-filename /usr/local/etc/glusterfs/glusterfs.vol
>>> end-volume
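>>>
>>> (If I understand the client-volume-filename option correctly, the clients
>>> could also fetch this volfile from a server at mount time instead of
>>> keeping a local copy, something along the lines of:
>>> node2:~# glusterfs -s node1.ip /gtank
>>> but all of the runs above use the local glusterfs.vol.)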
>>>
>>>
>>> #
>>> #mirror block0
>>> #
>>> volume node1
>>>  type protocol/client
>>>  option transport-type tcp
>>>  option remote-host node1.ip           # IP address of the remote brick
>>> # option transport-timeout 30          # seconds to wait for a reply from server for each request
>>>  option remote-subvolume brick         # name of the remote volume
>>> end-volume
>>> volume node2
>>>  type protocol/client
>>>  option transport-type tcp
>>>  option remote-host node2.ip           # IP address of the remote brick
>>> # option transport-timeout 30          # seconds to wait for a reply from server for each request
>>>  option remote-subvolume brick         # name of the remote volume
>>> end-volume
>>> volume node3
>>>  type protocol/client
>>>  option transport-type tcp
>>>  option remote-host node3.ip           # IP address of the remote brick
>>> # option transport-timeout 30          # seconds to wait for a reply from server for each request
>>>  option remote-subvolume brick         # name of the remote volume
>>> end-volume
>>>
>>> volume mirrorblock0
>>>  type cluster/replicate
>>>  subvolumes node1 node2 node3
>>>  option metadata-self-heal yes
>>> end-volume
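>>>
>>> (For completeness, I believe the same volume with every self-heal type
>>> spelled out explicitly would look like this; as far as I know data- and
>>> entry-self-heal default to on, so listing them should not change anything.)
>>> volume mirrorblock0
>>>  type cluster/replicate
>>>  subvolumes node1 node2 node3
>>>  option data-self-heal on
>>>  option metadata-self-heal on
>>>  option entry-self-heal on
>>> end-volume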
>>>
>>>
>>>
>>>
>>> Gordan Bobic wrote:
>>>>
>>>> First-access failing bug still seems to be present.
>>>> But other than that, it seems to be distinctly better than rc4. :)
>>>> Good work! :)
>>>>
>>>> Gordan
>>>>
>>>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>





More information about the Gluster-devel mailing list