[Gluster-users] Self-Heal Daemon not Running

Wed Sep 25 08:28:22 UTC 2013

On 09/25/2013 01:06 PM, Andrew Lau wrote:
> On Wed, Sep 25, 2013 at 2:28 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>     On 09/25/2013 06:16 AM, Andrew Lau wrote:
>>     That's where I found the 200+ entries
>>
>>     [ root@hv01 ] gluster volume heal STORAGE info split-brain
>>     Gathering Heal info on volume STORAGE has been successful
>>
>>     Brick hv01:/data1
>>     Number of entries: 271
>>     at            path on brick
>>
>>     2013-09-25 00:04:29 /6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids
>>     2013-09-25 00:04:29 /6682d31f-39ce-4896-99ef-14e1c9682585/images/5599c7c7-0c25-459a-9d7d-80190a7c739b/0593d351-2ab1-49cd-a9b6-c94c897ebcc7
>>     2013-09-24 23:54:29 <gfid:9c83f7e4-6982-4477-816b-172e4e640566>
>>     2013-09-24 23:54:29 <gfid:91e98909-c217-417b-a3c1-4cf0f2356e14>
>>     <snip>
>>
>>     Brick hv02:/data1
>>     Number of entries: 0
>>
>>     When I run the same command on hv02, it will show the reverse
>>     (the other node having 0 entries).
>>
>>     I remember last time having to delete these files individually on
>>     another split-brain case, but I was hoping there was a better
>>     solution than going through 200+ entries.
>>
>     While I haven't tried it out myself, Jeff Darcy has written a
>     script
>     (https://github.com/jdarcy/glusterfs/tree/heal-script/extras/heal_script)
>     which helps in automating the process. He has detailed its usage
>     in his blog post
>     http://hekafs.org/index.php/2012/06/healing-split-brain/
>
>     Hope this helps.
>     -Ravi
>
>
> That didn't end up working: ImportError: No module named volfilter
>
Oh, you need to download all 4 Python scripts in the heal_script folder.
> But I didn't end up spending much time with it as the number of
> entries magically reduced to 10. I removed the files and the
> split-brain info now reports 0 entries. Still wondering why there are
> different file sizes on the two bricks.
>
>
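
For anyone else facing a long list like the one above: as far as I
understand, the split-brain listing is a timestamped log, so the same file
can show up more than once and the raw count may overstate the number of
distinct files. Below is a rough sketch that prints each distinct entry once
per brick. It is not part of Jeff's heal_script; the function name
unique_split_brain_entries is made up here, and the parsing assumes the
layout quoted above, with each entry on a single line as "date time path".

```shell
# Hypothetical helper (not a gluster command): reads the output of
# `gluster volume heal VOL info split-brain` on stdin and prints
# "brick<TAB>entry" once for every distinct entry under each brick.
# Assumption: entries look like "YYYY-MM-DD HH:MM:SS <path or gfid>".
unique_split_brain_entries() {
  awk '
    # Tolerate leading whitespace (e.g. output pasted from an email).
    { sub(/^[ \t]+/, "") }

    # Remember which brick the following entries belong to.
    /^Brick / { brick = $2; next }

    # Entry lines start with a date; drop the date and time fields.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      entry = $0
      sub(/^[^ ]+ +[^ ]+ +/, "", entry)
      key = brick "\t" entry
      if (!(key in seen)) { seen[key] = 1; print key }
    }
  '
}
```

Usage would be something like
"gluster volume heal STORAGE info split-brain | unique_split_brain_entries | wc -l"
to get the number of distinct entries across bricks.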
>>     Cheers.
>>
>>
>>     On Wed, Sep 25, 2013 at 10:39 AM, Mohit Anchlia
>>     <mohitanchlia at gmail.com> wrote:
>>
>>         What's the output of
>>         gluster volume heal $VOLUME info split-brain
>>
>>
>>         On Tue, Sep 24, 2013 at 5:33 PM, Andrew Lau
>>         <andrew at andrewklau.com> wrote:
>>
>>             Found the BZ
>>             https://bugzilla.redhat.com/show_bug.cgi?id=960190 - so I
>>             restarted one of the volumes and it seems to have
>>             restarted all the daemons again.
>>
>>             Self heal started again, but I seem to have split-brain
>>             issues everywhere. There are over 100 different entries on
>>             each node. What's the best way to restore this now, short
>>             of having to manually go through and delete 200+ files?
>>             It looks like a full split brain as the file sizes on the
>>             different nodes are out of balance by about 100GB or so.
>>
>>             Any suggestions would be much appreciated!
>>
>>             Cheers.
>>
>>             On Tue, Sep 24, 2013 at 10:32 PM, Andrew Lau
>>             <andrew at andrewklau.com> wrote:
>>
>>                 Hi,
>>
>>                 Right now, I have a 2x1 replica. Ever since I had to
>>                 reinstall one of the gluster servers, there have been
>>                 issues with split-brain. The self-heal daemon doesn't
>>                 seem to be running on either of the nodes.
>>
>>                 To reinstall the gluster server (the original brick
>>                 data was intact but the OS had to be reinstalled), I:
>>                 - Reinstalled gluster
>>                 - Copied over the old uuid from backup
>>                 - gluster peer probe
>>                 - gluster volume sync $othernode all
>>                 - mount -t glusterfs localhost:STORAGE /mnt
>>                 - find /mnt -noleaf -print0 | xargs --null stat \
>>                   >/dev/null 2>/var/log/glusterfs/mnt-selfheal.log
>>
>>                 I let it resync and it was working fine, or at least so
>>                 I thought. I just came back a few days later to see
>>                 there's a mismatch in the brick volumes. One is
>>                 50GB ahead of the other.
>>
>>                 # gluster volume heal STORAGE info
>>                 Status: self-heal-daemon is not running on
>>                 966456a1-b8a6-4ca8-9da7-d0eb96997cbe
>>
>>                 /var/log/glusterfs/glustershd.log doesn't seem to have
>>                 any recent logs, only those from when the two
>>                 original gluster servers were running.
>>
>>                 # gluster volume status
>>
>>                 Self-heal Daemon on localhost    N/A    N    N/A
>>
>>                 Any suggestions would be much appreciated!
>>
>>                 Cheers
>>                 Andrew.
>>
>
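
The "Self-heal Daemon on localhost ... N" row in the original report is the
thing to watch for. A minimal sketch for spotting it, assuming the status
rows end with the Port, Online and Pid columns as in the output quoted above
(the function name offline_self_heal_daemons is hypothetical, not a gluster
command):

```shell
# Hypothetical helper: reads `gluster volume status` output on stdin and
# prints the host of every "Self-heal Daemon on <host>" row whose Online
# column (assumed to be the second-to-last field) is "N".
offline_self_heal_daemons() {
  awk '/^Self-heal Daemon on / && $(NF-1) == "N" { print $4 }'
}
```

Run as "gluster volume status | offline_self_heal_daemons"; empty output
would mean no self-heal daemon row is marked offline. In this thread the
daemons came back after one of the volumes was restarted.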
