[Gluster-users] self-heal failed

Thu Jan 10 17:50:48 UTC 2013

Looks like replace-brick in a corrupted replicate brick with 3.3 doesn't
work any more. To rebuild the cluster is exactly what we want to avoid when
we choose to use glusterfs at the first place.

I assume to replace a failed replicate disk or node should be a standard
procedure, isn't it? I could find anything related to this in the 3.3
manual.

On Thu, Jan 10, 2013 at 12:18 PM, Daniel Taylor <dtaylor at vocalabs.com>wrote:

> I've run replace-brick on missing bricks before, it should still work.
>
> On the other hand, data corruption is the worst case failure mode.
> The one time I hit data corruption on a node my final answer ended up
> being to rebuild the cluster from scratch and restore the best copy of the
> data I had (mix of backups and live data).
>
>
> On 01/10/2013 11:12 AM, Liang Ma wrote:
>
>>
>> Thank you Daniel for you more comments.
>>
>> Now I can remove the damaged zfs brick after rebooting the system. But
>> then what can I do to rejoin a new brick? I can't run gluster volume
>> replace-brick because the old brick is gone. I can't even remove the old
>> brick because the gluster's replicate count is 2. So what is the right
>> procedure to replace a failed brick for replicate gluster volume?
>>
>> Liang
>>
>>
>> On Thu, Jan 10, 2013 at 11:57 AM, Daniel Taylor <dtaylor at vocalabs.com<mailto:
>> dtaylor at vocalabs.com>> wrote:
>>
>>     I'm not familiar with zfs in particular, but it should have given
>>     you a message saying why it won't unmount.
>>
>>     In the worst case you can indeed remove the mount point from
>>     /etc/fstab and reboot. A hard reboot may be necessary in a case
>>     like this.
>>
>>
>>     On 01/10/2013 10:43 AM, Liang Ma wrote:
>>
>>
>>         Yes, I stopped the glusterfs service on the damaged system but
>>         zfs still won't allow me to umount the filesystem. Maybe I
>>         should try to shutdown the entire system.
>>
>>
>>         On Wed, Jan 9, 2013 at 10:28 AM, Daniel Taylor
>>         <dtaylor at vocalabs.com <mailto:dtaylor at vocalabs.com>
>>         <mailto:dtaylor at vocalabs.com <mailto:dtaylor at vocalabs.com>>**>
>>
>>         wrote:
>>
>>
>>             On 01/09/2013 08:31 AM, Liang Ma wrote:
>>
>>
>>                 Hi Daniel,
>>
>>                 Ok, if gluster can't self-heal from this situation, I
>>         hope at
>>                 least I can manually restore the volume by using the good
>>                 brick available. So would you please tell me how can I
>>         "simply
>>                 rebuild the filesystem and let gluster attempt to
>>         restore it
>>                 from a *clean* filesystem"?
>>
>>
>>             Trimmed for space.
>>
>>             You could do as Tom Pfaff suggests, but given the odds of data
>>             corruption carrying forward I'd do the following:
>>             Shut down gluster on the damaged system.
>>             Unmount the damaged filesystem.
>>             Reformat the damaged filesystem as new (throwing away any
>>             potential corruption that might not get caught on rebuild)
>>             Mount the new filesystem at the original mount point
>>             Restart gluster
>>
>>             In the event of corruption due to hardware failure you'd
>>         be doing
>>             this on replacement hardware.
>>             The key is you have to have a functional filesystem for
>>         gluster to
>>             work with.
>>
>>
>>             --     Daniel Taylor             VP Operations Vocal
>>         Laboratories, Inc
>>         dtaylor at vocalabs.com <mailto:dtaylor at vocalabs.com>
>>         <mailto:dtaylor at vocalabs.com <mailto:dtaylor at vocalabs.com>>
>>         612-235-5711 <tel:612-235-5711>
>>             <tel:612-235-5711 <tel:612-235-5711>>
>>
>>
>>             ______________________________**_________________
>>             Gluster-users mailing list
>>         Gluster-users at gluster.org <mailto:Gluster-users at gluster.**org<Gluster-users at gluster.org>
>> >
>>         <mailto:Gluster-users at gluster.**org <Gluster-users at gluster.org>
>>
>>         <mailto:Gluster-users at gluster.**org <Gluster-users at gluster.org>>>
>>         http://supercolony.gluster.**org/mailman/listinfo/gluster-**users<http://supercolony.gluster.org/mailman/listinfo/gluster-users>
>>
>>
>>
>>     --     Daniel Taylor             VP Operations       Vocal
>> Laboratories, Inc
>>     dtaylor at vocalabs.com <mailto:dtaylor at vocalabs.com> 612-235-5711
>>     <tel:612-235-5711>
>>
>>     ______________________________**_________________
>>     Gluster-users mailing list
>>     Gluster-users at gluster.org <mailto:Gluster-users at gluster.**org<Gluster-users at gluster.org>
>> >
>>     http://supercolony.gluster.**org/mailman/listinfo/gluster-**users<http://supercolony.gluster.org/mailman/listinfo/gluster-users>
>>
>>
>>
> --
> Daniel Taylor             VP Operations       Vocal Laboratories, Inc
> dtaylor at vocalabs.com                                     612-235-5711
>
> ______________________________**_________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.**org/mailman/listinfo/gluster-**users<http://supercolony.gluster.org/mailman/listinfo/gluster-users>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20130110/55793cbc/attachment.html>