[Gluster-users] self-heal failed
Daniel Taylor
dtaylor at vocalabs.com
Thu Jan 10 17:18:51 UTC 2013
I've run replace-brick on missing bricks before; it should still work.
On the other hand, data corruption is the worst-case failure mode.
The one time I hit data corruption on a node, my final answer ended up
being to rebuild the cluster from scratch and restore the best copy of
the data I had (a mix of backups and live data).
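
From memory, the replace-brick route looks something like this (the
volume name and brick paths below are just placeholders; double-check
the syntax for your gluster version):

    # Point the volume at a fresh brick path. "commit force" skips data
    # migration, since the old brick no longer exists anyway.
    gluster volume replace-brick gv0 server2:/export/brick1 server2:/export/brick1-new commit force
    # Then kick off a full self-heal so the replacement brick gets populated.
    gluster volume heal gv0 full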
On 01/10/2013 11:12 AM, Liang Ma wrote:
>
> Thank you, Daniel, for your further comments.
>
> Now I can remove the damaged zfs brick after rebooting the system. But
> then what can I do to rejoin a new brick? I can't run gluster volume
> replace-brick because the old brick is gone, and I can't remove the
> old brick either because the volume's replica count is 2. So what is
> the right procedure for replacing a failed brick in a replicated
> gluster volume?
>
> Liang
>
>
> On Thu, Jan 10, 2013 at 11:57 AM, Daniel Taylor <dtaylor at vocalabs.com> wrote:
>
> I'm not familiar with zfs in particular, but it should have given
> you a message saying why it won't unmount.
>
> In the worst case you can indeed remove the mount point from
> /etc/fstab and reboot. A hard reboot may be necessary in a case
> like this.
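>
>             Before going that far, something generic like this usually
>             shows what is still holding a mount (the mount point and
>             dataset names here are only examples):
>
>                 fuser -vm /export/brick1   # list processes keeping the mount busy
>                 pkill glusterfsd           # brick daemons can linger after stopping glusterd
>                 umount /export/brick1      # or "zfs umount tank/brick1" for a zfs dataset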
>
>
> On 01/10/2013 10:43 AM, Liang Ma wrote:
>
>
> Yes, I stopped the glusterfs service on the damaged system but
> zfs still won't allow me to umount the filesystem. Maybe I
>             should try to shut down the entire system.
>
>
>             On Wed, Jan 9, 2013 at 10:28 AM, Daniel Taylor
>             <dtaylor at vocalabs.com> wrote:
>
>
> On 01/09/2013 08:31 AM, Liang Ma wrote:
>
>
> Hi Daniel,
>
>             Ok, if gluster can't self-heal from this situation, I hope
>             at least I can manually restore the volume by using the
>             good brick available. So would you please tell me how I
>             can "simply rebuild the filesystem and let gluster attempt
>             to restore it from a *clean* filesystem"?
>
>
> Trimmed for space.
>
>             You could do as Tom Pfaff suggests, but given the odds of
>             data corruption carrying forward I'd do the following:
>
>             Shut down gluster on the damaged system.
>             Unmount the damaged filesystem.
>             Reformat the damaged filesystem as new (throwing away any
>             potential corruption that might not get caught on rebuild).
>             Mount the new filesystem at the original mount point.
>             Restart gluster.
>
>             In the event of corruption due to hardware failure you'd be
>             doing this on replacement hardware. The key is you have to
>             have a functional filesystem for gluster to work with.
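>
>             Roughly, that might look like this (the device, mount point
>             and filesystem type are just examples, and the service name
>             varies by distro):
>
>                 service glusterd stop ; pkill glusterfsd   # stop gluster on the damaged node
>                 umount /export/brick1                      # unmount the damaged filesystem
>                 mkfs.xfs -f /dev/sdb1                      # reformat it as new
>                 mount /dev/sdb1 /export/brick1             # remount at the original mount point
>                 service glusterd start                     # restart gluster and let self-heal run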
>
>
--
Daniel Taylor, VP Operations
Vocal Laboratories, Inc.
dtaylor at vocalabs.com
612-235-5711