[Gluster-infra] Testing watchdog on some nodes

Michael Scherer mscherer at redhat.com
Sun Sep 13 17:02:38 UTC 2015


On Tuesday, 08 September 2015 at 14:56 +0200, Michael Scherer wrote:
> On Tuesday, 08 September 2015 at 11:21 +0200, Michael Scherer wrote:
> > Hi,
> > 
> > Since some nodes are stuck and likely need a reboot, I will test
> > using a software watchdog on them to reboot when there is an issue, as a
> > stopgap until we find what breaks them (but breakage is likely to happen
> > anyway, that's why we have tests).
> > 
> > I will take slave21 and 22 as guinea pigs, so please do not reboot them
> > if they are stuck (or if you do, tell me). If anything weird happens on
> > them (like a reboot during a test, or that kind of thing), please tell me
> > too.
> >
> > I guess 2 to 3 weeks of tests should be enough to see if we can push
> > that to the other CentOS 6 slaves.
> 
> It seems we are missing a module on CentOS for the kernel support, so it
> might be less effective.

And slave22 didn't reboot as planned, so the watchdog does not seem to work.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
