[Gluster-infra] Slave 25 and 24 are broken

Mon Mar 2 14:48:02 UTC 2015

On Mon, Mar 02, 2015 at 03:02:21PM +0100, Michael Scherer wrote:
> On Monday, 02 March 2015 at 14:34 +0100, Michael Scherer wrote:
> > Hi,
> > 
> > So, while trying to create a formula to deploy jenkins for us, I found
> > out that slave25 and slave24 have a rather annoying issue: they are
> > missing /etc/passwd (/etc/shadow is still there).
> > 
> > I am trying to save the 2 servers, since salt kinda still works (I
> > cannot run commands, but I can modify files), and will then look
> > further into what happened.
> 
> So I saved the 2 servers (kinda).
> 
> On slave24, the problem started between 06:50 and 08:50 (UTC). The logs
> do not tell much, besides the sshd user disappearing around 08:50 (UTC)
> this morning.
> 
> Last message in /var/log/secure:
> Mar  2 08:46:25 slave24 sshd[21300]: Received disconnect from
> 195.154.77.78: 11: Bye Bye
> 
> No one has connected for a few days.
> 
> On slave25, there is the same error, but earlier:
> 
> Feb 27 11:35:38 slave25 kernel: device-mapper: thin: Data device (dm-1)
> discard unsupported: Disabling discard passdown.
> Feb 27 11:35:38 slave25 lvm[3817]: Monitoring thin patchy-pool-tpool.
> Feb 27 11:35:52 slave25 lvm[3817]: No longer monitoring thin
> patchy-pool-tpool.
> Feb 27 13:34:36 slave25 sshd[30041]: fatal: Privilege separation user
> sshd does not exist
> 
> 
> And like on slave24, there are 2 hours between the message about thin
> patchy-pool-tpool and the sshd user disappearing.
> 
> 
> So I suspect one test is wreaking havoc.

This line looks rather dangerous:

   https://github.com/gluster/glusterfs/blob/master/tests/basic/gfid-access.t#L53

   TEST ! mv /etc/passwd $M0/.gfid

Granted, it should fail, and /etc/passwd should stay where it is... But
well?

Who wants to send a patch for that?
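
As a starting point, here is a minimal sketch of one possible direction,
not a tested patch: use a throwaway file as the source of the move, so
that an unexpected success costs nothing. The names scratch and result
are hypothetical, and M0 is faked with a temporary directory so the
sketch runs outside the test framework (which means the move succeeds
here, exactly the case we want to survive).

```shell
#!/bin/bash
# Sketch only: never use a system file as the source of a move that
# "should fail". M0 is the mount point in the real test; a plain temp
# directory stands in for it here (assumption), so the mv below succeeds.
M0=$(mktemp -d)

# Hypothetical scratch file standing in for /etc/passwd.
scratch=$(mktemp)
echo "throwaway" > "$scratch"

# If the move unexpectedly succeeds, only the scratch file is gone.
if mv "$scratch" "$M0/.gfid" 2>/dev/null; then
    result="moved"
else
    result="refused"
fi
echo "mv into .gfid: $result"
```

In gfid-access.t itself, the equivalent would presumably be to keep the
"TEST ! mv ... $M0/.gfid" check but point it at such a scratch file
instead of /etc/passwd.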

Niels