[Gluster-devel] How does read-subvol-entry.t works?

Emmanuel Dreyfus manu at netbsd.org
Wed Mar 4 04:59:44 UTC 2015


Emmanuel Dreyfus <manu at netbsd.org> wrote:

> It seems there is very weird stuff going on there: it fails because 
> in afr_inode_refresh_subvol_cbk (after a lookup), we have a valid 
> reply from brick 0 with op_ret = 0.
> 
> But the brick 0 server process was killed. That makes no sense.

Looking at a kernel trace I can now tell that the brick0 server process
indeed gets a SIGKILL, but then glusterd spawns a new process for brick0
that answers the requests.
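
A minimal shell sketch of how one could confirm the respawn, assuming
the brick path appears on the glusterfsd command line (the pattern and
the sleep are illustrative, adjust for your setup):

    pid_before=$(pgrep -f 'glusterfsd.*brick0')
    kill -KILL "$pid_before"
    sleep 2                       # give glusterd time to react
    pid_after=$(pgrep -f 'glusterfsd.*brick0')
    if [ -n "$pid_after" ] && [ "$pid_after" != "$pid_before" ]; then
        echo "brick0 respawned: $pid_before -> $pid_after"
    fi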

The glusterd log confirms this: first it starts the two bricks:
[glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick0 on port 49152
[glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick1 on port 49153

-> Killing brick0:
[glusterd-handler.c:4388:__glusterd_brick_rpc_notify] 0-management: Brick nbslave73.cloud.gluster.org:/d/backends/brick0 has disconnected from glusterd.

-> And here it restarts!

[glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick0 on port 49152

-> The test terminates and kills all bricks:

[glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /d/backends/brick0 on port 49152
[glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /d/backends/brick1 on port 49153
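
For what it's worth, these pmap events can be watched live while the
test runs; a sketch, assuming the default log location (the exact file
name varies across versions and configurations):

    tail -f /var/log/glusterfs/glusterd.log | grep pmap_registry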

Hence, could it be a glusterd bug? Why would it restart a brick on its own?
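
To chase that down, one could grep the glusterd sources for the brick
start path; the function names below are assumptions from memory
(e.g. glusterd_brick_start) and should be verified against the tree:

    # Search glusterd for places that (re)start bricks; the names are
    # assumptions to check against the actual source.
    grep -rn 'glusterd_brick_start\|glusterd_restart_bricks' \
        xlators/mgmt/glusterd/src/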

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org

