[Gluster-devel] How does read-subvol-entry.t works?
Ravishankar N
ravishankar at redhat.com
Wed Mar 4 05:01:06 UTC 2015
On 03/04/2015 10:29 AM, Emmanuel Dreyfus wrote:
> Emmanuel Dreyfus <manu at netbsd.org> wrote:
>
>> It seems there is very weird stuff going on there: it fails because
>> in afr_inode_refresh_subvol_cbk (after a lookup), we have a valid
>> reply from brick 0 with op_ret = 0.
>>
> But the brick 0 server process was killed. That makes no sense.
> Looking at a kernel trace I can now tell that the brick0 server process
 indeed gets a SIGKILL, but then glusterd spawns a new process for brick0
> that answers the requests.
>
 The glusterd log confirms that: first it starts the two bricks:
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick0 on port 49152
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick1 on port 49153
>
> -> Killing brick0
> [glusterd-handler.c:4388:__glusterd_brick_rpc_notify] 0-management: Brick nbslave73.cloud.gluster.org:/d/backends/brick0 has disconnected from glusterd.
>
> -> And here it restarts!
>
> [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /d/backends/brick0 on port 49152
>
 -> the test terminates and kills all bricks:
>
> [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /d/backends/brick0 on port 49152
> [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /d/backends/brick1 on port 49153
>
 Hence, could it be a glusterd bug? Why would it restart a brick on its own?
>
Not sure; CC'ing Atin, who might be able to shed some light on the
glusterd logs. If the brick gets restarted as you say, the brick log
will also contain something like "I [glusterfsd.c:1959:main]
0-/usr/local/sbin/glusterfsd: Started running
/usr/local/sbin/glusterfsd", along with the graph information etc. Does
it? And does volume status show the brick as online again?
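For reference, a rough way to check both (a sketch only; I'm assuming
the default brick log location and the volume/brick names used by the
test, e.g. "patchy" and /d/backends/brick0, so adjust for the actual setup):

  # Did the brick process start again after the kill? Look for a fresh
  # "Started running" line in the brick log.
  grep "Started running" /var/log/glusterfs/bricks/d-backends-brick0.log | tail -1

  # Does glusterd report the brick as online again (Online = Y, new PID)?
  gluster volume status patchy | grep brick0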
-Ravi