[Gluster-users] 3.7.9: gluster volume heal <ds> info blocking I/O

Sat Mar 26 11:23:05 UTC 2016

This is further to my earlier posts on "Very poor heal behaviour in 
3.7.9", same test environment.

After testing the heal process by killing glusterfsd on a node I noticed 
the following.

- I/O continued at normal speed while glusterfsd was down.

- After restarting glusterfsd, I/O still continued as normal.

- performing a "gluster volume heal datastore2 info" would show some 
info then hang.

- I/O on the cluster would then cease. For example, in a VM where I was running a 
command line build of a large project, the build just stopped. The VM 
itself was mostly responsive but anything that involved accessing the 
disk hung.

- if I killed the "gluster volume heal datastore2 info" command then I/O 
in the VMs resumed at a normal pace.

- if I then reissued the "gluster volume heal datastore2 info" command 
I/O would continue for a short while (seconds - minutes) before hanging 
again.

- killing the heal info command would resume I/O again.
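
Since killing the command by hand is what restores I/O, the same thing
can be automated while testing. Below is a minimal sketch in Python, not
a fix: it runs a command under a time limit and kills it if it overruns.
The helper names run_with_timeout and heal_info_bounded and the 30 second
limit are my own assumptions for illustration; only the gluster command
line itself comes from the steps above. I am also assuming that a kill
delivered this way behaves like the manual kill described above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, stdout_text).

    If cmd is still running after timeout_s seconds, subprocess.run()
    kills it and raises TimeoutExpired; we report finished=False along
    with whatever output was captured before the kill.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return True, proc.stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.stdout or ""
        # Partial output may come back as bytes even with text=True.
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out


def heal_info_bounded(volume, timeout_s=30):
    """Hypothetical wrapper: run 'gluster volume heal <volume> info'
    but never let it run longer than timeout_s (an arbitrary limit)."""
    return run_with_timeout(
        ["gluster", "volume", "heal", volume, "info"], timeout_s
    )
```

Calling heal_info_bounded("datastore2") on a node returns a pair: whether
the command finished on its own, and the output it produced. A False in
the first position means it hung and was killed.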

This looks like some sort of deadlock bug. The heal info command was 
optimised for 3.7.8/3.7.9, wasn't it?

thanks,

-- 
Lindsay Mathieson