[Gluster-users] hung disk sleep process

Computerisms Corporation bob at computerisms.ca
Wed Nov 15 20:54:43 UTC 2017


Hi,

I have a problem, but am not really sure what question I need to ask. 
So I am going to lay it all out and maybe someone can point me in the 
right direction...

I have a replicated gluster volume across two servers.  Each server 
has its OS installed on an SSD, and a RAID array mounted as a brick. 
Both servers run a Samba AD, among other things, plus an LXC 
container acting as a member server.  Samba in each container is 
configured to use CTDB, so the two containers together act as a 
clustered file server.  At startup, each LXC container bind-mounts 
the local directory where the glusterfs is mounted on its host, using 
lxc.mount.entry in its config file.
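
For reference, the relevant pieces of that setup look roughly like 
this (paths and the volume name are placeholders, not my real ones):

   # on each host the volume is fuse-mounted locally, e.g.:
   #   mount -t glusterfs localhost:/gv0 /mnt/gluster
   # and the container bind-mounts that directory via its LXC config:
   lxc.mount.entry = /mnt/gluster mnt/gluster none bind,create=dir 0 0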

This works, mostly.  But after some hours, days, or weeks, a problem 
develops.  The initial symptom was reports from end users of not 
being able to access a folder or file.  That led me to an error in 
the CTDB logs about locking a file that lives in the container on the 
OS disk, so I thought it was a CTDB problem.  But when I chased that 
with the help of Martin from the samba mailing list, what I found 
underneath was an smbd process in the LXC container that has entered 
a disk sleep state.  This process causes the same thing to happen on 
the host.  I am unable to get any trace on the smbd process in the 
container or on the host, but under /proc/<pid>/stack I get something 
like the following each time:

[<ffffffffc047e856>] request_wait_answer+0x166/0x1f0 [fuse]
[<ffffffff8b0b8d50>] prepare_to_wait_event+0xf0/0xf0
[<ffffffffc047e958>] __fuse_request_send+0x78/0x80 [fuse]
[<ffffffffc0481bdd>] fuse_simple_request+0xbd/0x190 [fuse]
[<ffffffffc0487c37>] fuse_setlk+0x177/0x190 [fuse]
[<ffffffff8b259467>] SyS_flock+0x117/0x190
[<ffffffff8b003b1c>] do_syscall_64+0x7c/0xf0
[<ffffffff8b60632f>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

Once ps shows the D state, the only fix I have found is to reboot the 
server.  After a reboot, things are fine again, until they are not.
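
In case it helps, this is roughly how I spot the hung process and 
grab that stack (needs root; <PID> is whatever smbd ps flags as 
uninterruptible):

   # list processes stuck in uninterruptible (D) sleep
   ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /D/'
   # kernel stack of the stuck process
   cat /proc/<PID>/stack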

Given that gluster is the only thing I know of that uses fuse on this 
system, I guess gluster is the next thing to investigate.
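
One thing I plan to try next, if I have understood the docs 
correctly, is a statedump of the volume to look for stuck locks (the 
volume name here is made up):

   gluster volume statedump gv0
   # the dump files should end up under /var/run/gluster on the
   # nodes; I will grep them for blocked posix/inode lock entries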

FWIW, while investigating this I found that the glusterfs could also 
be mounted inside the LXC container over the network, instead of 
bind-mounting the host's local mount.  I set that up and the mount 
works, but group permissions do not work when accessing files through 
the samba share.  If I change the file's owner to the user I am 
accessing it as, it works; if I chmod 007 the file I am trying to 
access, it also works; but if I chmod 070 the file and access it as a 
user in the group that owns it, I get an access denied error in the 
logs.  So I put it back to a local mount.  Not sure if this is a 
different problem or related.
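
For clarity, the network mount I tried inside the container looked 
roughly like this (hostnames and volume name changed):

   mount -t glusterfs server1:/gv0 /mnt/gluster \
         -o backup-volfile-servers=server2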

Also not certain if this has any relevance, but the hang seems to 
happen consistently with certain files.  There is one directory 
containing an xlsx file, and that file frequently shows up as the 
last file accessed by the hung process's owner at the time the 
symptoms start, which makes it look like the trigger for the disk 
sleep.  Other files also show up repeatedly, though less frequently. 
I have tried resaving these files, deleting the originals, and 
replacing them, but after some time they show up again.
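
FWIW, when one of these files is implicated, something like this (run 
inside the member-server container) should show which smbd has it 
locked; the filename here is just an example:

   smbstatus -L | grep -i problem-file.xlsx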

Also uncertain as to relevance: when I run a heal info, two GFIDs are 
listed.  Neither of them ever seems to go away, and both are listed 
on each node.  The corresponding entries in the .glusterfs directory 
are not owned by the same owner/group as the files that frequently 
show up as above, and so far I have not been able to find the actual 
paths to these files.  I am currently searching the data storage for 
the related inode number, but the search has been running for several 
hours and has not found anything yet.
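
If I understand the .glusterfs layout correctly, there may be a 
faster way than scanning by inode number: for a regular file, the 
GFID entry under .glusterfs is a hard link to the real file, so on a 
brick something like this should turn up the path (the brick path and 
GFID below are placeholders):

   find /data/brick1 -samefile \
        /data/brick1/.glusterfs/d4/6a/d46a48c1-0000-0000-0000-000000000000
   # note: for directories the .glusterfs entry is a symlink instead,
   # so this only works for regular files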

I have been trawling the glusterfs logs.  There is lots of cryptic 
stuff in there that I don't understand yet, but I have found nothing 
that lines up with the times of the errors in the CTDB logs and would 
indicate a problem.

Another problem to note, for the sake of completeness, is that I also 
have dovecot's mail store on this replicated volume.  We have noticed 
that email clients will fail to see mail unless an ls is run on the 
directory from the CLI.  I was informed on the IRC channel (which I 
cannot currently access due to "user limit reached") that this is 
because of a known bug that will be fixed in a later release.  I have 
set up a while loop to run this ls command, which seems to be dealing 
with that.  However, I am not convinced this problem is related to my 
hung processes.
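
The workaround loop is nothing fancy, roughly this (the path and 
interval are just what I picked):

   while true; do
       ls /mnt/gluster/mail > /dev/null 2>&1
       sleep 60
   done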

So, I am not sure how to chase this to the next step.  I am not sure 
if this is a gluster issue or an LXC issue.  It seems not to be a 
CTDB issue, but that has not been definitively ruled out in my mind. 
Would anyone have suggestions on how I might determine whether this 
is a gluster issue, or perhaps point me at the right documentation 
for troubleshooting this kind of problem?


-- 
Bob Miller
Cell: 867-334-7117
Office: 867-633-3760
www.computerisms.ca

