[Gluster-users] Can't access volume during self-healing
Joe Julian
joe at julianfamily.org
Wed Oct 9 19:28:34 UTC 2013
On 10/09/2013 11:22 AM, Pruner, Anne (Anne) wrote:
>
> I'm evaluating gluster for use in our product, and I want to ensure
> that I understand the failover behavior. What I'm seeing isn't great,
> but from the docs I've read it doesn't seem to be what everyone else
> is experiencing.
>
> Is this normal?
>
> Setup:
>
> -one volume, distributed, replicated (2), with two bricks on two
> different servers
>
> -35,000 files on volume, about 1MB each, all in one directory (I'm
> open to changing this, if that's the problem. ls -l takes a /really/
> long time)
>
> -volume is mounted (mount -t glusterfs) on server 1
>
> Procedure:
>
> -I stop glusterd and glusterfsd on server1, and send a few files to
> the volume. This is fine. I can write and read the files.
>
> -I start glusterd on server1, and this starts glusterfsd. This
> triggers self-heal.
>
> -Send a file to the server, and try to read it.
>
> -Sending takes a *couple of minutes*. Reading is immediate.
>
> -Once self-heal is done, subsequent sends and reads are immediate.
>
> I tried profiling this operation, and it seems like it's stuck on
> locking the file:
>
[Profiling deleted]
>
> Any ideas?
>
> Thanks,
>
> Anne
>
>
What I suspect is happening is that those 35k files are all being
checked for self-heal before the directory can be regarded as clean
and ready to lock. An easy way to test this would be to try writing to
a file in a nearly empty directory on the same volume and see if you
get the same results.
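Something along these lines would show it; a quick sketch, where
/mnt/gluster is a made-up mount point and "bigdir" stands in for your
35k-file directory:

    # write into a nearly empty directory on the same volume
    mkdir /mnt/gluster/healtest
    time dd if=/dev/zero of=/mnt/gluster/healtest/probe bs=1M count=1

    # compare against a write into the big directory
    time dd if=/dev/zero of=/mnt/gluster/bigdir/probe bs=1M count=1

If the first write returns immediately while the second one stalls
until self-heal finishes, the per-directory heal check is your
bottleneck.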
If you are using a current kernel, or an EL kernel with current
backports, mounting with use-readdirp=on will make directory reads
faster. Not sure how much faster with 35k files, though; I'd be
interested in finding out.
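For reference, a minimal client mount with that option, using made-up
server and volume names:

    mount -t glusterfs -o use-readdirp=on server1:/myvol /mnt/gluster

You can also watch the heal queue drain from either server with

    gluster volume heal myvol info

to know when to expect normal write latency again.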