[Gluster-devel] EAGAIN/EBUSY handling in glusterfs

Thu Jan 24 06:31:43 UTC 2013

The LVM snapshot scenario was taken only as an example.

The question I raised was whether we need to handle these errors at all.

With regards,
Shishir

----- Original Message -----
From: "Anand Avati" <anand.avati at gmail.com>
To: "Shishir Gowda" <sgowda at redhat.com>
Cc: gluster-devel at nongnu.org
Sent: Thursday, January 24, 2013 12:18:42 AM
Subject: Re: [Gluster-devel] EAGAIN/EBUSY handling in glusterfs

On Wed, Jan 23, 2013 at 1:34 AM, Shishir Gowda < sgowda at redhat.com > wrote: 

Hi Avati, 

One of the possible scenarios is someone taking an LVM snapshot of the backend.

Can you describe in more detail the exact operations for which an LVM snapshot returns EAGAIN or EINTR? EINTR in posix is best retried at the posix level. However, I'm not sure whether an LVM snapshot actually makes the disk filesystem return these non-standard errors for any reason. Can you give an example strace of this happening?
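
A minimal sketch of what "retrying EINTR at the posix level" could look like, assuming plain read()/write() on a file descriptor. The names retry_read and retry_write are hypothetical and are not existing glusterfs functions. Only EINTR is retried; every other error, including EAGAIN and EBUSY, is passed up to the caller unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not part of glusterfs): call write() and
 * repeat the call only when it was interrupted by a signal (EINTR).
 * Short writes are returned to the caller as-is. */
ssize_t
retry_write(int fd, const void *buf, size_t count)
{
    ssize_t ret;

    assert(buf != NULL);
    do {
        ret = write(fd, buf, count);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Hypothetical helper (not part of glusterfs): same idea for read(). */
ssize_t
retry_read(int fd, void *buf, size_t count)
{
    ssize_t ret;

    assert(buf != NULL);
    do {
        ret = read(fd, buf, count);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

Because the loop condition checks for EINTR alone, a caller still sees any other errno value and has to decide for itself what to do with it.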

Avati 

A few examples:
DHT's rebalance: we would not retry a migration in case we got an EAGAIN or even an EINTR error.
Does self-heal retry healing if the error was EAGAIN or EINTR?

These are just a few I can think of.

When the snap feature becomes supported (refer to the wiki link in previous page), a few ops would be blocked while a snap is in progress.

If we decide to provide complete snap in the future (not just crash-consistent), then in all probability all fops will be blocked. 

Do we guarantee that all ops (triggered internally) that fail will be re-triggered? Or are we guaranteeing a state from which we can recover completely?

With regards, 
Shishir 

----- Original Message ----- 
From: "Anand Avati" < anand.avati at gmail.com > 
To: "Shishir Gowda" < sgowda at redhat.com > 
Cc: gluster-devel at nongnu.org 
Sent: Wednesday, January 23, 2013 1:23:09 PM 
Subject: Re: [Gluster-devel] EAGAIN/EBUSY handling in glusterfs 

On Tue, Jan 22, 2013 at 10:39 PM, Shishir Gowda < sgowda at redhat.com > wrote: 

Hi All, 

Currently I see that almost all the xlators in glusterfs do not handle EAGAIN/EBUSY errors. 

Though this should be handled by the applications, 

If by "handled by the applications" you meant "handled by the application retrying the syscall", that is not completely true. More generally it is true for EINTR, and in some places for EAGAIN (i.e. when used on non-blocking pollable file descriptors like sockets - which specifically does NOT include filesystems for regular read/write). EBUSY almost never suggests a poll/retry to the application.
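
To illustrate the EAGAIN case on a non-blocking pollable descriptor, here is a minimal sketch under stated assumptions: the helper names set_nonblocking and read_when_ready are hypothetical, the descriptor is a socket or pipe (not a regular file), and ETIMEDOUT is chosen here to report a poll timeout. On EAGAIN the caller waits for readiness with poll() and then retries the read.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read from a non-blocking, pollable descriptor.
 * On EAGAIN/EWOULDBLOCK, wait up to timeout_ms for readability and
 * retry. The timeout applies to each poll() call, not to the whole
 * operation. Returns -1 with errno = ETIMEDOUT if nothing arrived. */
ssize_t
read_when_ready(int fd, void *buf, size_t count, int timeout_ms)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t ret = read(fd, buf, count);

        if (ret >= 0)
            return ret;
        if (errno == EINTR)
            continue;               /* interrupted: simply retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error: pass it up */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready == -1 && errno != EINTR)
            return -1;
    }
}
```

This pattern relies on the descriptor being pollable, which is why it applies to sockets and pipes but not to regular read/write on a filesystem.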

there are multiple paths where the ops are not performed by the applications (but are internal to glusterfs).

A few of these are:
a. Rebalance 
b. Replace brick 
c. Self-heal 
d. lk's 
etc... 

With the proposed snap feature ( http://www.gluster.org/community/documentation/index.php/Features/snapshot ), would it not be better to identify such ops inside glusterfs?

Can you explain more on that? Why is that necessary? 

Thanks, 
Avati 

Irrespective of the snap feature, I think handling EAGAIN/EBUSY in these code paths is a matter of correctness.

Please comment. 

With regards, 
Shishir 
