[Gluster-users] AFR Version used for self-heal
joe at julianfamily.org
Fri Feb 26 06:01:05 UTC 2016
On February 25, 2016 8:32:44 PM PST, Kyle Maas <kyle at virtualinterconnect.com> wrote:
>On 02/25/2016 08:20 PM, Ravishankar N wrote:
>> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>>> How can I tell what AFR version a cluster is using for self-heal?
>> If all your servers and clients are 3.7.8, then they are by default
>> running afr-v2. Afr-v2 was a re-write of afr that went in for 3.6,
>> so any gluster package from then on has this code, you don't need to
>> explicitly enable anything.
>That was what I thought until I ran across this IRC log where JoeJulian
>asked if it was explicitly enabled:
A couple lines down, though, I continued: "Ah, I was confusing that with nsr."
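
To make Ravishankar's rule of thumb above concrete, here is a minimal sketch, not an official tool. It assumes you have already collected the version string from every server and client by hand (for example from `glusterfs --version` on each machine) and that the strings are plain dotted numbers such as 3.7.8. The function names and the (3, 6) cutoff constant are my own, taken only from the statement that the rewrite went in for 3.6.

```python
# Hypothetical helper: decide whether every node runs a release that
# carries the afr-v2 code, based solely on the "went in for 3.6" rule
# quoted in this thread.

AFR_V2_FIRST_RELEASE = (3, 6)  # assumption taken from the thread


def parse_version(text):
    """Turn a dotted version string such as '3.7.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_afr_v2(version):
    """True if the major.minor part of the version is 3.6 or later."""
    return parse_version(version)[:2] >= AFR_V2_FIRST_RELEASE


def cluster_uses_afr_v2(versions):
    """True only if the list is non-empty and every node qualifies."""
    versions = list(versions)
    return bool(versions) and all(has_afr_v2(v) for v in versions)


if __name__ == "__main__":
    # Versions gathered manually from each server and client.
    print(cluster_uses_afr_v2(["3.7.8", "3.7.8"]))
    print(cluster_uses_afr_v2(["3.7.8", "3.5.2"]))
```

A single pre-3.6 client is enough to make the check fail, which is why the question about client versions matters. Version strings with packaging suffixes are not handled here and would need to be trimmed first.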
>>> The reason I ask is that I have a two-node replicated 3.7.8 cluster
>>> arbiters) which has locking behavior during self-heal which looks
>>> similar to that of AFRv1 (only heals one file at a time per
>>> daemon, appears to lock the full inode while it's healing it instead
>>> of just ranges, etc.),
>> Both v1 and v2 use range locks while healing a given file, so
>> shouldn't block when heals happen. What is the problem you're facing?
>> Are your clients also at 3.7.8?
>Primary symptoms are:
>1. While a self-heal is running, only one file at a time is healed per
>brick. As I understand it, AFRv2 and up should allow for multiple files
>to be healed concurrently or at least multiple ranges within a file,
>particularly with io-thread-count set to >1. During a self-heal,
>neither I/O nor network is saturated, which leads me to believe that I'm
>looking at a single synchronous self-healing process.
>3. More troubling is that during a self-heal, clients cannot so much as
>list the files on the volume until the self-heal is done. No errors.
>No timeouts. They just freeze. As soon as the self-heal is complete,
>they unfreeze and list the contents.
>4. Any file access during a self-heal also freezes, just like a
>directory listing, until the self-heal is done. This wreaks havoc on
>users who have files open when one of the bricks is rebooted and has to
>be healed, since with as much data as is stored on this cluster, a
>self-heal can take almost 24 hours.
>I experience the same problems when I run without any clients other
>than the bricks themselves mounting the volume, so yes, it happens with the
>clients on 3.7.8 as well.