AFR recovery not working over infiniband (Re: [Gluster-devel] io recovering after failure)

Mickey Mazarick mic at digitaltadpole.com
Mon Dec 10 05:02:48 UTC 2007


Sorry for the delay in testing; we had to set up a test environment over
InfiniBand so it would not impact our production system.
Thanks for the fix; AFR works great now!


Is there a way to see whether an AFR pair has resynchronized after a
failure, other than inspecting every file by hand? On our setup the
filesystem becomes unreadable if the namespace bricks are restarted too
close together.
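
As far as I can tell, AFR self-heals a file when it is opened, so one
crude check is to walk the mount and read one byte from every file; if
the walk finishes cleanly, the pair should be consistent again. A rough
sketch in Python (the mount point below is just an example, adjust it
for your clients):

######################## resync check sketch (Python):
#!/usr/bin/env python
# Walk the mounted volume and read one byte from every file. Opening a
# file is what triggers AFR self-heal, so a pass with no read errors is
# a reasonable sign that the pair has resynchronized.
import os

MOUNT = "/mnt/glusterfs"   # example mount point

errors = 0
for dirpath, dirnames, filenames in os.walk(MOUNT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            f = open(path, "rb")
            f.read(1)          # touching the file triggers self-heal
            f.close()
        except IOError, e:     # Python 2 syntax, current for this era
            errors += 1
            print "unreadable: %s (%s)" % (path, e)

print "%d files still unreadable" % errors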
We were looking at writing a script that restarts each brick in turn
after an upgrade, so we can move to newer versions without having to
remount the clients.
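
Something like the sketch below is what we had in mind. The spec-file
path and the way glusterfsd is launched are assumptions; substitute
whatever the init setup on your bricks actually uses.

######################## rolling restart sketch (Python):
#!/usr/bin/env python
# Bounce one brick at a time, pausing long enough for clients to
# reconnect and AFR to settle before touching the next brick.
import os, time

BRICKS = ["RTPST201", "RTPST202"]               # one AFR pair
SPEC   = "/etc/glusterfs/glusterfs-server.vol"  # assumed spec path
PAUSE  = 60                                     # seconds between bricks

for host in BRICKS:
    print "restarting brick on %s" % host
    os.system("ssh %s killall glusterfsd" % host)
    time.sleep(5)                 # give the old process time to exit
    os.system("ssh %s 'glusterfsd -f %s'" % (host, SPEC))
    time.sleep(PAUSE)             # let clients reconnect before moving on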

Thanks again for the quick response!



Krishna Srinivas wrote:
> Can you check whether it works fine with the latest code?
>
> Thanks
> Krishna
>
> On Dec 1, 2007 11:14 PM, Mickey Mazarick <mic at digitaltadpole.com> wrote:
>   
>> Sorry to hound you about this, but it turns out that AFR failover
>> works fine over tcp, while over ib-verbs it hangs the client.
>>
>> Our ib-verbs driver is the one included in OFED-1.2.5. Is this the
>> recommended IB library? The error is raised at the transport level, as
>> you can see from the client log below (the 107 is ENOTCONN, and
>> wc.status 12 appears to be IBV_WC_RETRY_EXC_ERR). Let me know if you
>> need any more detailed information.
>>
>> Thanks!
>>
>>
>> Mickey Mazarick wrote:
>>     
>>> AFR is being handled on the client... I simplified the specs down to
>>> look exactly like the online example and I'm still seeing the same
>>> result.
>>> This is an InfiniBand setup, so that may be the problem. We want to
>>> run this on a 6-brick, 100+ client cluster over InfiniBand.
>>>
>>> Whenever I kill the gluster daemon on RTPST201, the client hangs and
>>> its log says:
>>> 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns returned 107
>>> 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns returned 107
>>> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaad801000, wc.byte_len = 0, post->reused = 210
>>> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaac2bf000, wc.byte_len = 0, post->reused = 168
>>> 2007-11-30 07:55:34 E [ib-verbs.c:951:ib_verbs_recv_completion_proc] transport/ib-verbs: ibv_get_cq_event failed, terminating recv thread
>>> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaabfb9000, wc.byte_len = 0, post->reused = 230
>>>
>>>
>>> Storage bricks are RTPST201 and RTPST202.
>>>
>>> ######################## Storage brick vol spec:
>>> volume afrmirror
>>>  type storage/posix
>>>  option directory /mnt/gluster/afrmirror
>>> end-volume
>>> volume afrns
>>>  type storage/posix
>>>  option directory /mnt/gluster/afrns
>>> end-volume
>>> volume afr
>>>  type storage/posix
>>>  option directory /mnt/gluster/afr
>>> end-volume
>>> volume server
>>>  type protocol/server
>>>  option transport-type ib-verbs/server # for the ib-verbs transport
>>>  option ib-verbs-work-request-send-size  131072
>>>  option ib-verbs-work-request-send-count 64
>>>  option ib-verbs-work-request-recv-size  131072
>>>  option ib-verbs-work-request-recv-count 64
>>>  ## auth ##
>>>  option auth.ip.afrmirror.allow *
>>>  option auth.ip.afrns.allow *
>>>  option auth.ip.afr.allow *
>>>  option auth.ip.main.allow *
>>>  option auth.ip.main-ns.allow *
>>> end-volume
>>>
>>> ##################### Client spec:
>>> volume afrvol1
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST201
>>>  option remote-subvolume afr
>>> end-volume
>>>
>>> volume afrmirror1
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST201
>>>  option remote-subvolume afrmirror
>>> end-volume
>>>
>>> volume afrvol2
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST202
>>>  option remote-subvolume afr
>>> end-volume
>>>
>>> volume afrmirror2
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST202
>>>  option remote-subvolume afrmirror
>>> end-volume
>>>
>>> volume afr1
>>>  type cluster/afr
>>>  subvolumes afrvol1 afrmirror2
>>> end-volume
>>>
>>> volume afr2
>>>  type cluster/afr
>>>  subvolumes afrvol2 afrmirror1
>>> end-volume
>>>
>>>
>>> volume afrns1
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST201
>>>  option remote-subvolume afrns
>>> end-volume
>>> volume afrns2
>>>  type protocol/client
>>>  option transport-type ib-verbs/client
>>>  option remote-host RTPST202
>>>  option remote-subvolume afrns
>>> end-volume
>>>
>>> volume afrns
>>>  type cluster/afr
>>>  subvolumes afrns1 afrns2
>>> end-volume
>>>
>>> volume bricks
>>>  type cluster/unify
>>>  option namespace afrns
>>>  subvolumes afr1 afr2
>>>  option scheduler alu   # use the ALU scheduler
>>>  option alu.order open-files-usage:disk-usage:read-usage:write-usage
>>> end-volume
>>>
>>>
>>> Krishna Srinivas wrote:
>>>       
>>>> If you have AFR on the server side and that server goes down, then
>>>> all the FDs associated with files on that server will return
>>>> ENOTCONN errors. (Is that how your setup is?) If you had AFR on the
>>>> client side instead, it would have worked seamlessly. This situation
>>>> will also be handled when we bring out the HA translator.
>>>>
>>>> Krishna
>>>>
>>>> On Nov 30, 2007 3:01 AM, Mickey Mazarick <mic at digitaltadpole.com> wrote:
>>>>
>>>>         
>>>>> Is this true for files that are currently open? For example, I have
>>>>> a virtual machine running that keeps a file open at all times.
>>>>> Errors are bubbling back up to the application layer instead of the
>>>>> I/O just waiting, and after that I have to unmount and remount the
>>>>> gluster volume. Is there a way of preventing this?
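>>>>>
>>>>> For now we are considering retrying at the application layer. A
>>>>> minimal sketch, assuming the error really surfaces as ENOTCONN and
>>>>> that reopening the file is acceptable for the application:
>>>>>
>>>>> ######################## retry-on-ENOTCONN sketch (Python):
>>>>> import errno, time
>>>>>
>>>>> def read_with_retry(path, offset, size, interval=5, attempts=60):
>>>>>     # Reopen and retry reads that fail with ENOTCONN until the
>>>>>     # brick comes back (or we give up after `attempts` tries).
>>>>>     for _ in range(attempts):
>>>>>         try:
>>>>>             f = open(path, "rb")
>>>>>             f.seek(offset)
>>>>>             data = f.read(size)
>>>>>             f.close()
>>>>>             return data
>>>>>         except IOError, e:
>>>>>             if e.errno != errno.ENOTCONN:
>>>>>                 raise            # a real error, not a dead brick
>>>>>             time.sleep(interval) # wait for the brick to return
>>>>>     raise IOError(errno.ENOTCONN, "brick still down: %s" % path)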
>>>>>
>>>>> (This is the latest tla checkout, btw.)
>>>>> Thanks!
>>>>>
>>>>>
>>>>> Anand Avati wrote:
>>>>>
>>>>>           
>>>>>> This is already possible; it is just that files from a node that is
>>>>>> down are inaccessible for as long as the server is down. When the
>>>>>> server is brought back up, the files become accessible again.
>>>>>>
>>>>>> avati
>>>>>>
>>>>>> 2007/11/30, Mickey Mazarick <mic at digitaltadpole.com>:
>>>>>>
>>>>>>     Is there currently a way to force a client connection to retry
>>>>>>     distributed I/O until a failed resource comes back online? If a
>>>>>>     disk in a unified volume drops, I have to remount on all the
>>>>>>     clients. Is there a way around this?
>>>>>>
>>>>>>     I'm using afr/unify on 6 storage bricks, and I want to be able
>>>>>>     to change a server config setting and restart the server bricks
>>>>>>     one at a time without losing the mount point on the clients. Is
>>>>>>     this currently possible without doing ip failover?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> It always takes longer than you expect, even when you take into
>>>>>> account Hofstadter's Law.
>>>>>>
>>>>>> -- Hofstadter's Law
>>>>>>

