AFR recovery not working over infiniband (Re: [Gluster-devel] io recovering after failure)

Mickey Mazarick mic at digitaltadpole.com
Sat Dec 1 17:44:31 UTC 2007


Sorry to hound you about this, but it turns out that failover of an AFR
volume works fine over tcp, yet hangs the client over ib-verbs.

Our ib-verbs driver is the one included in OFED-1.2.5. Is this the
recommended IB library? The error is raised at the transport level, as
you can see from the client log below. Let me know if you need any more
detailed information.
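
For reference, the tcp test uses essentially the same specs with just the
transport-type lines switched, roughly like this for one of the client
subvolumes below:

volume afrvol1
 type protocol/client
 option transport-type tcp/client   # tcp instead of ib-verbs/client
 option remote-host RTPST201
 option remote-subvolume afr
end-volume

With that, killing one server brick leaves the mount working; over
ib-verbs the client hangs as described below.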

Thanks!


Mickey Mazarick wrote:
> AFR is being handled on the client. I simplified the specs down to
> look exactly like the online example and I'm still seeing the same
> result.
> This is an InfiniBand setup, so that may be the problem. We want to run
> this on a 6-brick, 100+ client cluster over InfiniBand.
>
> Whenever I kill the gluster daemon on RTPST201 the client hangs, and
> the client log says:
> 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns 
> returned 107
> 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns 
> returned 107
> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] 
> transport/ib-verbs: send work request on `mthca0' returned error 
> wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaad801000, 
> wc.byte_len = 0, post->reused = 210
> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] 
> transport/ib-verbs: send work request on `mthca0' returned error 
> wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaac2bf000, 
> wc.byte_len = 0, post->reused = 168
> 2007-11-30 07:55:34 E [ib-verbs.c:951:ib_verbs_recv_completion_proc] 
> transport/ib-verbs: ibv_get_cq_event failed, terminating recv thread
> 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] 
> transport/ib-verbs: send work request on `mthca0' returned error 
> wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaabfb9000, 
> wc.byte_len = 0, post->reused = 230
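>
> (Decoding those, as far as I can tell: the 107 that afrns returns is
> ENOTCONN, and wc.status = 12 is IBV_WC_RETRY_EXC_ERR in the verbs
> headers, i.e. the HCA exhausted its transport retries talking to the
> peer I had just killed.)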
>
>
> Storage Bricks are:
> RTPST201, RTPST202
>
> ######################## Storage Brick vol spec:
> volume afrmirror
>  type storage/posix
>  option directory /mnt/gluster/afrmirror
> end-volume
> volume afrns
>  type storage/posix
>  option directory /mnt/gluster/afrns
> end-volume
> volume afr
>  type storage/posix
>  option directory /mnt/gluster/afr
> end-volume
> volume server
>  type protocol/server
>  option transport-type ib-verbs/server # For ib-verbs transport
>  option ib-verbs-work-request-send-size  131072
>  option ib-verbs-work-request-send-count 64
>  option ib-verbs-work-request-recv-size  131072
>  option ib-verbs-work-request-recv-count 64
>  ##auth##
>  option auth.ip.afrmirror.allow *
>  option auth.ip.afrns.allow *
>  option auth.ip.afr.allow *
>  option auth.ip.main.allow *
>  option auth.ip.main-ns.allow *
> end-volume
>
> ##################### Client spec is:
> volume afrvol1
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST201
>  option remote-subvolume afr
> end-volume
>
> volume afrmirror1
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST201
>  option remote-subvolume afrmirror
> end-volume
>
> volume afrvol2
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST202
>  option remote-subvolume afr
> end-volume
>
> volume afrmirror2
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST202
>  option remote-subvolume afrmirror
> end-volume
>
> volume afr1
>  type cluster/afr
>  subvolumes afrvol1 afrmirror2
> end-volume
>
> volume afr2
>  type cluster/afr
>  subvolumes afrvol2 afrmirror1
> end-volume
>
>
> volume afrns1
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST201
>  option remote-subvolume afrns
> end-volume
> volume afrns2
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST202
>  option remote-subvolume afrns
> end-volume
>
> volume afrns
>  type cluster/afr
>  subvolumes afrns1 afrns2
> end-volume
>
> volume bricks
>  type cluster/unify
>  option namespace afrns
>  subvolumes afr1 afr2
>  option scheduler alu   # use the ALU scheduler
>  option alu.order open-files-usage:disk-usage:read-usage:write-usage
> end-volume
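>
> One thing I have been meaning to try on each protocol/client volume
> above (assuming I have the option name right, and with no idea yet
> whether it changes the ib-verbs behaviour) is lowering the client-side
> transport timeout so a dead brick is given up on sooner, e.g.:
>
> volume afrvol1
>  type protocol/client
>  option transport-type ib-verbs/client
>  option remote-host RTPST201
>  option remote-subvolume afr
>  option transport-timeout 30   # assuming this option name; seconds before giving up on a dead server
> end-volume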
>
>
> Krishna Srinivas wrote:
>> If you have AFR on the server side and that server goes down, then
>> all the FDs associated with the files on that server will return an
>> ENOTCONN error. (If that is how your setup is configured?) But if you
>> had AFR on the client side it would have worked seamlessly. However,
>> this situation will be handled when we bring out the HA translator.
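>>
>> (For clarity, "AFR on the client side" means the client spec itself
>> mirrors two protocol/client subvolumes, along these lines; the volume
>> and host names here are only placeholders:
>>
>> volume remote1
>>  type protocol/client
>>  option transport-type tcp/client   # placeholder transport/host names
>>  option remote-host server1
>>  option remote-subvolume brick
>> end-volume
>>
>> volume remote2
>>  type protocol/client
>>  option transport-type tcp/client
>>  option remote-host server2
>>  option remote-subvolume brick
>> end-volume
>>
>> volume mirror
>>  type cluster/afr
>>  subvolumes remote1 remote2
>> end-volume
>>
>> so that when one server goes down the other copy keeps serving the
>> open FDs.)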
>>
>> Krishna
>>
>> On Nov 30, 2007 3:01 AM, Mickey Mazarick <mic at digitaltadpole.com> wrote:
>>  
>>> Is this true for files that are currently open? For example, I have a
>>> virtual machine running that has a file open at all times. Errors are
>>> bubbling back to the application layer instead of just waiting. After
>>> that I have to unmount/remount the gluster volume. Is there a way of
>>> preventing this?
>>>
>>> (This is the latest tla btw)
>>> Thanks!
>>>
>>>
>>> Anand Avati wrote:
>>>    
>>>> This is possible already; it is just that the files from the node
>>>> which is down will not be accessible for the time the server is down.
>>>> When the server is brought back up, the files are made accessible
>>>> again.
>>>>
>>>> avati
>>>>
>>>> 2007/11/30, Mickey Mazarick <mic at digitaltadpole.com>:
>>>>
>>>>     Is there currently a way to force a client connection to retry
>>>>     distributed I/O until a failed resource comes back online? If a
>>>>     disk in a unified volume drops, I have to remount on all the
>>>>     clients. Is there a way around this?
>>>>
>>>>     I'm using AFR/unify on 6 storage bricks and I want to be able to
>>>>     change a server config setting and restart the server bricks one
>>>>     at a time without losing the mount point on the clients. Is this
>>>>     currently possible without doing IP failover?
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> It always takes longer than you expect, even when you take into
>>>> account Hofstadter's Law.
>>>>
>>>> -- Hofstadter's Law
>>>>       
>
>

