AFR recovery not working over infiniband (Re: [Gluster-devel] io recovering after failure)
Krishna Srinivas
krishna at zresearch.com
Thu Dec 6 20:05:21 UTC 2007
Can you check with the latest code if it works fine?
Thanks
Krishna
On Dec 1, 2007 11:14 PM, Mickey Mazarick <mic at digitaltadpole.com> wrote:
> Sorry to hound you about this, but it turns out that an AFR volume
> failure is handled fine over tcp, but it hangs the client over ib-verbs.
>
> Our ib-verbs driver is the one included in OFED-1.2.5. Is this the
> recommended IB library? The error is raised at the transport level, as
> you can see from the client log below. Let me know if you need any more
> detailed information.
>
> Thanks!
>
>
> Mickey Mazarick wrote:
> > AFR is being handled on the client... I simplified the specs down to
> > look exactly like the online example and I'm still seeing the same
> > result. This is an InfiniBand setup, so that may be the problem. We want
> > to run this on a 6-brick, 100+ client cluster over InfiniBand.
> >
> > Whenever I kill the gluster daemon on RTPST201, the client hangs and the
> > client log says:
> > 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns returned 107
> > 2007-11-30 07:55:14 E [unify.c:145:unify_buf_cbk] bricks: afrns returned 107
> > 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaad801000, wc.byte_len = 0, post->reused = 210
> > 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaac2bf000, wc.byte_len = 0, post->reused = 168
> > 2007-11-30 07:55:34 E [ib-verbs.c:951:ib_verbs_recv_completion_proc] transport/ib-verbs: ibv_get_cq_event failed, terminating recv thread
> > 2007-11-30 07:55:34 E [ib-verbs.c:1100:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `mthca0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2aaaabfb9000, wc.byte_len = 0, post->reused = 230
> >
> >
> > Storage Bricks are:
> > RTPST201, RTPST202
> >
> > ######################## Storage Brick vol spec:
> > volume afrmirror
> > type storage/posix
> > option directory /mnt/gluster/afrmirror
> > end-volume
> >
> > volume afrns
> > type storage/posix
> > option directory /mnt/gluster/afrns
> > end-volume
> >
> > volume afr
> > type storage/posix
> > option directory /mnt/gluster/afr
> > end-volume
> >
> > volume server
> > type protocol/server
> > option transport-type ib-verbs/server # For ib-verbs transport
> > option ib-verbs-work-request-send-size 131072
> > option ib-verbs-work-request-send-count 64
> > option ib-verbs-work-request-recv-size 131072
> > option ib-verbs-work-request-recv-count 64
> > ## auth ##
> > option auth.ip.afrmirror.allow *
> > option auth.ip.afrns.allow *
> > option auth.ip.afr.allow *
> > option auth.ip.main.allow *
> > option auth.ip.main-ns.allow *
> > end-volume
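> >
> > # For the tcp comparison mentioned above, a sketch of the same export over
> > # tcp/server (the volume name here is a placeholder and omitted defaults
> > # are assumptions, not the exact spec that was tested):
> > volume server-tcp
> > type protocol/server
> > option transport-type tcp/server
> > option auth.ip.afrmirror.allow *
> > option auth.ip.afrns.allow *
> > option auth.ip.afr.allow *
> > end-volume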
> >
> > ##################### Client spec is:
> > volume afrvol1
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST201
> > option remote-subvolume afr
> > end-volume
> >
> > volume afrmirror1
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST201
> > option remote-subvolume afrmirror
> > end-volume
> >
> > volume afrvol2
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST202
> > option remote-subvolume afr
> > end-volume
> >
> > volume afrmirror2
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST202
> > option remote-subvolume afrmirror
> > end-volume
> >
> > volume afr1
> > type cluster/afr
> > subvolumes afrvol1 afrmirror2
> > end-volume
> >
> > volume afr2
> > type cluster/afr
> > subvolumes afrvol2 afrmirror1
> > end-volume
> >
> >
> > volume afrns1
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST201
> > option remote-subvolume afrns
> > end-volume
> > volume afrns2
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST202
> > option remote-subvolume afrns
> > end-volume
> >
> > volume afrns
> > type cluster/afr
> > subvolumes afrns1 afrns2
> > end-volume
> >
> > volume bricks
> > type cluster/unify
> > option namespace afrns
> > subvolumes afr1 afr2
> > option scheduler alu # use the ALU scheduler
> > option alu.order open-files-usage:disk-usage:read-usage:write-usage
> > end-volume
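> >
> > # One client-side knob related to the hang is the client protocol's
> > # transport timeout, which bounds how long a pending call blocks when a
> > # brick stops responding. Assuming the option exists in this tla build
> > # (name taken from the 1.3 protocol/client docs; please verify), a client
> > # volume would carry it like this:
> > volume example-client
> > type protocol/client
> > option transport-type ib-verbs/client
> > option remote-host RTPST201 # example host
> > option remote-subvolume afr
> > option transport-timeout 30 # assumed option; seconds before a pending call fails
> > end-volume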
> >
> >
> > Krishna Srinivas wrote:
> >> If AFR is on the server side and that server goes down, then all the
> >> FDs associated with files on that server will return an ENOTCONN
> >> error. (If that is how your setup is?) But if AFR were on the client
> >> side it would work seamlessly. This situation will be handled when we
> >> bring out the HA translator.
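> >>
> >> A minimal client-side AFR spec, just to illustrate the layout (hosts
> >> server1/server2 and the brick name are placeholders), would look roughly
> >> like:
> >>
> >> volume remote1
> >> type protocol/client
> >> option transport-type tcp/client
> >> option remote-host server1
> >> option remote-subvolume brick
> >> end-volume
> >>
> >> volume remote2
> >> type protocol/client
> >> option transport-type tcp/client
> >> option remote-host server2
> >> option remote-subvolume brick
> >> end-volume
> >>
> >> volume mirror
> >> type cluster/afr
> >> subvolumes remote1 remote2
> >> end-volume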
> >>
> >> Krishna
> >>
> >> On Nov 30, 2007 3:01 AM, Mickey Mazarick <mic at digitaltadpole.com> wrote:
> >>
> >>> Is this true for files that are currently open? For example, I have a
> >>> virtual machine running that keeps a file open at all times. Errors
> >>> are bubbling back up to the application layer instead of the calls
> >>> just waiting, and afterwards I have to unmount/remount the gluster
> >>> volume. Is there a way of preventing this?
> >>>
> >>> (This is the latest tla btw)
> >>> Thanks!
> >>>
> >>>
> >>> Anand Avati wrote:
> >>>
> >>>> This is possible already; it is just that the files from the node that
> >>>> is down will not be accessible while the server is down. When the
> >>>> server is brought back up, the files are made accessible again.
> >>>>
> >>>> avati
> >>>>
> >>>> 2007/11/30, Mickey Mazarick <mic at digitaltadpole.com>:
> >>>>
> >>>> Is there currently a way to force a client connection to keep
> >>>> retrying distributed I/O until a failed resource comes back online?
> >>>> If a disk in a unified volume drops, I have to remount on all the
> >>>> clients. Is there a way around this?
> >>>>
> >>>> I'm using afr/unify on 6 storage bricks and I want to be able to
> >>>> change a server config setting and restart the server bricks one at a
> >>>> time without losing the mount point on the clients. Is this currently
> >>>> possible without doing IP failover?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> It always takes longer than you expect, even when you take into
> >>>> account Hofstadter's Law.
> >>>>
> >>>> -- Hofstadter's Law
> >>>>
> >>>
> >>>
> >
> >
>
>
>