[Gluster-devel] HA translator questions

Krishna Srinivas krishna at zresearch.com
Thu Jan 1 18:24:35 UTC 2009


On Thu, Jan 1, 2009 at 11:44 PM, Martin Fick <mogulguy at yahoo.com> wrote:
> --- On Thu, 1/1/09, Krishna Srinivas <krishna at zresearch.com> wrote:
>>
>> <mogulguy at yahoo.com> wrote:
>> > I am a bit curious about the new HA translator and how
>> > it is supposed to work?  I have looked through the code a
>> > bit and this is my naive interpretation of the way it is
>> > designed:
>> >
>> > It appears that the HA translator keeps track of its
>> > subvolumes and whether they are active or not.  When
>> > attempting to dispatch a request, it picks a currently
>> > active subvolume or fails if none are currently active.  It
>> > appears that once it has chosen an active subvolume for a
>> > request, it can no longer fail over to another subvolume for
>> > that particular request, is this true?  If so, then it is
>> > possible (perhaps likely) that during failovers some
>> > requests will fail before failover happens even if certain
>> > subvolumes never go down?  :(   Is this correct or am I
>> > missing something?
>>
>> No. Requests are retried on the next subvolume if the
>> current one goes
>> down during the operation, so it should work fine.
>
> Hmm, I don't see this looping on failure in the code, but my understanding of the translator design is fairly minimal.  I will have to look harder.  I was hoping to be able to modify the subvolume looping to be able to loop back upon itself indefinitely if all the subvolumes failed.  If this could be done, it seems like this would be an easy way to achieve NFS style blocking when the server is down (see my other thread on this), by simply using the HA translator with only one subvolume.

Just curious, why do you want the application to hang till the server
comes back up? An indefinite hang is not desirable to most users. In
the case of NFS, if the NFS server is down, won't the client error out
saying that the server is down?

>
> Also, how about failure due to replies that do not return because the link is down?  Are the requests saved after they are sent until the reply arrives so that it can be resent on the other link if the original link successfully sends the request, but goes down afterwards and cannot receive the reply?

Yes, requests are saved so that they can be retried on another subvol
if the current subvol goes down during the operation.
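
As a rough illustration of the idea (not the actual HA translator code,
which is written in C), the retry behaviour can be sketched in Python.
All names here (Subvolume, SubvolumeDown, dispatch, fail_after_send) are
hypothetical and exist only for this sketch; it assumes a failed link
shows up as an exception on the call.

```python
class SubvolumeDown(Exception):
    """Raised by a (hypothetical) subvolume when its link is down."""


class Subvolume:
    def __init__(self, name, active=True, fail_after_send=False):
        self.name = name
        self.active = active
        # If True, the request is accepted but the link drops before
        # the reply can be received.
        self.fail_after_send = fail_after_send

    def call(self, request):
        if not self.active:
            raise SubvolumeDown(self.name)
        if self.fail_after_send:
            self.active = False
            raise SubvolumeDown(self.name)
        return f"{self.name}:{request}"


def dispatch(subvolumes, request):
    """Try each active subvolume in order, reusing the saved request."""
    saved = request  # kept until a reply arrives, so it can be resent
    for subvol in subvolumes:
        if not subvol.active:
            continue
        try:
            return subvol.call(saved)
        except SubvolumeDown:
            continue  # fail over to the next subvolume
    raise SubvolumeDown("all subvolumes are down")
```

With two subvolumes where the first one drops its link after accepting
the request, dispatch() resends the saved request and returns the reply
from the second. It raises only when no subvolume is left to try.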

>
> Thanks and Happy New Year,

Thanks! Happy New Year to you too and the rest of the user community -
from the dev team!

Krishna

>
> -Martin
>