[Gluster-users] active/active failover

Alex Chekholko alex at calicolabs.com
Mon Dec 11 22:07:41 UTC 2017


Hi Stefan,

I think what you propose will work, though you should test it thoroughly.

More generally, I think "the GlusterFS way" would be to use 2-way
replication instead of a distributed volume; then you can lose one of your
servers without an outage, and the volume re-synchronizes (self-heals) when
the failed server comes back up.
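
For example, a 2-way replicated volume across your two servers would look
roughly like this (just a sketch; the volume name and brick directories are
made up, and you'd want fresh brick paths rather than the ones already used
by g2):

  gluster volume create gvol-repl replica 2 \
      qlogin:/glust/castor/brick-repl gluster2:/glust/pollux/brick-repl
  gluster volume start gvol-repl

With plain replica 2 you should also think about quorum (e.g. "replica 3
arbiter 1" with a small third node) to reduce the risk of split-brain.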

Chances are that if you weren't using the SAN volumes, you could have
bought two servers, each with enough disk to hold a full copy of the data,
for less money...

Regards,
Alex


On Mon, Dec 11, 2017 at 12:52 PM, Stefan Solbrig <stefan.solbrig at ur.de>
wrote:

> Dear all,
>
> I'm rather new to glusterfs but have some experience running larger Lustre
> and BeeGFS installations. These filesystems provide active/active
> failover.  Now I discovered that I can also do this in glusterfs, although
> I didn't find detailed documentation about it. (I'm using glusterfs 3.10.8.)
>
> So my question is: can I really use glusterfs to do failover in the way
> described below, or am I misusing glusterfs? (and potentially corrupting my
> data?)
>
> My setup is: I have two servers (qlogin and gluster2) that access shared
> SAN storage. Both servers connect to the same SAN (SAS multipath), and I
> implement locking via lvm2 and sanlock, so I can mount the same storage on
> either server.
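>
> (Roughly, the shared-VG side looks like this -- just a sketch; the
> multipath device name is a placeholder, and service/unit names and lvm.conf
> details may differ per distro:
>
> # on both servers: set use_lvmlockd = 1 in /etc/lvm/lvm.conf and a unique
> # host_id in /etc/lvm/lvmlocal.conf, then start the lock managers:
> [root at qlogin] systemctl start sanlock lvmlockd
> # create the VG once, as a shared (sanlock-managed) VG on the SAN device:
> [root at qlogin] vgcreate --shared vgosb06vd05 /dev/mapper/<multipath-device>
> # join the VG lockspace on each server:
> [root at qlogin] vgchange --lock-start vgosb06vd05
> [root at gluster2] vgchange --lock-start vgosb06vd05
> # the lvchange -a y / -a n calls below then take and release the LV lock.)
>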
> The idea is that normally each server serves one brick, but in case one
> server fails, the other server can serve both bricks. (I'm not interested
> in automatic failover, I'll always do this manually.  I could also use this
> to do maintenance on one server, with only minimal downtime.)
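>
> A quick way to double-check which node is serving which brick (before and
> after the manual switch):
>
> [root at qlogin ~]# gluster volume status g2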
>
>
> #normal setup:
> [root at qlogin ~]# gluster volume info g2
> #...
> # Volume Name: g2
> # Type: Distribute
> # Brick1: qlogin:/glust/castor/brick
> # Brick2: gluster2:/glust/pollux/brick
>
> #  failover: let's artificially fail one server by killing one glusterfsd:
> [root at qlogin] systemctl status glusterd
> [root at qlogin] kill -9 <pid/of/glusterfsd/running/brick/castor>
>
> # unmount brick
> [root at qlogin] umount /glust/castor/
>
> # deactivate the LV
> [root at qlogin] lvchange  -a n vgosb06vd05/castor
>
>
> ###  now do the failover:
>
> # activate the same storage on the other server:
> [root at gluster2] lvchange  -a y vgosb06vd05/castor
>
> # mount on other server
> [root at gluster2] mount /dev/mapper/vgosb06vd05-castor  /glust/castor
>
> # now move the "failed" brick to the other server
> [root at gluster2] gluster volume replace-brick g2 qlogin:/glust/castor/brick gluster2:/glust/castor/brick commit force
> ### The last line is the one I have doubts about
>
> #now I'm in failover state:
> #Both bricks on one server:
> [root at qlogin ~]# gluster volume info g2
> #...
> # Volume Name: g2
> # Type: Distribute
> # Brick1: gluster2:/glust/castor/brick
> # Brick2: gluster2:/glust/pollux/brick
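>
> (For completeness, failback would be the same procedure in the other
> direction once qlogin is healthy again -- a sketch, mirroring the commands
> above:)
>
> [root at gluster2] kill -9 <pid/of/glusterfsd/running/brick/castor>
> [root at gluster2] umount /glust/castor
> [root at gluster2] lvchange -a n vgosb06vd05/castor
> [root at qlogin] lvchange -a y vgosb06vd05/castor
> [root at qlogin] mount /dev/mapper/vgosb06vd05-castor /glust/castor
> [root at qlogin] gluster volume replace-brick g2 gluster2:/glust/castor/brick qlogin:/glust/castor/brick commit force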
>
>
> Is it intended to work this way?
>
> Thanks a lot!
>
> best wishes,
> Stefan
>