[Gluster-users] active/active failover

Mon Dec 11 20:52:59 UTC 2017

Dear all, 

I'm rather new to glusterfs but have some experience running larger Lustre and BeeGFS installations. These filesystems provide active/active failover. I have now discovered that I can also do this in glusterfs, although I didn't find detailed documentation about it. (I'm using glusterfs 3.10.8.)

So my question is: can I really use glusterfs to do failover in the way described below, or am I misusing glusterfs? (and potentially corrupting my data?)

My setup is: I have two servers (qlogin and gluster2) that access shared SAN storage. Both servers connect to the same SAN (SAS multipath) and I implement locking via lvm2 and sanlock, so I can mount the same storage on either server.
The idea is that normally each server serves one brick, but if one server fails, the other server can serve both bricks. (I'm not interested in automatic failover; I'll always do this manually. I could also use this to do maintenance on one server with only minimal downtime.)

#normal setup:
[root at qlogin ~]# gluster volume info g2 
#...
# Volume Name: g2
# Type: Distribute
# Brick1: qlogin:/glust/castor/brick
# Brick2: gluster2:/glust/pollux/brick

#  failover: let's artificially fail one server by killing one glusterfsd:
[root at qlogin] systemctl status glusterd 
[root at qlogin] kill -9 <pid/of/glusterfsd/running/brick/castor>

# unmount brick
[root at qlogin] umount /glust/castor/ 

# deactivate LV
[root at qlogin] lvchange  -a n vgosb06vd05/castor 

###  now do the failover:

# activate the same storage on the other server:
[root at gluster2] lvchange  -a y vgosb06vd05/castor 

# mount on other server
[root at gluster2] mount /dev/mapper/vgosb06vd05-castor  /glust/castor 

# now move the "failed" brick to the other server
[root at gluster2] gluster volume replace-brick g2 qlogin:/glust/castor/brick gluster2:/glust/castor/brick commit force
### The last line is the one I have doubts about
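
For reference, the takeover steps on gluster2 could be collected into a small script, roughly as sketched below. This is only an illustration of the manual procedure above, not a recommended or verified method: the helper names `run` and `failover_takeover` and the `DRY_RUN` variable are made up for this sketch, and the device path is derived by assuming that neither the VG nor the LV name contains a hyphen. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the manual takeover steps from this mail for one brick.
# Commands are executed only if DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when dry-run is disabled.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

failover_takeover() {
    # Arguments: volume, failed host, surviving host, VG/LV, mount point.
    vol="$1"; from="$2"; to="$3"; lv="$4"; mnt="$5"
    # Assumption: no hyphens in the VG or LV name, so that
    # vg/lv maps to /dev/mapper/vg-lv.
    dev="/dev/mapper/$(echo "$lv" | tr '/' '-')"
    run lvchange -a y "$lv"
    run mount "$dev" "$mnt"
    run gluster volume replace-brick "$vol" \
        "$from:$mnt/brick" "$to:$mnt/brick" commit force
}

# Values taken from the example above; to be run on the surviving server.
failover_takeover g2 qlogin gluster2 vgosb06vd05/castor /glust/castor
```

The script assumes the brick has already been stopped, unmounted and its LV deactivated on the failed server, as in the first half of the procedure.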

#now I'm in failover state:
#Both bricks on one server:
[root at qlogin ~]# gluster volume info g2 
#...
# Volume Name: g2
# Type: Distribute
# Brick1: gluster2:/glust/castor/brick
# Brick2: gluster2:/glust/pollux/brick
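
A quick way to confirm which servers currently host bricks is to filter the `Brick<N>:` lines of the volume info. The function name `brick_hosts` below is made up for this sketch, and it assumes the brick lines have the `BrickN: host:/path` form shown in the listings above.

```shell
#!/bin/sh
# Sketch: read `gluster volume info` text on stdin and print the distinct
# hosts that serve bricks, one per line.
brick_hosts() {
    # Matches lines such as "Brick1: gluster2:/glust/castor/brick".
    # A leading "# " (as in the listings in this mail) is tolerated.
    sed -n 's/^#\{0,1\} *Brick[0-9][0-9]*: *\([^:]*\):.*/\1/p' | sort -u
}

# Example use (needs a running gluster installation):
#   gluster volume info g2 | brick_hosts
```

In the failover state shown above this would print only `gluster2`; in the normal setup it would print both `gluster2` and `qlogin`.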

Is it intended to work this way?

Thanks a lot!

best wishes,
Stefan