[Gluster-users] Replica Out Of Sync - How to efficiently recover

Bongs onlybongs at gmail.com
Sat Apr 29 18:45:56 UTC 2017


Good morning guys,

We're running GlusterFS 3.6.7 on RHEL 6.7 on AWS, using multiple 1 TB EBS gp2
disks as bricks.
We have two nodes with several volumes of type Replicate, each with two bricks:
one brick belongs to server #1 and the other to server #2.
Transport is over TCP and the only reconfigured option is
performance.cache-size, which is tuned to 4 GB.
Clients mount the volumes over FUSE with the backupvolfile-server parameter
pointing at server #2 and the primary volfile server set to server #1.
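For reference, the setup looks roughly like this (volume, server and brick
names below are placeholders, not our real ones):

  # two-node replicated volume
  gluster volume create myvol replica 2 transport tcp \
      server1:/bricks/ebs1/myvol server2:/bricks/ebs1/myvol
  gluster volume set myvol performance.cache-size 4GB
  gluster volume start myvol

  # client side, /etc/fstab
  server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,backupvolfile-server=server2  0 0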

It is worth mentioning that these bricks host hundreds of thousands of
subdirectories containing a lot of small XML files and images.

A couple of weeks ago one of the nodes went down because of an AWS problem.
The reboot was so quick that our agents did not even record it, and because the
daemon was not enabled to start on boot, brick two went out of sync by roughly
1+ TB.
When we realized this we immediately brought everything back up and triggered
the self-heal, but it was literally killing our clients: iowait went through
the roof and retrieving content from the FUSE share took forever.
Our only option was to kill the sync process.
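In case we did something the wrong way, the steps we ran at that point were
roughly the following (volume name is a placeholder):

  # make sure glusterd comes back after a reboot (RHEL 6 / SysV init)
  chkconfig glusterd on
  service glusterd start

  # trigger a full heal and watch the backlog
  gluster volume heal myvol full
  gluster volume heal myvol info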

We tried using rsync and then triggering the self-heal, with no consistent
result.
We tried removing the bad brick, cleaning up the directory on the second node
and re-creating it (sketched below), which caused the same massive iowait and
the exact same situation.
We tried cloning the EBS volume of the primary node, attaching it to the
secondary and then running the self-heal again, with no consistent result.
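To be clear about the second attempt, the brick re-creation was along these
lines (a sketch with placeholder names and paths, not an exact transcript):

  # on the node with the bad copy: drop the brick, wipe it, add it back, heal
  gluster volume remove-brick myvol replica 1 server2:/bricks/ebs1/myvol force
  rm -rf /bricks/ebs1/myvol
  mkdir -p /bricks/ebs1/myvol
  gluster volume add-brick myvol replica 2 server2:/bricks/ebs1/myvol
  gluster volume heal myvol full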

We also noticed that once brick two comes back online it seems to be used as
the primary, even though it is configured in fstab as the backupvolfile-server.
I say this because some directories appear to be missing from listings even
though it is still possible to cd into them, which matches the state of the
brick on the secondary server.
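Is there a reliable way to confirm which copy GlusterFS considers up to date?
We assume the AFR changelog xattrs on the bricks can be inspected for a suspect
directory, something like this (path is a placeholder):

  # run on each server against the same directory on its brick
  getfattr -d -m . -e hex /bricks/ebs1/myvol/some/dir
  # non-zero trusted.afr.myvol-client-* values indicate pending heals against the peer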

Is there anything that you can suggest to solve this?
Are we missing something?

Thanks a lot for any help.

Lorenzo