[Gluster-users] Disastrous performance with rsync to mounted Gluster volume.
Ernie Dunbar
maillist at lightspeed.ca
Thu Apr 23 17:54:35 UTC 2015
Hello everyone.
I've built a replicated Gluster cluster (volume info below) of two
Dell servers on a 1 Gbps switch, with a second NIC on each server
dedicated to replication traffic. But when I try to copy our mail store
from our backup server onto the Gluster volume, I've had nothing but
trouble.
I may have messed this up right from the start: I used rsync to copy
all the files directly to the Linux filesystem on the primary Gluster
server's brick, instead of copying them to an NFS or Gluster mount.
Getting Gluster to synchronize those files to the second Gluster server
hasn't worked out well at all; only about half the data has actually
been copied over. Attempts to force Gluster to synchronize the rest
have all failed (Gluster appears to think the data is already
synchronized). This might still turn out to be the best way of
accomplishing the migration, but in the meantime I've tried a
different tack.
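For reference, the recovery commands I've been attempting look roughly like this (a sketch; exact syntax may vary by GlusterFS version):

```shell
# Trigger a full self-heal so Gluster re-examines every file on the
# volume, not just the ones it already knows are out of sync:
gluster volume heal gv2 full

# Check how many entries each brick still reports as needing healing:
gluster volume heal gv2 info
```

Even after a full heal, the second brick still appears to be missing about half the files.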
Now I'm trying to mount the Gluster volume over the network from the
backup server using NFS. (The backup server can't run a compatible
version of GlusterFS; I plan to nuke it and install an OS that does
support it, but first we have to get this mail store copied over!)
Then I use rsync to copy only the missing files to the NFS share and
let Gluster handle its own replication. This has been many, many times
slower than just using rsync to copy the files directly, even
considering the amount of data (439 GB). CPU usage on the Gluster
servers is fairly high, with a load average of about 4 on an 8-CPU
system. Network usage is... well, not that high, topping out at maybe
50-70 Mbps. That's true whether I look at the primary, server-facing
network or the secondary, Gluster-only network, so I don't think the
bottleneck is there. Hard drive utilization peaks at around 40% but
doesn't stay that high.
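The mount and copy commands I'm running look roughly like this (hostnames and paths are illustrative). I've also read that --inplace and --whole-file can help on Gluster by avoiding rsync's write-to-temp-then-rename pattern, though I can't vouch for how much they help:

```shell
# Gluster's built-in NFS server speaks NFSv3 only, so force vers=3:
mount -t nfs -o vers=3,tcp nfs1:/gv2 /mnt/gv2

# Copy only files missing on the destination; --inplace and
# --whole-file avoid rsync's temp-file-and-rename behaviour, which
# is reportedly expensive on Gluster volumes:
rsync -av --ignore-existing --inplace --whole-file \
    /backup/mailstore/ /mnt/gv2/mailstore/
```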
One possible clue may lie in Gluster's logs. I see millions of log
entries like this:
[2015-04-23 16:40:50.122007] I [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done] 0-gv2-replicate-0: <gfid:912eec51-89dc-40ea-9dfd-072404d306a2>/1355401127.H542717P24276.pop.lightspeed.ca:2,: Skipping entry self-heal because of gfid absence
[2015-04-23 16:40:50.123327] I [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done] 0-gv2-replicate-0: <gfid:912eec51-89dc-40ea-9dfd-072404d306a2>/1355413874.H20794P22730.pop.lightspeed.ca:2,: Skipping entry self-heal because of gfid absence
[2015-04-23 16:40:50.123705] I [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done] 0-gv2-replicate-0: <gfid:912eec51-89dc-40ea-9dfd-072404d306a2>/1355420013.H176322P3859.pop.lightspeed.ca:2,: Skipping entry self-heal because of gfid absence
[2015-04-23 16:40:50.124030] I [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done] 0-gv2-replicate-0: <gfid:912eec51-89dc-40ea-9dfd-072404d306a2>/1355429494.H263072P14676.pop.lightspeed.ca:2,: Skipping entry self-heal because of gfid absence
[2015-04-23 16:40:50.124423] I [afr-self-heal-entry.c:1909:afr_sh_entry_common_lookup_done] 0-gv2-replicate-0: <gfid:912eec51-89dc-40ea-9dfd-072404d306a2>/1355436426.H973617P29804.pop.lightspeed.ca:2,: Skipping entry self-heal because of gfid absence
These logs grow so fast that I have to truncate them every hour, or
the /var partition fills up within a couple of days.
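As a stopgap, I'm considering raising the log threshold so these INFO-level self-heal messages stop flooding /var (a sketch; option names taken from the diagnostics settings in the docs, so please correct me if they're wrong for my version):

```shell
# Suppress INFO-level messages like the ones above;
# WARNING and above are still logged:
gluster volume set gv2 diagnostics.client-log-level WARNING
gluster volume set gv2 diagnostics.brick-log-level WARNING
```

That would hide the symptom, though, not explain why self-heal keeps skipping these entries.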
And finally, I have the gluster volume info:
root@nfs1:/brick1/gv2/www3# gluster vol info gv2
Volume Name: gv2
Type: Replicate
Volume ID: fb06a044-7871-4362-b134-fb97433f89f7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs1:/brick1/gv2
Brick2: nfs2:/brick1/gv2
Options Reconfigured:
nfs.disable: off
Any help removing myself from this mess would be greatly appreciated. :)