[Gluster-users] Too good to be true speed improvements?

Tue Jan 15 01:18:11 UTC 2019

Dear all,

I was running gluster 3.10.12 on a pair of servers and recently upgraded to
4.1.6. There is a cron job that runs nightly on one machine and rsyncs the
data on the servers over to another machine for backup purposes. The rsync
operation runs on one of the gluster servers, which mounts the gluster
volume via FUSE on /export.

When using 3.10.12, this process would start at 8:00 PM nightly and usually
end at around 4:30 AM when the servers had been freshly rebooted. From that
point on, runs would take a bit longer and then stabilize, ending at around
7-9 AM depending on actual file changes. At some point the servers would start
eating up so much RAM (up to 30 GB) that the file system became extremely
slow, and I would have to reboot them to bring things back to normal (perhaps
the memory leak I have read was present in 3.10.x).

After upgrading to 4.1.6 over the weekend, I was shocked to see the rsync
process finish in about 1 hour and 26 minutes, compared to 8 hours and 30
minutes with the older version. This is a nice speedup; however, I can't help
asking what has changed so drastically that this process is now so fast. Have
there really been improvements in 4.1.6 that could speed this up so
dramatically? In both of my test cases there would not really have been a lot
to copy via rsync, given that the fresh reboots are done on Saturday after the
sync from the day before has finished.
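For reference, here is the arithmetic behind that comparison as a small Python sketch. It uses only the two run times quoted above (8 h 30 min on 3.10.12 with freshly rebooted servers, 1 h 26 min on 4.1.6); the helper name `to_minutes` is just for illustration.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an h:mm duration to whole minutes."""
    return hours * 60 + minutes

old = to_minutes(8, 30)  # 3.10.12, freshly rebooted servers (8:00 PM to 4:30 AM)
new = to_minutes(1, 26)  # 4.1.6, first run after the upgrade
speedup = old / new
saved = old - new

print(f"speedup: {speedup:.1f}x, time saved: {saved // 60}h{saved % 60:02d}m")
# prints "speedup: 5.9x, time saved: 7h04m"
```

So the new run is roughly 5.9 times faster on a night with little to copy; a night with a full day's changes is a separate data point.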

In general, the servers (which are accessed via Samba by Windows clients)
are much faster and more responsive since the update to 4.1.6. Tonight will
be the first rsync run that actually has to copy the day's changes, which
will give me another point of comparison.

I am still using FUSE mounts for Samba because of earlier problems with
vfs_glusterfs, which are still present in Samba 4.8.3-4. They are already
documented in bug reports and patches exist, but no official updated Samba
packages have been released yet. Since I was going from 3.10.12 to 4.1.6, I
also did not want to change anything else, so that I could attribute any
issues to the gluster version change alone and avoid extra complexity.

The file system currently holds about 16 TB of data in 5,142,816 files and
696,544 directories.

I've just run the following code to count files and directories, and it took
67 min 38.957 s to complete on this gluster volume:
https://github.com/ChristopherSchultz/fast-file-count

# time ( /root/sbin/dircnt /export )
/export contains 5142816 files and 696544 directories

real    67m38.957s
user    0m6.225s
sys     0m48.939s
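Turning the `time` output above into a rate is plain arithmetic on the numbers shown (nothing measured anew). User plus sys comes to under a minute of CPU, so most of the wall-clock time was presumably spent waiting on the FUSE mount and the network rather than computing.

```python
files = 5_142_816
dirs = 696_544
entries = files + dirs

real_s = 67 * 60 + 38.957  # "real" from time: wall-clock seconds
cpu_s = 6.225 + 48.939     # "user" + "sys": CPU seconds

print(f"{entries} entries in {real_s:.0f} s -> {entries / real_s:.0f} entries/s")
print(f"CPU share of wall time: {cpu_s / real_s:.1%}")
# prints:
# 5839360 entries in 4059 s -> 1439 entries/s
# CPU share of wall time: 1.4%
```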

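For anyone who would rather not compile dircnt, a rough Python stand-in is below. It is a sketch, not dircnt's actual code: it walks the tree iteratively with `os.scandir`, does not follow symlinks, counts every non-directory entry as a file, excludes the starting directory itself, and silently skips directories it cannot read, so its totals may differ slightly from dircnt's. The name `count_tree` is made up for this example.

```python
import os
import sys
from typing import Tuple

def count_tree(root: str) -> Tuple[int, int]:
    """Count non-directory entries and directories below root (root excluded)."""
    files = 0
    dirs = 0
    stack = [root]  # explicit stack instead of recursion: no depth limit
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    # follow_symlinks=False: a symlink to a directory counts as a file
                    if entry.is_dir(follow_symlinks=False):
                        dirs += 1
                        stack.append(entry.path)
                    else:
                        files += 1
        except OSError:
            # Unreadable or vanished directory: skip it rather than abort the walk
            continue
    return files, dirs

if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    f, d = count_tree(top)
    print(f"{top} contains {f} files and {d} directories")
```

Saved as a script (for example a hypothetical `count_tree.py`), it would be run as `python3 count_tree.py /export`.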
The gluster options set on the volume are:
https://termbin.com/yxtd

# gluster v status export
Status of volume: export
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.7:/bricks/hdds/brick           49157     0          Y       13986
Brick 10.0.1.6:/bricks/hdds/brick           49153     0          Y       9953
Self-heal Daemon on localhost               N/A       N/A        Y       21934
Self-heal Daemon on 10.0.1.5                N/A       N/A        Y       4598
Self-heal Daemon on 10.0.1.6                N/A       N/A        Y       14485

Task Status of Volume export
------------------------------------------------------------------------------
There are no active volume tasks
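If you want to check this output from a script (for example, before the nightly rsync starts), here is a minimal parsing sketch in Python. It assumes one line per process with the columns shown above (name, TCP port, RDMA port, Online, Pid), as the CLI prints it in a wide enough terminal, and it only recognizes `Brick` and `Self-heal Daemon` rows; the names `parse_status`, `offline` and `SAMPLE` are hypothetical.

```python
import re
from typing import List, Tuple

# One row per process: name, TCP port, RDMA port, Online (Y/N), Pid.
# Only Brick and Self-heal Daemon rows are recognized; headers are skipped.
ROW = re.compile(
    r"^(Brick \S+|Self-heal Daemon on \S+)\s+(\S+)\s+(\S+)\s+([YN])\s+(\S+)$"
)

def parse_status(text: str) -> List[Tuple[str, bool]]:
    """Return (process name, online?) for each process row in the output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((m.group(1), m.group(4) == "Y"))
    return rows

def offline(text: str) -> List[str]:
    """Names of processes whose Online column is N."""
    return [name for name, up in parse_status(text) if not up]

SAMPLE = """\
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.7:/bricks/hdds/brick           49157     0          Y       13986
Brick 10.0.1.6:/bricks/hdds/brick           49153     0          Y       9953
Self-heal Daemon on localhost               N/A       N/A        Y       21934
Self-heal Daemon on 10.0.1.5                N/A       N/A        Y       4598
Self-heal Daemon on 10.0.1.6                N/A       N/A        Y       14485
"""

if __name__ == "__main__":
    print(len(parse_status(SAMPLE)), "processes, offline:", offline(SAMPLE))
    # prints "5 processes, offline: []"
```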

In truth, there is a third server here, but it has no bricks on it.

Thoughts?

Diego
