[Gluster-users] Geo-Replication memory leak on slave node

Kotresh Hiremath Ravishankar khiremat at redhat.com
Thu Jun 7 03:42:20 UTC 2018


Hi Mark,

A few questions:

1. Is this traceback hit consistently? I just wanted to confirm whether it
is transient, i.e. occurring once in a while and then getting back to
normal.
2. Please upload the complete geo-rep logs from both master and slave.
3. Are the gluster versions the same across master and slave? (A quick way
to check is sketched below.)
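
For (3), something along these lines can be used as a quick check. This is
only a minimal sketch with a placeholder slave hostname; it simply compares
the first line of "gluster --version" output on both ends:

# Minimal sketch (placeholder hostname): compare the glusterfs version
# reported by `gluster --version` on the local master node and on the
# remote slave node over ssh.
import subprocess

def gluster_version(host=None):
    cmd = ["gluster", "--version"]
    if host:
        cmd = ["ssh", host] + cmd
    # First line of the output looks like "glusterfs 3.12.9"
    return subprocess.check_output(cmd).decode().splitlines()[0]

master = gluster_version()
slave = gluster_version("storage-server.local")  # placeholder slave host
print("master: " + master)
print("slave:  " + slave)
print("match:  " + str(master == slave))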

Thanks,
Kotresh HR

On Wed, Jun 6, 2018 at 7:10 PM, Mark Betham <
mark.betham at performancehorizon.com> wrote:

> Dear Gluster-Users,
>
> I have geo-replication set up and configured between two Gluster pools
> located at different sites.  What I am seeing is the following error
> being reported in the geo-replication slave log:
>
> [2018-06-05 12:05:26.767615] E [syncdutils(slave):331:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 361, in twrap
>     tf(*aa)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1009, in <lambda>
>     t = syncdutils.Thread(target=lambda: (repce.service_loop(),
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 90, in service_loop
>     self.q.put(recv(self.inf))
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 61, in recv
>     return pickle.load(inf)
> ImportError: No module named h_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh
> [2018-06-05 12:05:26.768085] E [repce(slave):117:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
> TypeError: getattr(): attribute name must be string
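
Regarding the ImportError above: recv() in repce.py is just pickle.load()
on the RPC pipe (as the traceback shows), and pickle raises exactly this
kind of error when stray, non-pickle bytes that happen to begin with the
GLOBAL opcode ('c') reach it, because the unpickler then tries to import
whatever text follows. Whether that is really what is happening here should
become clearer from the full logs; a minimal sketch of the mechanism only
(not Gluster code) looks like this:

# Minimal sketch of the failure mode only, not Gluster code.  If stray
# bytes starting with the protocol-0 GLOBAL opcode ('c') reach
# pickle.load(), the unpickler reads the next two lines as
# "<module>\n<name>\n" and tries to import the module, so an arbitrary
# file name can surface as "ImportError: No module named ...".
import io
import pickle

stray = b"ch_2013-04-26-04:02:49-2013-04-26_11:02:53.gz.15WBuUh\nfoo\n."

try:
    pickle.load(io.BytesIO(stray))
except ImportError as exc:
    # Python 2 reports the full name (as in the log above); Python 3
    # reports only the first dotted component, but it is the same
    # ImportError path through the unpickler's import attempt.
    print(exc)
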
>
> From this point onwards the slave server begins to consume all of its
> available RAM until it becomes unresponsive.  Eventually the gluster
> service seems to kill off the offending process and the memory is
> returned to the system.  Once the memory has been returned to the remote
> slave system, the geo-replication often recovers and data transfer
> resumes.
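
In case it helps to line the RAM growth up with the log timestamps, a small
poller like the sketch below could be left running on the slave. It assumes
Linux /proc and that the leaking process has "gsyncd" on its command line;
adjust as needed:

# Sketch only: log the resident set size of slave-side gsyncd workers once
# a minute so the growth can be correlated with the traceback timestamps.
import glob
import time

def gsyncd_rss_kb():
    sizes = {}
    for status_path in glob.glob("/proc/[0-9]*/status"):
        pid = status_path.split("/")[2]
        try:
            with open("/proc/%s/cmdline" % pid) as f:
                if "gsyncd" not in f.read():
                    continue
            with open(status_path) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        sizes[pid] = int(line.split()[1])  # value is in kB
        except (IOError, OSError):
            pass  # the process exited between listing and reading
    return sizes

while True:
    print("%s %s" % (time.strftime("%Y-%m-%d %H:%M:%S"), gsyncd_rss_kb()))
    time.sleep(60)
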
>
> I have attached the full geo-replication slave log containing the error
> shown above.  I have also attached an image file showing the memory usage
> of the affected storage server.
>
> We are currently running Gluster version 3.12.9 on top of CentOS 7.5
> x86_64.  The system has been fully patched and is running the latest
> software, excluding glibc, which had to be downgraded to get
> geo-replication working.
>
> The Gluster volume runs on a dedicated partition using the XFS
> filesystem, which in turn runs on an LVM thin volume.  The physical
> storage is presented as a single drive because the underlying disks are
> part of a RAID 10 array.
>
> The master volume being replicated holds a total of 2.2 TB of data.  The
> total size of the volume fluctuates very little, as the data being
> removed equals the new data coming in.  This data is made up of many
> thousands of files spread across many separate directories.  File sizes
> vary from the very small (<1 KB) to the large (>1 GB).  The Gluster
> service itself runs a single volume in a replicated configuration across
> 3 bricks at each of the sites.  The delta changes being replicated
> average around 100 GB per day, covering file creation, deletion and
> modification.
>
> The config for the geo-replication session is as follows, taken from the
> current source server:
>
> special_sync_mode: partial
> gluster_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.gluster.log
> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> change_detector: changelog
> session_owner: 40e9e77a-034c-44a2-896e-59eec47e8a84
> state_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.status
> gluster_params: aux-gfid-mount acl
> log_rsync_performance: true
> remote_gsyncd: /nonexistent/gsyncd
> working_dir: /var/lib/misc/glusterfsd/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1
> state_detail_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-detail.status
> gluster_command_dir: /usr/sbin/
> pid_file: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/monitor.pid
> georep_session_working_dir: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/
> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
> master.stime_xattr_name: trusted.glusterfs.40e9e77a-034c-44a2-896e-59eec47e8a84.ccfaed9b-ff4b-4a55-acfa-03f092cdf460.stime
> changelog_log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1-changes.log
> socketdir: /var/run/gluster
> volume_id: 40e9e77a-034c-44a2-896e-59eec47e8a84
> ignore_deletes: false
> state_socket_unencoded: /var/lib/glusterd/geo-replication/glustervol0_storage-server.local_glustervol1/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.socket
> log_file: /var/log/glusterfs/geo-replication/glustervol0/ssh%3A%2F%2Froot%40storage-server.local%3Agluster%3A%2F%2F127.0.0.1%3Aglustervol1.log
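
As an aside, the long ssh%3A... components in those paths are just the
URL-encoded slave URL for the session; a quick decode, sketch only, in case
it helps anyone reading them:

# Sketch: decode the URL-encoded slave URL embedded in the config paths.
try:
    from urllib import unquote          # Python 2
except ImportError:
    from urllib.parse import unquote    # Python 3

enc = ("ssh%3A%2F%2Froot%40storage-server.local%3A"
       "gluster%3A%2F%2F127.0.0.1%3Aglustervol1")
print(unquote(enc))
# -> ssh://root@storage-server.local:gluster://127.0.0.1:glustervol1
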
>
> If any further information is required to troubleshoot this issue,
> please let me know.
>
> I would be very grateful for any help or guidance received.
>
> Many thanks,
>
> Mark Betham.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R