[Gluster-users] Memory leak in 3.6.*?

Yannick Perret yannick.perret at liris.cnrs.fr
Fri Jul 22 20:16:38 UTC 2016


On 22/07/2016 21:12, Yannick Perret wrote:
> On 22/07/2016 17:47, Mykola Ulianytskyi wrote:
>> Hi
>>
>>>   3.7 clients are not compatible with 3.6 servers
>> Can you provide more info?
>>
>> I use some 3.7 clients with 3.6 servers and don't see issues.
> Well,
> with a 3.7.13 client compiled on the same machine, when I try the same 
> mount I get:
> # mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA  /zog/
> Mount failed. Please check the log file for more details.
>
> Checking the log (/var/log/glusterfs/zog.log), I see:
> [2016-07-22 19:05:40.249143] I [MSGID: 100030] 
> [glusterfsd.c:2338:main] 0-/usr/local/sbin/glusterfs: Started running 
> /usr/local/sbin/glusterfs version 3.7.13 (args: 
> /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain 
> --volfile-id=BACKUP-ADMIN-DATA /zog)
> [2016-07-22 19:05:40.258437] I [MSGID: 101190] 
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started 
> thread with index 1
> [2016-07-22 19:05:40.259480] W [socket.c:701:__socket_rwv] 
> 0-glusterfs: readv on <the-IP>:24007 failed (Aucune donnée disponible)
> [2016-07-22 19:05:40.259859] E [rpc-clnt.c:362:saved_frames_unwind] 
> (--> 
> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x175)[0x7fad7d039335] 
> (--> 
> /usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1b3)[0x7fad7ce04e73] (--> 
> /usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fad7ce04f6e] 
> (--> 
> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fad7ce065ee] 
> (--> 
> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fad7ce06de8] 
> ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) 
> op(GETSPEC(2)) called at 2016-07-22 19:05:40.258858 (xid=0x1)
> [2016-07-22 19:05:40.259894] E 
> [glusterfsd-mgmt.c:1690:mgmt_getspec_cbk] 0-mgmt: failed to fetch 
> volume file (key:BACKUP-ADMIN-DATA)
> [2016-07-22 19:05:40.259939] W [glusterfsd.c:1251:cleanup_and_exit] 
> (-->/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1de) 
> [0x7fad7ce04e9e] -->/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x454) 
> [0x40d564] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) 
> [0x407eab] ) 0-: received signum (0), shutting down
> [2016-07-22 19:05:40.259965] I [fuse-bridge.c:5720:fini] 0-fuse: 
> Unmounting '/zog'.
> [2016-07-22 19:05:40.260913] W [glusterfsd.c:1251:cleanup_and_exit] 
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7fad7c0a30a4] 
> -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x408015] 
> -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x407eab] ) 0-: 
> received signum (15), shutting down
>
Hmmm… I just noticed that the logs are (partly) localized, which can make 
them harder to understand for non-French speakers.
"Aucune donnée disponible" means: no data available.

BTW, if I could get 3.7 clients to work with my servers, and if the 
memory leak doesn't exist in 3.7, that would be fine for me.
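
For the record, to get more detail on why the GETSPEC handshake fails, I 
can retry the mount with a more verbose client log, along these lines (a 
sketch; it assumes the fuse mount helper accepts the log-level and 
log-file options, and the log path is just an example):

# mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/zog-debug.log \
    sto1.my.domain:BACKUP-ADMIN-DATA /zog/
# grep -i getspec /var/log/glusterfs/zog-debug.log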

--
Y.

> I did not investigate further, as I simply presumed that the 3.7 series 
> is not compatible with 3.6 servers, but it may be something else. In any 
> case it is the same client, the same server(s) and the same volume.
>
> The build was done with the following features (configured with 
> "configure --disable-tiering" as I don't have the dependencies installed 
> for tiering); see the build sketch right after the list:
> FUSE client          : yes
> Infiniband verbs     : no
> epoll IO multiplex   : yes
> argp-standalone      : no
> fusermount           : yes
> readline             : yes
> georeplication       : yes
> Linux-AIO            : no
> Enable Debug         : no
> Block Device xlator  : no
> glupy                : yes
> Use syslog           : yes
> XML output           : yes
> QEMU Block formats   : no
> Encryption xlator    : yes
> Unit Tests           : no
> POSIX ACLs           : yes
> Data Classification  : no
> firewalld-config     : no
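>
> For completeness, the build itself is nothing exotic; roughly the usual 
> autotools flow (a sketch; the directory name assumes the extracted 
> 3.7.13 release tarball):
>
> # cd glusterfs-3.7.13
> # ./configure --disable-tiering
> # make && make install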
>
> Regards,
> -- 
> Y.
>
>
>> Thank you
>>
>> -- 
>> With best regards,
>> Mykola
>>
>>
>> On Fri, Jul 22, 2016 at 4:31 PM, Yannick Perret
>> <yannick.perret at liris.cnrs.fr> wrote:
>>> Note: I have a dev client machine, so I can perform tests or recompile
>>> the glusterfs client if that helps in gathering data about this.
>>>
>>> I did not test this problem against the 3.7.x versions as my 2 servers
>>> are in use and I can't upgrade them at this time, and 3.7 clients are
>>> not compatible with 3.6 servers (as far as I can see from my tests).
>>>
>>> -- 
>>> Y.
>>>
>>>
>>> On 22/07/2016 14:06, Yannick Perret wrote:
>>>
>>> Hello,
>>> some time ago I posted about a memory leak in the client process, but
>>> it was on a very old 32-bit machine (both kernel and OS) and I did not
>>> find evidence of a similar problem on our more recent machines.
>>> But I performed more tests and I see the same problem.
>>>
>>> Clients are 64-bit Debian 8.2 machines. The glusterfs client on these
>>> machines is compiled from source with the following features enabled:
>>> FUSE client          : yes
>>> Infiniband verbs     : no
>>> epoll IO multiplex   : yes
>>> argp-standalone      : no
>>> fusermount           : yes
>>> readline             : yes
>>> georeplication       : yes
>>> Linux-AIO            : no
>>> Enable Debug         : no
>>> systemtap            : no
>>> Block Device xlator  : no
>>> glupy                : no
>>> Use syslog           : yes
>>> XML output           : yes
>>> QEMU Block formats   : no
>>> Encryption xlator    : yes
>>> Erasure Code xlator  : yes
>>>
>>> I tested both the 3.6.7 and 3.6.9 versions on the client (3.6.7 is the
>>> one installed on our machines, including the servers; 3.6.9 is to test
>>> with the latest 3.6 version).
>>>
>>> Here are the operations on the client (also performed with similar
>>> results with the 3.6.7 version):
>>> # /usr/local/sbin/glusterfs --version
>>> glusterfs 3.6.9 built on Jul 22 2016 13:27:42
>>> (…)
>>> # mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA  /zog/
>>> # cd /usr/
>>> # cp -Rp * /zog/TEMP/
>>> Then I monitored the memory used by the glusterfs process while 'cp'
>>> was running (VSZ and RSS from 'ps', respectively; a sketch of the
>>> monitoring loop follows the figures below):
>>> 284740 70232
>>> 284740 70232
>>> 284876 71704
>>> 285000 72684
>>> 285136 74008
>>> 285416 75940
>>> (…)
>>> 368684 151980
>>> 369324 153768
>>> 369836 155576
>>> 370092 156192
>>> 370092 156192
>>> Here both sizes are stable and correspond to the end of the 'cp' command.
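>>> For reference, the monitoring is just a loop around 'ps'; a minimal
>>> sketch, assuming only one glusterfs client process runs on the machine:
>>> # while sleep 5; do ps -o vsz=,rss= -p $(pidof glusterfs); done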
>>> If I restart another 'cp' (even on the same directories), the size
>>> starts to increase again.
>>> If I perform an 'ls -lR' in the directory, the size also increases:
>>> 370756 192488
>>> 389964 212148
>>> 390948 213232
>>> (here I ^C the 'ls')
>>>
>>> When doing nothing, the size does not increase but it never decreases
>>> (calling 'sync' does not change the situation).
>>> Sending a HUP signal to the glusterfs process also increases memory
>>> (390948 213324 → 456484 213320).
>>> Changing the volume configuration (changing the
>>> diagnostics.client-sys-log-level value) does not change anything.
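>>> One more check I can try, assuming part of the footprint is inodes kept
>>> alive by the kernel dentry/inode cache rather than by glusterfs itself:
>>> drop the kernel caches and see whether RSS goes down, e.g.:
>>> # sync
>>> # echo 2 > /proc/sys/vm/drop_caches  # frees dentries and inodes only
>>> # ps -o vsz=,rss= -p $(pidof glusterfs)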
>>>
>>> Here is the current 'ps' output:
>>> root     17041  4.9  5.2 456484 213320 ?       Ssl  13:29 1:21
>>> /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain
>>> --volfile-id=BACKUP-ADMIN-DATA /zog
>>>
>>> Of course, unmounting/remounting goes back to the "start" size:
>>> # umount /zog
>>> # mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA  /zog/
>>> → root     28741  0.3  0.7 273320 30484 ?        Ssl  13:57 0:00
>>> /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain
>>> --volfile-id=BACKUP-ADMIN-DATA /zog
>>>
>>>
>>> I didn't see this before because most of our volumes are mounted
>>> "on demand" for specific storage activities, or are permanently mounted
>>> but with very little activity.
>>> But clearly this memory usage drift is a long-term problem. On the old
>>> 32-bit machine I had this problem ("solved" by using NFS mounts while
>>> waiting for that old machine to be replaced) and it led to glusterfs
>>> being killed by the OS when it ran out of free memory. It happened
>>> faster than what I describe here, but it is just a question of time.
>>>
>>>
>>> Thanks for any help about that.
>>>
>>> Regards,
>>> -- 
>>> Y.
>>>
>>>
>>> The corresponding volume on the servers is (in case it helps):
>>> Volume Name: BACKUP-ADMIN-DATA
>>> Type: Replicate
>>> Volume ID: 306d57f3-fb30-4bcc-8687-08bf0a3d7878
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: sto1.my.domain:/glusterfs/backup-admin/data
>>> Brick2: sto2.my.domain:/glusterfs/backup-admin/data
>>> Options Reconfigured:
>>> diagnostics.client-sys-log-level: WARNING
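>>>
>>> (For reference, the above is the output of something like the following,
>>> run on one of the servers:)
>>> # gluster volume info BACKUP-ADMIN-DATA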
