[Bugs] [Bug 1268125] New: glusterd memory overcommit

bugzilla at redhat.com
Thu Oct 1 21:21:17 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1268125

            Bug ID: 1268125
           Summary: glusterd memory overcommit
           Product: GlusterFS
           Version: 3.7.4
         Component: unclassified
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: ryanlee at zepheira.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:

We were using Gluster 3.3 through 3.5 without issue but needed to add SSL
support.  Due to SSL bugs in earlier versions, the quickest path forward was to
upgrade the network to 3.7.  The upgrade generally appears to be fine, as added
files show up where they should, but it has caused glusterd to vastly
overcommit memory on both nodes where it runs.

One node (serverA, the 'master') had 4GB of RAM to work with; the other
(serverB) had 2GB.  Both reached around 30GB of committed virtual memory within
a couple of weeks.  When other processes were stopped on serverA and glusterd
was restarted, the overcommit problem appeared to be alleviated and, if memory
was growing at all, it grew much more slowly.  We had to resize serverA and
took it offline; while it was offline, glusterd on serverB shot up to 140GB.
Both nodes are currently at 2GB of RAM (for other reasons) and, after
restarting both daemons, committed memory appears to be growing at around
4GB/day.
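One way to quantify that growth over time (a sketch, not part of the original report: pidof and /proc/<pid>/status are standard Linux, and the fallback to the current shell's PID is purely for illustration) is to sample glusterd's VmSize periodically:

```shell
# Sketch: sample glusterd's committed virtual memory (VmSize) so that
# growth on the order of 4GB/day can be tracked from cron or a loop.
vm_kb() {
  # Print VmSize in kB for PID $1, taken from /proc/<pid>/status.
  awk '/^VmSize:/ {print $2}' "/proc/$1/status"
}
# -s: single PID even if several glusterd processes exist; fall back to
# this shell's PID just so the sketch runs on machines without glusterd.
pid=$(pidof -s glusterd 2>/dev/null || echo $$)
echo "VmSize for PID $pid: $(vm_kb "$pid") kB"
```

Running this from cron and diffing successive samples would confirm the growth rate independently of tools like top.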

Version of GlusterFS package installed:

3.7.4-ubuntu1~trusty1

Location from which the packages are used:

Launchpad PPA

GlusterFS Cluster Information:

    Number of volumes: 2
    Volume Names: backup, other
    Volume on which the particular issue is seen: N/A
    Type of volumes: backup Replicate, other Distribute
    Volume options if available:

Volume Name: backup
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serverA:/glusterfs/brick3/data
Brick2: serverB:/glusterfs/brick3/data
Options Reconfigured:
auth.ssl-allow: [names]
ssl.cipher-list: HIGH:!SSLv2
ssl.certificate-depth: 3
server.ssl: on
client.ssl: on

Volume Name: other
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: serverA:/glusterfs/brick4/data
Options Reconfigured:
[same as above]

    Client Information
        OS Type: Linux
        Mount type: GlusterFS

How reproducible:

Have not tried (sorry).

Steps to Reproduce:

    1.
    2.
    3.

Actual results:


Expected results:


Logs Information:

I stripped dates and counted duplicates (uniq -c) of the actual log messages
for the glusterd process.  Server setup and SSL connect errors appear together.
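For reference, a pipeline of the shape described above can be sketched as follows; the timestamp format and the sample lines are assumptions for illustration, not taken from the actual logs:

```shell
# Sketch of the dedup summary: strip the leading timestamp, then count
# identical messages and sort by frequency.  Sample lines are invented.
log=$(mktemp)
cat > "$log" <<'EOF'
[2015-10-01 12:00:01.000000] E [socket.c:2863:socket_connect] 0-socket: invalid argument
[2015-10-01 12:00:02.000000] E [socket.c:2863:socket_connect] 0-socket: invalid argument
[2015-10-01 12:00:03.000000] W [dict.c:1452:dict_get_with_ref] 0-dict: dict OR key (graph-check) is NULL
EOF
# Drop the bracketed timestamp prefix, then count duplicate messages:
summary=$(sed 's/^\[[^]]*\] //' "$log" | sort | uniq -c | sort -rn)
echo "$summary"
rm -f "$log"
```

The most frequent message ends up first, which is how the 17299-count socket_connect error surfaces at the top of the list above.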

  17299  E [socket.c:2863:socket_connect]
(-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfb)
[0x7f57813d366b]
-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)
[0x7f5781185c59]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/rpc-transport/socket.so(+0x755d)
[0x7f577a62255d] ) 0-socket: invalid argument: this->private [Invalid argument]
    400  W [dict.c:1452:dict_get_with_ref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/mgmt/glusterd.so(build_shd_graph+0x69)
[0x7f577c95ee99]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str_boolean+0x22)
[0x7f57813af6a2] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x19406)
[0x7f57813ad406] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
    217  E [socket.c:2388:socket_poller] 0-socket.management: server setup
failed
    217  E [socket.c:352:ssl_setup_connection] 0-socket.management: SSL connect
error
     16  W [dict.c:1452:dict_get_with_ref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/mgmt/glusterd.so(build_shd_graph+0x69)
[0x7f3c7c558e99]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str_boolean+0x22)
[0x7f3c80fa96a2] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x19406)
[0x7f3c80fa7406] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
     11  E [socket.c:2501:socket_poller] 0-socket.management: error in polling
loop

Additional info: 

This is probably unrelated, but in order to get glusterd to really start, I
had to run
ln /usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/rpc-transport -s
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/rpc-transport, as the logs were
essentially complaining about being unable to find
xlator/rpc-transport/socket.so.
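The workaround can be written in canonical `ln -s TARGET LINK_NAME` form. The sketch below plays it out in a scratch tree instead of /usr/lib (all paths are stand-ins), and assumes, based on the log complaint, that the link named xlator/rpc-transport should point at the real rpc-transport directory:

```shell
# Workaround sketch: the loader wanted xlator/rpc-transport/socket.so,
# while socket.so actually lives under rpc-transport/, so create a
# symlink xlator/rpc-transport -> rpc-transport.  Scratch tree only;
# the real paths live under /usr/lib/x86_64-linux-gnu.
base=$(mktemp -d)
mkdir -p "$base/glusterfs/3.7.4/rpc-transport" "$base/glusterfs/3.7.4/xlator"
touch "$base/glusterfs/3.7.4/rpc-transport/socket.so"
ln -s "$base/glusterfs/3.7.4/rpc-transport" \
      "$base/glusterfs/3.7.4/xlator/rpc-transport"
# The loader's path now resolves through the symlink:
ls "$base/glusterfs/3.7.4/xlator/rpc-transport/socket.so"
```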
