[Bugs] [Bug 1175617] New: Glusterd gets killed by oom-killer because of memory consumption

bugzilla at redhat.com bugzilla at redhat.com
Thu Dec 18 08:18:24 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1175617

            Bug ID: 1175617
           Summary: Glusterd gets killed by oom-killer because of memory
                    consumption
           Product: GlusterFS
           Version: 3.6.1
         Component: glusterd
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: mikko.tiainen at csc.fi
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
The problem appeared after upgrading GlusterFS from 3.5.2 to 3.6.1 in an
environment with two volumes (one Distributed-Replicate and one Distribute),
configured as follows:

gluster volume info

Volume Name: ingest_vol
Type: Distributed-Replicate
Volume ID: acdd2208-5ed1-4729-9d27-923c42f22e2c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: passtorage1:/mnt/ingest/brick
Brick2: passtorage2:/mnt/ingest/brick
Brick3: passtorage3:/mnt/ingest/brick
Brick4: passtorage4:/mnt/ingest/brick
Options Reconfigured:
user.cifs: disable
performance.force-readdirp: off
cluster.extra-hash-regex: "(.*)\\.tmp"
performance.lazy-open: off
performance.strict-o-direct: on
performance.flush-behind: on
performance.read-ahead: on
performance.write-behind: on
performance.stat-prefetch: on
nfs.disable: on

Volume Name: storage_vol01
Type: Distribute
Volume ID: 946a01dd-5546-4e1a-b1c1-fd02fb5d157a
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: passtorage1:/mnt/storage/brick
Brick2: passtorage2:/mnt/storage/brick
Brick3: passtorage3:/mnt/storage/brick
Brick4: passtorage4:/mnt/storage/brick
Brick5: passtorage5:/mnt/storage/brick
Options Reconfigured:
user.cifs: disable
nfs.disable: on

On the machine named passtorage1, the glusterd process tries to allocate all
available memory from the OS and gets killed by the oom-killer. The cluster ran
under light load for two and a half days before glusterd was killed.
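
To narrow down a leak like this, it can help to record the resident memory of
glusterd over time and see how quickly it grows. The following is only a minimal
sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line;
the function names (parse_vm_rss_kb, read_rss_kb, growth_kb_per_hour) are
hypothetical helpers and not part of GlusterFS.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


def growth_kb_per_hour(samples):
    """Average RSS growth between the first and last (unix_seconds, rss_kb) sample."""
    if len(samples) < 2:
        return 0.0
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (rss1 - rss0) * 3600.0 / (t1 - t0)
```

One way to use it would be to call read_rss_kb() periodically with the pid
reported by `pidof glusterd`, append (time.time(), rss) pairs to a list, and
log the result of growth_kb_per_hour() alongside the glusterd log timestamps.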

All machines have memory as follows:
cat /proc/meminfo 
MemTotal:       99025408 kB
cat /proc/swaps 
Filename        Type         Size       Used     Priority
/dev/dm-1       partition    8388604    32568    -1


The following glusterd logs were gathered from this incident:

passtorage1:
[2014-12-11 22:33:59.999976] E [glusterd-mgmt.c:127:gd_mgmt_v3_collate_errors]
0-management: Locking failed on passtorage4. Please check log file for details.

passtorage2:
[2014-12-11 22:38:34.095010] I [MSGID: 106004]
[glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has
disconnected from glusterd.
[2014-12-11 22:38:34.095746] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f49103e0420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f4905d49228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f4905cbe1c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4905ca9980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f49101b5f11] )))))
0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.096080] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f49103e0420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f4905d49228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f4905cbe1c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f4905ca9980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f49101b5f11] )))))
0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.096134] E [glusterd-utils.c:148:glusterd_lock]
0-management: Unable to get lock for uuid:
c20d61b6-0b70-4cae-a941-b4e5e5168548, lock held by:
765b1cb3-354d-4dd3-9ca5-59b6d0081e13

passtorage3:
[2014-12-11 22:38:34.094573] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f04a77ef420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f049d158228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f049d0cd1c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f049d0b8980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f04a75c4f11] )))))
0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.094759] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f04a77ef420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f049d158228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f049d0cd1c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f049d0b8980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7f04a75c4f11] )))))
0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.094785] E [glusterd-utils.c:148:glusterd_lock]
0-management: Unable to get lock for uuid:
765b1cb3-354d-4dd3-9ca5-59b6d0081e13, lock held by:
765b1cb3-354d-4dd3-9ca5-59b6d0081e13

[2014-12-11 22:38:34.094289] I [MSGID: 106004]
[glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has
disconnected from glusterd.

passtorage4:
[2014-12-11 22:33:02.912414] W [glusterd-locks.c:550:glusterd_mgmt_v3_lock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd1648fb420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1ca)[0x7fd15a264baa]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x4eb9f)[0x7fd15a1e0b9f]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x1e5)[0x7fd15a1e4005]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0xeba44)[0x7fd15a27da44]
))))) 0-management: Lock for storage_vol01 held by
b15935ea-4e92-42d5-9828-fedb1877a83a
[2014-12-11 22:33:02.912468] E [glusterd-op-sm.c:3058:glusterd_op_ac_lock]
0-management: Unable to acquire lock for storage_vol01
[2014-12-11 22:33:02.912539] E [glusterd-op-sm.c:6584:glusterd_op_sm]
0-management: handler returned: -1

passtorage5:
[2014-12-11 22:23:02.894479] W [glusterd-locks.c:550:glusterd_mgmt_v3_lock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x1ca)[0x7fd204cebbaa]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x4eb9f)[0x7fd204c67b9f]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x1e5)[0x7fd204c6b005]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0xeba44)[0x7fd204d04a44]
))))) 0-management: Lock for ingest_vol held by
b15935ea-4e92-42d5-9828-fedb1877a83a
[2014-12-11 22:23:02.894524] E [glusterd-op-sm.c:3058:glusterd_op_ac_lock]
0-management: Unable to acquire lock for ingest_vol
[2014-12-11 22:23:02.894594] E [glusterd-op-sm.c:6584:glusterd_op_sm]
0-management: handler returned: -1

[2014-12-11 22:38:34.094271] I [MSGID: 106004]
[glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
b15935ea-4e92-42d5-9828-fedb1877a83a, in Peer in Cluster state, has
disconnected from glusterd.
[2014-12-11 22:38:34.094732] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7fd204ceb228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7fd204c601c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fd204c4b980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7fd20e756f11] )))))
0-management: Lock for vol ingest_vol not held
[2014-12-11 22:38:34.095098] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fd20e981420] (-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7fd204ceb228]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7fd204c601c2]
(-->
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fd204c4b980]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a1)[0x7fd20e756f11] )))))
0-management: Lock for vol storage_vol01 not held
[2014-12-11 22:38:34.095152] E [glusterd-utils.c:148:glusterd_lock]
0-management: Unable to get lock for uuid:
b35df837-1761-41dc-8e27-8d99c75dbe79, lock held by:
765b1cb3-354d-4dd3-9ca5-59b6d0081e13
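
The lock messages above are easier to compare across nodes once the wrapped
lines are joined back into entries and the lock-holder UUIDs are counted. Below
is a rough sketch of that, assuming each entry starts with a bracketed
timestamp followed by a one-letter level as in the excerpts; split_entries and
lock_holders are hypothetical helper names, and the regular expressions only
cover the message shapes quoted in this report.

```python
import re
from collections import Counter

# An entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL> ".
ENTRY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] [A-Z] ")
# Matches both "lock held by: <uuid>" and "Lock for <vol> held by <uuid>".
HELD_BY = re.compile(
    r"held by:?\s+([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def split_entries(text):
    """Join wrapped continuation lines onto the log entry they belong to."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if ENTRY_START.match(stripped):
            entries.append(stripped)
        elif entries and stripped:
            entries[-1] += " " + stripped
    return entries


def lock_holders(text):
    """Count how often each UUID is reported as the current lock holder."""
    counts = Counter()
    for entry in split_entries(text):
        match = HELD_BY.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running lock_holders() on the log text of each node and comparing the counters
would show which peer UUIDs the other nodes believe are holding the locks.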


Version-Release number of selected component (if applicable):
3.6.1 glusterd

How reproducible:
Not sure how to reproduce. The system had been running for three days when one
glusterd process was killed.

Steps to Reproduce:
1. Upgrade glusterfs to the 3.6.1 release.
2. Run the cluster until one glusterd gets killed.

Actual results:
The glusterd process gets killed by the oom-killer after the cluster has been
running for some time.

Expected results:
glusterd does not try to allocate all the memory from the OS and instead runs
with moderate memory consumption.

Additional info:


