[Gluster-users] memory leak in 3.3.1 rebalance?
Pierre-Francois Laquerre
pierre.francois at nec-labs.com
Tue Mar 5 21:33:43 UTC 2013
I started rebalancing my 25x2 distributed-replicate volume two days ago.
Since then, the memory usage of the rebalance processes has been climbing
steadily, by 1-2 MB per minute. Following
http://gluster.org/community/documentation/index.php/High_Memory_Usage,
I tried "echo 2 > /proc/sys/vm/drop_caches", but it had no effect on the
processes' memory usage. Some of the servers are already using more than
10 GB. At this rate I will have to cancel the rebalance, even though brick
usage is heavily skewed right now (most bricks are at 87% capacity, while
the recently added ones are at 18-28%).
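For what it's worth, here is roughly how I've been tracking the growth, just
sampling VmRSS from /proc (a minimal sketch; the helper name and the one-minute
interval are my own, not anything gluster provides):

```shell
# Sketch: report the resident set size (VmRSS, in kB) of a process
# by reading /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example loop (pid taken from the rebalance .pid file under
# /var/lib/glusterd/vols/bigdata/rebalance/):
# pid=$(cat /var/lib/glusterd/vols/bigdata/rebalance/*.pid)
# while sleep 60; do echo "$(date +%T) $(rss_kb "$pid") kB"; done
```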
Any ideas what might be causing this? The only references to rebalance
memory leaks I could find were related to 3.2.x, not 3.3.1.
gluster volume info:
Volume Name: bigdata
Type: Distributed-Replicate
Volume ID: 56498956-7b4b-4ee3-9d2b-4c8cfce26051
Status: Started
Number of Bricks: 25 x 2 = 50
Transport-type: tcp
Bricks:
Brick1: ml43:/mnt/donottouch/localb
Brick2: ml44:/mnt/donottouch/localb
Brick3: ml43:/mnt/donottouch/localc
Brick4: ml44:/mnt/donottouch/localc
Brick5: ml45:/mnt/donottouch/localb
Brick6: ml46:/mnt/donottouch/localb
Brick7: ml45:/mnt/donottouch/localc
Brick8: ml46:/mnt/donottouch/localc
Brick9: ml47:/mnt/donottouch/localb
Brick10: ml48:/mnt/donottouch/localb
Brick11: ml47:/mnt/donottouch/localc
Brick12: ml48:/mnt/donottouch/localc
Brick13: ml45:/mnt/donottouch/locald
Brick14: ml46:/mnt/donottouch/locald
Brick15: ml47:/mnt/donottouch/locald
Brick16: ml48:/mnt/donottouch/locald
Brick17: ml51:/mnt/donottouch/localb
Brick18: ml52:/mnt/donottouch/localb
Brick19: ml51:/mnt/donottouch/localc
Brick20: ml52:/mnt/donottouch/localc
Brick21: ml51:/mnt/donottouch/locald
Brick22: ml52:/mnt/donottouch/locald
Brick23: ml59:/mnt/donottouch/locald
Brick24: ml54:/mnt/donottouch/locald
Brick25: ml59:/mnt/donottouch/localc
Brick26: ml54:/mnt/donottouch/localc
Brick27: ml59:/mnt/donottouch/localb
Brick28: ml54:/mnt/donottouch/localb
Brick29: ml55:/mnt/donottouch/localb
Brick30: ml29:/mnt/donottouch/localb
Brick31: ml55:/mnt/donottouch/localc
Brick32: ml29:/mnt/donottouch/localc
Brick33: ml30:/mnt/donottouch/localc
Brick34: ml31:/mnt/donottouch/localc
Brick35: ml30:/mnt/donottouch/localb
Brick36: ml31:/mnt/donottouch/localb
Brick37: ml40:/mnt/donottouch/localb
Brick38: ml41:/mnt/donottouch/localb
Brick39: ml40:/mnt/donottouch/localc
Brick40: ml41:/mnt/donottouch/localc
Brick41: ml56:/mnt/donottouch/localb
Brick42: ml57:/mnt/donottouch/localb
Brick43: ml56:/mnt/donottouch/localc
Brick44: ml57:/mnt/donottouch/localc
Brick45: ml25:/mnt/donottouch/localb
Brick46: ml26:/mnt/donottouch/localb
Brick47: ml01:/mnt/donottouch/localb/brick
Brick48: ml25:/mnt/donottouch/localc/brick
Brick49: ml01:/mnt/donottouch/localc/brick
Brick50: ml26:/mnt/donottouch/localc/brick
Options Reconfigured:
nfs.register-with-portmap: OFF
nfs.disable: on
performance.quick-read: on
Most of these bricks are ext4; the 7 most recently added are xfs. Each is
backed by a 2 TB hard drive.
gluster --version:
glusterfs 3.3.1 built on Oct 11 2012 21:49:37
cat /etc/system-release:
Scientific Linux release 6.1 (Carbon)
uname -a:
Linux ml01 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011
x86_64 x86_64 x86_64 GNU/Linux
df -i /mnt/bigdata:
Filesystem Inodes IUsed IFree IUse% Mounted on
ml43:/bigdata 3272292992 114236317 3158056675 4% /mnt/bigdata
df /mnt/bigdata:
Filesystem 1K-blocks Used Available Use% Mounted on
ml43:/bigdata 48160570880 33787913600 12223793792 74% /mnt/bigdata
the process I am referring to:
/usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option
*dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes
--xlator-option *dht.assert-no-child-down=yes --xlator-option
*replicate*.data-self-heal=off --xlator-option
*replicate*.metadata-self-heal=off --xlator-option
*replicate*.entry-self-heal=off --xlator-option *dht.rebalance-cmd=1
--xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8
--socket-file
/var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock
--pid-file
/var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid
-l /var/log/glusterfs/bigdata-rebalance.log
I am seeing a lot of:
[2013-03-05 10:59:51.170051] I [dht-rebalance.c:647:dht_migrate_file]
0-bigdata-dht: /a/c/lu/lu-nod/_4.nrm: attempting to move from
bigdata-replicate-23 to bigdata-replicate-17
[2013-03-05 10:59:51.296487] W
[dht-rebalance.c:361:__dht_check_free_space] 0-bigdata-dht: data
movement attempted from node (bigdata-replicate-23) with higher disk
space to a node (bigdata-replicate-17) with lesser disk space
(/a/c/lu/lu-nod/_4.nrm)
[2013-03-05 10:59:51.296604] E
[dht-rebalance.c:1202:gf_defrag_migrate_data] 0-bigdata-dht:
migrate-data failed for /a/c/lu/lu-nod/_4.nrm
in the rebalance log, and a lot of:
[2013-03-05 11:00:02.609033] I [server3_1-fops.c:576:server_mknod_cbk]
0-bigdata-server: 586514: MKNOD (null) (--) ==> -1 (File exists)
and
[2013-03-05 10:59:58.755505] E [posix-helpers.c:701:posix_handle_pair]
0-bigdata-posix: /mnt/donottouch/localb/brick/b/somedir/foobar:
key:trusted.glusterfs.dht.linkto error:File exists
[2013-03-05 10:59:58.755576] E [posix.c:1755:posix_create]
0-bigdata-posix: setting xattrs on
/mnt/donottouch/localb/brick/b/somedir/foobar failed (File exists)
in the brick logs.
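To get a sense of how often each error shows up, I tally log lines by severity
and source location (a rough sketch; the helper name is mine, and it assumes
the standard glusterfs log layout shown above, where the third whitespace-
separated field is the severity letter and the fourth is the [file:line:func]
tag):

```shell
# Sketch: count error-level (E) glusterfs log lines, grouped by the
# [file.c:line:function] tag, most frequent first.
count_log_errors() {
    awk '$3 == "E" {n[$4]++} END {for (k in n) print n[k], k}' "$1" | sort -rn
}

# Usage (path from the -l option of the rebalance process):
# count_log_errors /var/log/glusterfs/bigdata-rebalance.log
```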
gluster volume rebalance bigdata status:
Node    Rebalanced-files    size       scanned    failures    status
ml01    296                 279.1MB    6761635    29006       in progress
ml25    1208                1.1GB      6167501    75131       in progress
ml26    0                   0Bytes     7197961    0           in progress
ml29    0                   0Bytes     5892140    0           in progress
ml30    118486              74.3GB     5867068    107288      in progress
ml31    0                   0Bytes     6802084    0           in progress
ml40    55652               41.2GB     6100163    94717       in progress
ml41    0                   0Bytes     6151999    0           in progress
ml43    124390              53.3GB     5470703    101802      in progress
ml44    0                   0Bytes     6789294    0           in progress
ml45    146811              61.1GB     4829884    242848      in progress
ml46    0                   0Bytes     7184315    0           in progress
ml47    82719               67.5GB     5954615    153226      in progress
ml48    0                   0Bytes     7826333    0           in progress
ml51    156903              76.2GB     4681106    146970      in progress
ml52    0                   0Bytes     6154015    0           in progress
ml54    0                   0Bytes     42773      0           in progress
ml55    44194               33.4GB     6113086    82874       in progress
ml56    16060               16.5GB     5949356    136277      in progress
ml57    0                   0Bytes     7845347    0           in progress
ml59    128070              76.7GB     5847966    145739      in progress