[Gluster-users] glusterd 100% cpu upon volume status inode
Rumen Telbizov
telbizov at gmail.com
Wed Feb 11 18:01:18 UTC 2015
Hello,
Thank you for the detailed and informative answer Kaushal. I appreciate
your input.
As far as I understand, this doesn't sound like something that is on the
roadmap to be resolved any time soon? Would running 3.6 make any difference?
Speaking of versions, which one is recommended for stable production
use: 3.5 or 3.6?
Regards,
Rumen Telbizov
On Tue, Feb 10, 2015 at 9:27 PM, Kaushal M <kshlmster at gmail.com> wrote:
> There is nothing wrong with your setup. This is a known issue (at least to
> me).
>
> The problem here lies with how GlusterD collects and collates the
> information on the open inodes of a volume, which isn't really efficient as
> of now. The collection and collation process involves several small
> memory allocations (at least 2, quite possibly more) for each inode open
> on the bricks. This doesn't scale well when there are lots of
> files, and is CPU and memory intensive.
>
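> For illustration only, here is a rough sketch of that allocation pattern
> (this is not the actual GlusterD code, and the key names are made up).
> Each open inode ends up as a couple of freshly allocated key/value
> strings, so the number of small heap allocations grows linearly with the
> number of open inodes across the bricks:
>
> /* Illustrative sketch only, not GlusterD source. Shows why packing
>  * N inode entries costs O(N) small heap allocations. */
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(void)
> {
>     size_t n_inodes = 150000;   /* roughly 3 replicas x ~50k open files */
>     size_t allocations = 0;
>
>     for (size_t i = 0; i < n_inodes; i++) {
>         char *key = malloc(64); /* one small allocation for the key     */
>         char *val = malloc(64); /* and at least one more for the value  */
>         if (!key || !val) {
>             fprintf(stderr, "out of memory after %zu entries\n", i);
>             free(key);
>             free(val);
>             return 1;
>         }
>         snprintf(key, 64, "brick0.inode%zu.gfid", i); /* hypothetical key */
>         snprintf(val, 64, "%zu", i);
>         allocations += 2;
>         /* the real code would collate key/val into a dictionary here;
>          * the sketch simply discards them */
>         free(key);
>         free(val);
>     }
>     printf("%zu inode entries -> %zu small allocations\n",
>            n_inodes, allocations);
>     return 0;
> }
>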
> In your case, with a 3-way replica volume, you'd have at least 3x as many
> inodes as files (~150000). This means at least 300k small memory
> allocations need to be done by GlusterD, which takes a lot of time,
> CPU and memory to complete. The process will eventually complete
> provided you have enough memory available, but since the gluster CLI only
> waits 2 minutes for a reply, you will not get to see the output, as
> you've experienced. GlusterD will nevertheless continue and finish the
> requested operation.
>
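> (As an aside: the "Exiting with: 110" in the CLI log quoted below is
> ETIMEDOUT, errno 110 on Linux. A minimal sketch of that bounded wait,
> assuming the 2-minute command timeout mentioned above and not the
> actual CLI code, looks roughly like this: the client blocks on a timed
> wait for the reply and gives up, while the daemon keeps working.)
>
> /* Sketch of a bounded wait for a reply (not the actual gluster CLI code). */
> #include <errno.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <time.h>
>
> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
> static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
> static int reply_arrived;            /* would be set by the reply handler */
>
> static int wait_for_reply(int timeout_secs)
> {
>     struct timespec deadline;
>     int ret = 0;
>
>     clock_gettime(CLOCK_REALTIME, &deadline);
>     deadline.tv_sec += timeout_secs;
>
>     pthread_mutex_lock(&lock);
>     while (!reply_arrived && ret == 0)
>         ret = pthread_cond_timedwait(&cond, &lock, &deadline);
>     pthread_mutex_unlock(&lock);
>     return ret;                      /* 0 on reply, ETIMEDOUT otherwise */
> }
>
> int main(void)
> {
>     /* No reply is ever posted here, so after ~2 minutes this prints
>      * "Exiting with: 110"; the daemon, meanwhile, keeps working. */
>     int ret = wait_for_reply(120);
>     if (ret == ETIMEDOUT)
>         printf("Exiting with: %d\n", ret);
>     return ret ? 1 : 0;
> }
>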
> Also, other CLI commands will fail until the existing operation finishes.
> GlusterD acquires a transaction lock when it begins an operation and
> releases it only once the operation is complete. Because GlusterD keeps
> running the operation after the CLI times out, newer commands fail as they
> cannot get the lock.
>
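> (The "Another transaction is in progress" message and the "Unable to get
> lock for uuid ... lock held by ..." log entries quoted below both come
> from this transaction lock. A toy sketch of the behaviour, a
> single-holder lock keyed by the owner's UUID and not the GlusterD
> implementation, would look like this:)
>
> /* Toy single-holder transaction lock keyed by owner UUID
>  * (illustration only, not the GlusterD implementation). */
> #include <stdio.h>
>
> static char lock_owner[64];             /* empty string == unlocked */
>
> static int txn_lock(const char *uuid)
> {
>     if (lock_owner[0] != '\0') {
>         fprintf(stderr, "Unable to get lock for uuid: %s, lock held by: %s\n",
>                 uuid, lock_owner);
>         return -1;  /* the CLI then reports "Another transaction is in progress" */
>     }
>     snprintf(lock_owner, sizeof(lock_owner), "%s", uuid);
>     return 0;
> }
>
> static void txn_unlock(void)
> {
>     lock_owner[0] = '\0';               /* released when the operation completes */
> }
>
> int main(void)
> {
>     const char *uuid = "c7d1e1ea-c5a5-4bcf-802c-aa04dd2e55ba";
>
>     txn_lock(uuid);   /* 'volume status ... inode' takes the lock           */
>     /* The CLI times out after 2 minutes, but the operation (and the
>      * lock) is still in progress, so the next command fails:               */
>     txn_lock(uuid);   /* prints the "lock held by" error                    */
>     txn_unlock();     /* only now would new commands succeed                */
>     return 0;
> }
>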
> ~kaushal
>
> On Wed, Feb 11, 2015 at 4:40 AM, Rumen Telbizov <telbizov at gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> I am new to GlusterFS and I am in the process of evaluating it as a
>> possible alternative to some other options. While playing with it I came
>> across this problem. Please point me in the right direction if there's
>> something wrong that I might be doing.
>>
>> When I run *volume status myvolume inode*, the glusterd process hits
>> *100% cpu utilization* and no further commands work. If I restart the
>> glusterd process the problem is "resolved" until I run the same command
>> again. Here's some more debug output:
>>
>> # time gluster volume status myvolume inode
>> real 2m0.095s
>>
>> ...
>> [2015-02-10 22:49:38.662545] E [name.c:147:client_fill_address_family]
>> 0-glusterfs: transport.address-family not specified. Could not guess
>> default value from (remote-host:(null) or
>> transport.unix.connect-path:(null)) options
>> [2015-02-10 22:49:41.663081] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x202)
>> [0x7fb21d6d95f2]))) 0-dict: data is NULL
>> [2015-02-10 22:49:41.663101] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x20d)
>> [0x7fb21d6d95fd]))) 0-dict: data is NULL
>> [2015-02-10 22:49:41.663107] E [name.c:147:client_fill_address_family]
>> 0-glusterfs: transport.address-family not specified. Could not guess
>> default value from (remote-host:(null) or
>> transport.unix.connect-path:(null)) options
>> [2015-02-10 22:49:44.663576] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x202)
>> [0x7fb21d6d95f2]))) 0-dict: data is NULL
>> [2015-02-10 22:49:44.663595] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x20d)
>> [0x7fb21d6d95fd]))) 0-dict: data is NULL
>> [2015-02-10 22:49:44.663601] E [name.c:147:client_fill_address_family]
>> 0-glusterfs: transport.address-family not specified. Could not guess
>> default value from (remote-host:(null) or
>> transport.unix.connect-path:(null)) options
>> [2015-02-10 22:49:47.664111] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x202)
>> [0x7fb21d6d95f2]))) 0-dict: data is NULL
>> [2015-02-10 22:49:47.664131] W [dict.c:1055:data_to_str]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24)
>> [0x7fb21d6d2e24]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
>> [0x7fb21d6d990e]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x20d)
>> [0x7fb21d6d95fd]))) 0-dict: data is NULL
>> [2015-02-10 22:49:47.664137] E [name.c:147:client_fill_address_family]
>> 0-glusterfs: transport.address-family not specified. Could not guess
>> default value from (remote-host:(null) or
>> transport.unix.connect-path:(null)) options
>> [2015-02-10 22:49:47.728428] I [input.c:36:cli_batch] 0-: Exiting with:
>> 110
>>
>>
>>
>> # time gluster volume status
>> Another transaction is in progress. Please try again after sometime.
>> real 0m10.223s
>>
>> [2015-02-10 22:50:29.937290] E [glusterd-utils.c:153:glusterd_lock]
>> 0-management: Unable to get lock for uuid:
>> c7d1e1ea-c5a5-4bcf-802c-aa04dd2e55ba, lock held by:
>> c7d1e1ea-c5a5-4bcf-802c-aa04dd2e55ba
>> [2015-02-10 22:50:29.937316] E
>> [glusterd-syncop.c:1221:gd_sync_task_begin] 0-management: Unable to acquire
>> lock
>>
>>
>> The volume contains an extracted Linux kernel tree, so lots of small files
>> (48425). Here's the configuration:
>>
>> # gluster volume status
>> Status of volume: myvolume
>> Gluster process                                         Port    Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.12.10.7:/var/lib/glusterfs_disks/disk01/brick  49152   Y       3321
>> Brick 10.12.10.8:/var/lib/glusterfs_disks/disk01/brick  49152   Y       3380
>> Brick 10.12.10.9:/var/lib/glusterfs_disks/disk01/brick  49152   Y       3359
>> Brick 10.12.10.7:/var/lib/glusterfs_disks/disk02/brick  49154   Y       18687
>> Brick 10.12.10.8:/var/lib/glusterfs_disks/disk02/brick  49156   Y       32699
>> Brick 10.12.10.9:/var/lib/glusterfs_disks/disk02/brick  49154   Y       17932
>> Self-heal Daemon on localhost                           N/A     Y       25005
>> Self-heal Daemon on 10.12.10.9                          N/A     Y       17952
>> Self-heal Daemon on 10.12.10.8                          N/A     Y       32724
>>
>> Task Status of Volume myvolume
>>
>> ------------------------------------------------------------------------------
>> Task : Rebalance
>> ID : eec4f2c1-85f5-400d-ac42-6da63ec7434f
>> Status : completed
>>
>>
>> # gluster volume info
>>
>> Volume Name: myvolume
>> Type: Distributed-Replicate
>> Volume ID: e513a56f-049f-4c8e-bc75-4fb789e06c37
>> Status: Started
>> Number of Bricks: 2 x 3 = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.12.10.7:/var/lib/glusterfs_disks/disk01/brick
>> Brick2: 10.12.10.8:/var/lib/glusterfs_disks/disk01/brick
>> Brick3: 10.12.10.9:/var/lib/glusterfs_disks/disk01/brick
>> Brick4: 10.12.10.7:/var/lib/glusterfs_disks/disk02/brick
>> Brick5: 10.12.10.8:/var/lib/glusterfs_disks/disk02/brick
>> Brick6: 10.12.10.9:/var/lib/glusterfs_disks/disk02/brick
>> Options Reconfigured:
>> nfs.disable: on
>> network.ping-timeout: 10
>>
>> The version I'm running:
>> # glusterd -V
>> glusterfs 3.5.3 built on Nov 17 2014 15:48:52
>> Repository revision: git://git.gluster.com/glusterfs.git
>>
>>
>> Thank you for your time.
>>
>> Regards,
>> --
>> Rumen Telbizov
>> Unix Systems Administrator <http://telbizov.com>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
>
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>