[Gluster-devel] [Gluster-users] GlusterFS (3.3.1) - performance issues - large number of LOOKUP calls & high CPU usage
Song
gluster at 163.com
Fri Jun 21 03:53:41 UTC 2013
I used perf top to see where in the kernel code all this time was being
spent. This is what I saw:
PerfTop: 2036 irqs/sec  kernel:100.0%  exact: 0.0% [1000Hz cycles],  (target_pid: 5336)
------------------------------------------------------------------------------
   samples   pcnt  function                   DSO
   _______  _____  _________________________  _________________
  11265.00  91.0%  _spin_lock_irq             [kernel.kallsyms]
    355.00   2.9%  _spin_lock_irqsave         [kernel.kallsyms]
    249.00   2.0%  compaction_alloc           [kernel.kallsyms]
    235.00   1.9%  compact_zone               [kernel.kallsyms]
    151.00   1.2%  get_pageblock_flags_group  [kernel.kallsyms]
     32.00   0.3%  _cond_resched              [kernel.kallsyms]
     27.00   0.2%  copy_page_c                [kernel.kallsyms]
      8.00   0.1%  _spin_lock                 [kernel.kallsyms]
      6.00   0.0%  mem_cgroup_del_lru_list    [kernel.kallsyms]
      5.00   0.0%  __wake_up_bit              [kernel.kallsyms]
Then I used “perf record -g -p 5336” to capture call graphs for the
process and found that nearly all of the time is spent under “compact_zone”.
[root@bj-nx-cip-w87 ~]# perf report --stdio
# Events: 47K cycles
#
# Overhead  Command    Shared Object      Symbol
# ........  .........  .................  ..................
#
    91.51%  glusterfs  [kernel.kallsyms]  [k] _spin_lock_irq
            |
            --- _spin_lock_irq
               |
               |--99.64%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __do_page_fault
               |          do_page_fault
               |          page_fault
               |          |
               |          |--94.45%-- xdr_callmsg_internal
               |          |          0x3829b98860
               |          |
               |           --5.55%-- __memcpy_sse2
                --0.36%-- [...]
Finally, I googled “compact_zone” and found the article “Linux 6 Transparent
Huge Pages and Hadoop Workloads”
<http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/>.
The same issue occurs with Hadoop.
THP can be disabled by running the following command:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
With THP disabled, CPU usage and load are normal.
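To confirm which THP mode is actually in effect, the sysfs file can be read back: the kernel marks the active setting with square brackets, for example `always madvise [never]`. The following is a minimal Python sketch of that check, not part of the original report. The helper names `active_thp_mode` and `read_thp_mode` are made up for illustration. The second path in `THP_PATHS` is the mainline kernel location and is included on the assumption that the script may run on a non-RHEL-6 system.

```python
import re
from pathlib import Path

# Candidate sysfs locations: the RHEL 6 path used in the command above,
# then the mainline kernel path (an assumption for other distributions).
THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
    "/sys/kernel/mm/transparent_hugepage/enabled",
)


def active_thp_mode(text):
    """Return the bracketed (active) mode from the file contents, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(paths=THP_PATHS):
    """Return the active mode from the first readable path, or None."""
    for path in paths:
        try:
            return active_thp_mode(Path(path).read_text())
        except OSError:
            # Path missing or unreadable on this system; try the next one.
            continue
    return None
```

Calling `read_thp_mode()` on the affected client should return `never` once the echo command has taken effect. Note that a value written with echo does not survive a reboot.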
From: gluster-devel-bounces+gluster=163.com at nongnu.org
[mailto:gluster-devel-bounces+gluster=163.com at nongnu.org] On Behalf Of Song
Sent: Friday, June 07, 2013 1:34 PM
To: 'Stephan von Krawczynski'; 'Pablo'
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] [Gluster-users] GlusterFS (3.3.1) - performance
issues - large number of LOOKUP calls & high CPU usage
We have hit the same performance issue when opening a file. Sometimes a
single file open takes more than 10 seconds.
We added some debug output to locate the problem and tested repeatedly. We
found that the 'xdr_callmsg' call in 'rpc_request_to_xdr' can take several
seconds.
A typical log excerpt is below. From [2013-06-06 13:34:59.471004] to
[2013-06-06 13:35:04.890363], the 'xdr_callmsg' call took more than 5
seconds.
[2013-06-06 13:34:59.470991] I [rpc-clnt.c:1175:rpc_clnt_record_build_record] 0-gfs1-client-51: (thread_id is 140257410492160 )add for open_slow rpc_fill_request_end
[2013-06-06 13:34:59.471004] I [xdr-rpcclnt.c:87:rpc_request_to_xdr] 0-rpc: (thread_id is 140257410492160 len = 131072 )add for open_slow xdrmem_create_end
[2013-06-06 13:34:59.570044] I [client.c:124:client_submit_request] 0-gfs1-client-86: (thread_id is 140257819739904 )add for open_slow rpc_clnt_submit
[2013-06-06 13:34:59.570091] I [rpc-clnt.c:1363:rpc_clnt_submit] 0-gfs1-client-86: (thread_id is 140257819739904 )add for open_slow callid end
......
[2013-06-06 13:34:59.579865] I [client3_1-fops.c:2235:client3_1_lookup_cbk] 0-gfs1-client-5: (thread_id is 140257819739904)add for open_slow lookup_cbk path=/xmail_dedup/gfs1_000/1FA/1B1
[2013-06-06 13:34:59.579917] I [client3_1-fops.c:2235:client3_1_lookup_cbk] 0-gfs1-client-6: (thread_id is 140257819739904)add for open_slow lookup_cbk path=/xmail_dedup/gfs1_000/1FA/1B1
[2013-06-06 13:35:04.890363] I [xdr-rpcclnt.c:92:rpc_request_to_xdr] 0-rpc: (thread_id is 140257410492160 )add for open_slow xdr_callmsg_end
[2013-06-06 13:35:04.890366] I [client.c:110:client_submit_request] 0-gfs1-client-44: (thread_id is 140257785079552 )add for open_slow create the xdr payload
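The time spent in 'xdr_callmsg' can be read off the log by subtracting the timestamp of the 'xdrmem_create_end' entry from that of the 'xdr_callmsg_end' entry. A rough Python sketch of that arithmetic follows. The function name `seconds_between` is hypothetical, and the code assumes each log entry occupies a single line, so entries wrapped by a mail client need to be joined first. It also does not filter by thread_id, which a real analysis of a busy log probably should.

```python
import re
from datetime import datetime

# Leading "[YYYY-MM-DD HH:MM:SS.ffffff]" timestamp of a log entry.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first entry containing start_marker to the
    first later entry containing end_marker, or None if not found."""
    start = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # not a timestamped entry (e.g. the "......" elision)
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if start is None:
            if start_marker in line:
                start = stamp
        elif end_marker in line:
            return (stamp - start).total_seconds()
    return None


# The two entries from the excerpt above that bracket the xdr_callmsg call.
LOG = [
    "[2013-06-06 13:34:59.471004] I [xdr-rpcclnt.c:87:rpc_request_to_xdr] "
    "0-rpc: (thread_id is 140257410492160 len = 131072 )add for open_slow "
    "xdrmem_create_end",
    "[2013-06-06 13:35:04.890363] I [xdr-rpcclnt.c:92:rpc_request_to_xdr] "
    "0-rpc: (thread_id is 140257410492160 )add for open_slow xdr_callmsg_end",
]

elapsed = seconds_between(LOG, "xdrmem_create_end", "xdr_callmsg_end")
```

For the two entries shown, `elapsed` comes to about 5.42 seconds, matching the "more than 5 seconds" reading above.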
We use the native client, with 5 glusterfs instances on one server. When the
performance issue appears, CPU usage is as follows:
top - 13:45:37 up 57 days, 14:04,  4 users,  load average: 6.98, 5.38, 4.67
Tasks: 712 total,   8 running, 704 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us, 63.5%sy,  0.0%ni, 31.5%id,  1.4%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  65956748k total, 55218008k used, 10738740k free,  3362972k buffers
Swap:  8388600k total,    41448k used,  8347152k free, 37370840k cached
  PID USER  PR  NI  VIRT  RES   SHR S  %CPU %MEM      TIME+ COMMAND
13905 root  20   0  554m 363m  2172 R 244.8  0.6    9:51.01 glusterfs
13650 root  20   0  766m 610m  2056 R 184.8  0.9   18:24.37 glusterfs
13898 root  20   0  545m 356m  2176 R 179.2  0.6   12:04.87 glusterfs
13919 root  20   0  547m 360m  2172 R 111.6  0.6    9:16.89 glusterfs
22460 root  20   0  486m 296m  2200 S 100.4  0.5  194:59.10 glusterfs
13878 root  20   0  545m 361m  2176 R  99.7  0.6    8:35.88 glusterfs
-----Original Message-----
From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Stephan von
Krawczynski
Sent: Thursday, June 06, 2013 10:07 PM
To: Pablo
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] GlusterFS (3.3.1) - performance issues - large
number of LOOKUP calls & high CPU usage
On Thu, 06 Jun 2013 10:39:21 -0300
Pablo <paa.listas at gmail.com> wrote:
> I have never tried this (in fact, I'm just learning a bit more about how
> to administer a Gluster server), but you may find it useful.
>
> http://download.gluster.org/pub/gluster/glusterfs/doc/HA%20and%20Load%20Balancing%20for%20NFS%20and%20SMB.html
>
> Pablo.
The problem with this kind of failover, though, is that you will likely
corrupt any file that is being written at the time. If your NFS server
(gluster) node dies while you are writing, your file will be corrupted. If
you use native glusterfs mounts, it will not (or should not) be. This is why
I consider the NFS server feature nothing more than a bad hack. It does not
deliver the safety that glusterfs promises, even if you solve the failover
problem somehow.
--
Regards,
Stephan