[Gluster-users] Poor performance on a server-class system vs. desktop

Xavi Hernandez jahernan at redhat.com
Fri Nov 27 08:37:51 UTC 2020


Hi Dmitry,

On Thu, Nov 26, 2020 at 10:44 AM Dmitry Antipov <dmantipov at yandex.ru> wrote:

> BTW, did someone try to profile the brick process? I did, and got this for the
> default replica 3 volume ('perf record -F 2500 -g -p [PID]'):
>
> +    3.29%     0.02%  glfs_epoll001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    3.17%     0.01%  glfs_epoll001    [kernel.kallsyms]      [k] do_syscall_64
> +    3.17%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    3.06%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] do_syscall_64
> +    2.75%     0.01%  glfs_iotwr00f    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.74%     0.01%  glfs_iotwr00b    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.74%     0.01%  glfs_iotwr001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.73%     0.00%  glfs_iotwr003    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.72%     0.00%  glfs_iotwr000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.72%     0.01%  glfs_iotwr00c    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.70%     0.01%  glfs_iotwr003    [kernel.kallsyms]      [k] do_syscall_64
> +    2.69%     0.00%  glfs_iotwr001    [kernel.kallsyms]      [k] do_syscall_64
> +    2.69%     0.01%  glfs_iotwr008    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.68%     0.00%  glfs_iotwr00b    [kernel.kallsyms]      [k] do_syscall_64
> +    2.68%     0.00%  glfs_iotwr00c    [kernel.kallsyms]      [k] do_syscall_64
> +    2.68%     0.00%  glfs_iotwr00f    [kernel.kallsyms]      [k] do_syscall_64
> +    2.68%     0.01%  glfs_iotwr000    [kernel.kallsyms]      [k] do_syscall_64
> +    2.67%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.65%     0.00%  glfs_iotwr008    [kernel.kallsyms]      [k] do_syscall_64
> +    2.64%     0.00%  glfs_iotwr00e    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.64%     0.01%  glfs_iotwr00d    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.63%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] do_syscall_64
> +    2.63%     0.01%  glfs_iotwr007    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.63%     0.00%  glfs_iotwr005    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.63%     0.01%  glfs_iotwr006    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.63%     0.00%  glfs_iotwr009    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.61%     0.01%  glfs_iotwr004    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.61%     0.01%  glfs_iotwr00e    [kernel.kallsyms]      [k] do_syscall_64
> +    2.60%     0.00%  glfs_iotwr006    [kernel.kallsyms]      [k] do_syscall_64
> +    2.59%     0.00%  glfs_iotwr005    [kernel.kallsyms]      [k] do_syscall_64
> +    2.59%     0.00%  glfs_iotwr00d    [kernel.kallsyms]      [k] do_syscall_64
> +    2.58%     0.00%  glfs_iotwr002    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.58%     0.01%  glfs_iotwr007    [kernel.kallsyms]      [k] do_syscall_64
> +    2.58%     0.00%  glfs_iotwr004    [kernel.kallsyms]      [k] do_syscall_64
> +    2.57%     0.00%  glfs_iotwr009    [kernel.kallsyms]      [k] do_syscall_64
> +    2.54%     0.00%  glfs_iotwr002    [kernel.kallsyms]      [k] do_syscall_64
> +    1.65%     0.00%  glfs_epoll000    [unknown]              [k] 0x0000000000000001
> +    1.65%     0.00%  glfs_epoll001    [unknown]              [k] 0x0000000000000001
> +    1.48%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    1.44%     0.08%  glfs_rpcrqhnd    libpthread-2.32.so     [.] pthread_cond_wait@@GLIBC_2.3.2
> +    1.40%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_syscall_64
> +    1.36%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] __x64_sys_futex
> +    1.35%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_futex
> +    1.34%     0.01%  glfs_iotwr00a    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.32%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] __x64_sys_pwrite64
> +    1.32%     0.00%  glfs_iotwr001    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.31%     0.01%  glfs_iotwr002    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.31%     0.00%  glfs_iotwr00b    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.31%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] vfs_write
> +    1.30%     0.00%  glfs_iotwr001    [kernel.kallsyms]      [k] __x64_sys_pwrite64
> +    1.30%     0.00%  glfs_iotwr008    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.30%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] new_sync_write
> +    1.30%     0.00%  glfs_iotwr00c    libpthread-2.32.so     [.] __libc_pwrite64
> +    1.29%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] xfs_file_write_iter
> +    1.29%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] xfs_file_dio_aio_write
>
> And on replica 3 with storage.linux-aio enabled:
>
> +   11.76%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +   11.42%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] do_syscall_64
> +    8.81%     0.00%  glfs_posixaio    [unknown]              [k] 0x00000000baadf00d
> +    8.81%     0.00%  glfs_posixaio    [unknown]              [k] 0x0000000000000004
> +    8.74%     0.06%  glfs_posixaio    libc-2.32.so           [.] __GI___writev
> +    8.33%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] do_writev
> +    8.23%     0.03%  glfs_posixaio    [kernel.kallsyms]      [k] vfs_writev
> +    8.12%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_iter_write
> +    8.02%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_iter_readv_writev
> +    7.96%     0.04%  glfs_posixaio    [kernel.kallsyms]      [k] sock_write_iter
> +    7.92%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] sock_sendmsg
> +    7.86%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_sendmsg
> +    7.28%     0.15%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_sendmsg_locked
> +    6.49%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] __tcp_push_pending_frames
> +    6.48%     0.10%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_write_xmit
> +    6.31%     0.02%  glfs_posixaio    [unknown]              [k] 0000000000000000
> +    6.05%     0.13%  glfs_posixaio    [kernel.kallsyms]      [k] __tcp_transmit_skb
> +    5.71%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __ip_queue_xmit
> +    4.15%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    4.07%     0.08%  glfs_posixaio    [kernel.kallsyms]      [k] ip_finish_output2
> +    3.75%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] asm_call_sysvec_on_stack
> +    3.75%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_syscall_64
> +    3.70%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] __x64_sys_futex
> +    3.68%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __local_bh_enable_ip
> +    3.67%     0.07%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_futex
> +    3.62%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_softirq
> +    3.61%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] do_softirq_own_stack
> +    3.59%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __softirqentry_text_start
> +    3.44%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] net_rx_action
> +    3.34%     0.04%  glfs_posixaio    [kernel.kallsyms]      [k] process_backlog
> +    3.28%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] __netif_receive_skb_one_core
> +    3.08%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    3.02%     0.03%  glfs_epoll001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
> +    2.97%     0.01%  glfs_epoll000    [kernel.kallsyms]      [k] do_syscall_64
> +    2.89%     0.01%  glfs_epoll001    [kernel.kallsyms]      [k] do_syscall_64
> +    2.73%     0.08%  glfs_posixaio    [kernel.kallsyms]      [k] nf_hook_slow
> +    2.25%     0.04%  glfs_posixaio    libc-2.32.so           [.] fgetxattr
> +    2.16%     0.14%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] futex_wake
>
> According to these tables, the brick process is just a thin wrapper for the
> system calls and the kernel network subsystem behind them.


Mostly. However, there's one issue that is not so obvious in the perf capture
but that we have identified in other setups: when the system calls are
processed very fast (as should be the case with NVMe), the io-threads thread
pool is constantly pulling requests from its queue. That queue is currently
synchronized with a mutex, and the small latency per request makes contention
on that mutex quite high. As a result, the thread pool tends to be serialized
by the lock, which kills most of the parallelism and also causes many
additional system calls (increased CPU utilization and higher latencies).
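
If you want to double check this on your setup, a rough way (just a sketch; it
assumes the kernel exposes the syscall tracepoints and that <brick-pid> is the
PID of the brick process) is to count the futex calls the brick makes while the
benchmark runs:

  perf stat -e syscalls:sys_enter_futex -p <brick-pid> -- sleep 10

A high futex count, together with the do_futex/__x64_sys_futex samples already
visible in your capture, usually points to this kind of lock contention rather
than to time spent doing real I/O.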

For now, the only way I know of to minimize this effect is to reduce the
number of threads in the io-threads pool. It's hard to say what a good number
would be, since it depends on many factors, but you can run tests with
different values to find the best one (after changing the number of threads,
it's better to restart the volume).
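
For example (only a sketch; "myvol" is a placeholder for your volume name),
something like this drops the pool from the default of 16 threads to 8 and
restarts the volume so the bricks pick up the new value:

  gluster volume set myvol performance.io-thread-count 8
  gluster volume stop myvol
  gluster volume start myvol

Then rerun the same fio job and compare; trying a few values (e.g. 4, 8, 12)
should show whether the contention is really the limiting factor.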

Reducing the number of threads reduces the CPU power Gluster can use, but it
also reduces the contention, so it's expected (though not guaranteed) that at
some point performance could actually be a bit better even with fewer threads.

Regards,

Xavi




> For anyone interested, the following replica 3 volume options:
>
> performance.io-cache-pass-through: on
> performance.iot-pass-through: on
> performance.md-cache-pass-through: on
> performance.nl-cache-pass-through: on
> performance.open-behind-pass-through: on
> performance.read-ahead-pass-through: on
> performance.readdir-ahead-pass-through: on
> performance.strict-o-direct: on
> features.ctime: off
> features.selinux: off
> performance.write-behind: off
> performance.open-behind: off
> performance.quick-read: off
> storage.linux-aio: on
> storage.fips-mode-rchecksum: off
>
> are likely to improve the I/O performance of GFAPI clients (fio with the gfapi
> and gfapi_async engines, qemu -drive file=gluster://XXX, etc.) by ~20%. But
> beware: they may kill the I/O performance of FUSE clients.
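>
> For reference, a rough sketch of how these can be applied (the volume name
> "testvol" is only an example), one "gluster volume set" call per option:
>
>   gluster volume set testvol performance.strict-o-direct on
>   gluster volume set testvol performance.write-behind off
>   gluster volume set testvol storage.linux-aio on
>   ... and so on for the rest of the options above ...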
>
> Dmitry