<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa <<a href="mailto:rgowdapp@redhat.com">rgowdapp@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 4, 2019 at 4:26 PM Hu Bert <<a href="mailto:revirii@googlemail.com" target="_blank">revirii@googlemail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Raghavendra,<br>
<br>
at the moment iowait and CPU consumption are quite low; the main<br>
problems appear during the weekend (high traffic, especially on<br>
Sunday), so either we have to wait until next Sunday or use a time<br>
machine ;-)<br>
<br>
I made a screenshot of top (<a href="https://abload.de/img/top-hvvjt2.jpg" rel="noreferrer" target="_blank">https://abload.de/img/top-hvvjt2.jpg</a>) and<br>
a text output (<a href="https://pastebin.com/TkTWnqxt" rel="noreferrer" target="_blank">https://pastebin.com/TkTWnqxt</a>); maybe that helps. It seems<br>
that processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each<br>
process) consume a lot of CPU (uptime 24 days). Is that already<br>
helpful?<br></blockquote><div><br></div><div>Not much. The TIME field just says how long the thread has been executing. Since it's a long-standing mount, we can expect such large values. But the value itself doesn't indicate whether the thread was overloaded during any particular interval(s).</div><div><br></div><div>Can you please collect the output of the following command and send back the collected data?</div><div><br></div><div># top -bHd 3 > top.output</div></div></div></blockquote><div><br></div><div>Please collect this on problematic mounts and bricks.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
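A minimal collection sketch, assuming the fuse mount runs as the glusterfs process on the clients and the bricks as glusterfsd on the servers (the duration and output filenames are just placeholders):<br>
<br>
# on a problematic client: per-thread, batch mode, 3-second interval, ~10 minutes<br>
timeout 600 top -bHd 3 -p "$(pidof glusterfs | tr ' ' ',')" > top.client.output<br>
# on a brick server:<br>
timeout 600 top -bHd 3 -p "$(pidof glusterfsd | tr ' ' ',')" > top.brick.output<br>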
<br>
<br>
Hubert<br>
<br>
On Mon, Mar 4, 2019 at 11:31 Raghavendra Gowdappa<br>
<<a href="mailto:rgowdapp@redhat.com" target="_blank">rgowdapp@redhat.com</a>> wrote:<br>
><br>
> What is the per-thread CPU usage like on these clients? With highly concurrent workloads we've seen the single thread that reads requests from /dev/fuse (the fuse reader thread) become a bottleneck. I'd like to know what the CPU usage of this thread looks like (you can use top -H).<br>
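> A rough way to watch that thread, as a sketch only (it assumes the fuse reader shows up as the glfs_fuseproc thread of the glusterfs client process):<br>
><br>
> top -H -p "$(pidof glusterfs | tr ' ' ',')"   # watch the %CPU column of the glfs_fuseproc thread<br>
><br>
> If that single thread sits near 100% CPU while overall CPU is mostly idle, it is likely the bottleneck.<br>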
><br>
> On Mon, Mar 4, 2019 at 3:39 PM Hu Bert <<a href="mailto:revirii@googlemail.com" target="_blank">revirii@googlemail.com</a>> wrote:<br>
>><br>
>> Good morning,<br>
>><br>
>> we use gluster v5.3 (replicate with 3 servers, 2 volumes, RAID 10 as<br>
>> brick) with currently 10 clients; 3 of them do heavy I/O<br>
>> operations (Apache Tomcats, reads+writes of (small) images). These 3<br>
>> clients have quite high I/O wait (stats from yesterday), as can be<br>
>> seen here:<br>
>><br>
>> client: <a href="https://abload.de/img/client1-cpu-dayulkza.png" rel="noreferrer" target="_blank">https://abload.de/img/client1-cpu-dayulkza.png</a><br>
>> server: <a href="https://abload.de/img/server1-cpu-dayayjdq.png" rel="noreferrer" target="_blank">https://abload.de/img/server1-cpu-dayayjdq.png</a><br>
>><br>
>> The iowait in the graphs differs a lot. I checked netstat on the<br>
>> different clients; the other clients have 8 open connections:<br>
>> <a href="https://pastebin.com/bSN5fXwc" rel="noreferrer" target="_blank">https://pastebin.com/bSN5fXwc</a><br>
>><br>
>> 4 for each server and each volume. The 3 clients with the heavy I/O<br>
>> currently have, according to netstat, 170, 139 and 153<br>
>> connections. An example for one client can be found here:<br>
>> <a href="https://pastebin.com/2zfWXASZ" rel="noreferrer" target="_blank">https://pastebin.com/2zfWXASZ</a><br>
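>> (For reference, one rough way to count the gluster connections on a client; a sketch only, assuming glusterd on port 24007 and brick ports starting at 49152:)<br>
>><br>
>> netstat -tn | awk '$6=="ESTABLISHED" && $5 ~ /:(24007|49[0-9][0-9][0-9])$/' | wc -l<br>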
>><br>
>> gluster volume info: <a href="https://pastebin.com/13LXPhmd" rel="noreferrer" target="_blank">https://pastebin.com/13LXPhmd</a><br>
>> gluster volume status: <a href="https://pastebin.com/cYFnWjUJ" rel="noreferrer" target="_blank">https://pastebin.com/cYFnWjUJ</a><br>
>><br>
>> I was just wondering whether the iowait is caused by the clients and their<br>
>> workload: they request a lot of files (up to hundreds per second) and<br>
>> open a lot of connections, and the servers aren't able to answer<br>
>> quickly enough. Maybe something can be tuned here?<br>
>><br>
>> Especially the server|client.event-threads options (both set to 4),<br>
>> performance.(high|normal|low|least)-prio-threads (all at the default value<br>
>> of 16) and performance.io-thread-count (32); maybe these aren't<br>
>> properly configured for up to 170 client connections.<br>
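>> (If tuning turns out to be necessary, this is the kind of command that would change them; a sketch only, where the volume name "shared" and the values are placeholders, not recommendations:)<br>
>><br>
>> gluster volume set shared server.event-threads 8<br>
>> gluster volume set shared client.event-threads 8<br>
>> gluster volume set shared performance.io-thread-count 64<br>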
>><br>
>> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10<br>
>> GBit connection and 128G (servers) or 256G (clients) of RAM.<br>
>> Enough power :-)<br>
>><br>
>><br>
>> Thx for reading && best regards,<br>
>><br>
>> Hubert<br>
>> _______________________________________________<br>
>> Gluster-users mailing list<br>
>> <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
>> <a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div></div>
</blockquote></div></div>