[Gluster-users] Gluster High CPU/Clients Hanging on Heavy Writes

Sun Aug 5 13:39:27 UTC 2018

On Sun, 5 Aug 2018 at 13:29, Yuhao Zhang <zzyzxd at gmail.com> wrote:

> Sorry, what I meant was, if I start the transfer now and get glusterd into
> zombie status,
>

glusterd or glusterfsd?

> it's unlikely that I can fully recover the server without a reboot.
>
>
> On Aug 5, 2018, at 02:55, Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>
>
> On Sun, Aug 5, 2018 at 1:22 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>
>> This is a semi-production server and I can't bring it down right now.
>> Will try to get the monitoring output when I get a chance.
>>
>
> Collecting top output doesn't require bringing the servers down.
>
>
>> As I recall, the high-CPU processes are brick daemons (glusterfsd), and
>> htop showed them in state D (uninterruptible sleep). However, zpool showed
>> zero I/O while all the clients were hanging.
>>
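
One way to double-check the state D observation from the shell is sketched below. It assumes a Linux host with procps ps and awk; the helper names filter_d_state and list_d_state are made up for illustration and are not part of Gluster. The wchan column shows the kernel function a task is sleeping in, which may hint at whether the wait is in the storage layer.

```shell
#!/bin/sh
# Keep the header row plus any row whose STAT column (2nd field) starts
# with D, i.e. tasks in uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state tasks together with their CPU share and wait channel.
# Assumes procps ps (for the wchan:32 column-width syntax).
list_d_state() {
    ps -eo pid,stat,pcpu,wchan:32,comm | filter_d_state
}
```

Running list_d_state on the server while the clients hang should show whether the glusterfsd processes are the ones stuck in D.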
>>
>> On Aug 5, 2018, at 02:38, Raghavendra Gowdappa <rgowdapp at redhat.com>
>> wrote:
>>
>>
>>
>> On Sun, Aug 5, 2018 at 12:44 PM, Yuhao Zhang <zzyzxd at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Heavy writes put my Gluster server into a zombie-like state, with many
>>> high-CPU processes and all clients hanging. It is almost 100%
>>> reproducible on my machine. I hope someone can help.
>>>
>>
>> Can you give us the output of monitoring the processes with high CPU
>> usage, captured while your tests are running?
>>
>>
>>    - MON_INTERVAL=10 # can be increased for very long runs
>>    - top -bd $MON_INTERVAL > /tmp/top_proc.${HOSTNAME}.txt # CPU utilization by process
>>    - top -bHd $MON_INTERVAL > /tmp/top_thr.${HOSTNAME}.txt # CPU utilization by thread
>>
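
Since both top commands quoted above run until interrupted, a small wrapper can start them in the background before the transfer and stop them afterwards. This is only a sketch: the function names, the top.pids file, and the output directory argument are assumptions for illustration, and uname -n stands in for ${HOSTNAME} so that it also works in plain sh.

```shell
#!/bin/sh
# start_top_capture OUTDIR [INTERVAL]
# Runs the per-process and per-thread top captures in the background and
# records their PIDs in OUTDIR/top.pids so they can be stopped later.
start_top_capture() {
    outdir=$1
    interval=${2:-10}   # seconds between samples; raise for very long runs
    host=$(uname -n)
    top -bd "$interval" > "$outdir/top_proc.$host.txt" 2>&1 &
    echo $! > "$outdir/top.pids"
    top -bHd "$interval" > "$outdir/top_thr.$host.txt" 2>&1 &
    echo $! >> "$outdir/top.pids"
}

# stop_top_capture OUTDIR
# Stops the captures started above and removes the PID file.
stop_top_capture() {
    outdir=$1
    while read -r pid; do
        kill "$pid" 2>/dev/null || true
    done < "$outdir/top.pids"
    rm -f "$outdir/top.pids"
}
```

For example: start_top_capture /tmp 10, run the rsync or cp that triggers the hang, then stop_top_capture /tmp and send the two top_*.txt files.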
>>
>>
>>> I first noticed this issue when running rsync to copy files from
>>> another server, and I thought it might be because Gluster doesn't like
>>> rsync's delta transfer with its many small writes. However, I was able to
>>> reproduce it with "rsync --whole-file --inplace", and even with cp or scp.
>>> It usually appears a few hours after the transfer starts, but
>>> sometimes happens within several minutes.
>>>
>>> Since this is a single-node Gluster distributed volume, I tried to
>>> transfer files directly onto the server, bypassing Gluster clients, but
>>> it still caused the same issue.
>>>
>>> It is running on top of a ZFS RAIDZ2 dataset. Its options are attached.
>>> I also attached the volume options and the statedump generated when my
>>> clients hung.
>>>
>>> - Ubuntu 16.04 x86_64 / 4.4.0-116-generic
>>> - GlusterFS 3.12.8
>>>
>>> Thank you,
>>> Yuhao
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>

-- 
--Atin