[Gluster-users] gluster becomes too slow, need frequent stop-start or reboot

Tue Jun 26 06:41:40 UTC 2018

On Mon, Jun 25, 2018 at 10:01 AM Anh Vo <vtqanh at gmail.com> wrote:

> Is anyone able to help us troubleshoot this issue? It is getting worse. We
> are back to our 3-replica setup, but the issue is still happening. What we
> have found is that if we take one brick from each replica set offline,
> performance is super fast. For example, with replica sets (0 1 2) (3 4 5)
> (6 7 8) (9 10 11), taking bricks 0 3 6 9 or bricks 1 4 7 10 offline makes
> things fast. The moment all bricks are online, things become very slow. It
> seems like gluster is having some sort of lock contention between its
> members. During the period of slowness, gluster profile shows excessive
> time spent in LOOKUP and FINODELK:
>
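The brick numbers in the example follow how Gluster builds replica sets: each group of `replica` consecutive bricks, in the order they were given to `gluster volume create`, forms one replica set. A minimal Python sketch of that grouping, and of picking the same position out of every set as in the experiment above, assuming bricks are numbered 0..N-1 in creation order (the helper names are hypothetical, not part of any Gluster tool):

```python
from __future__ import annotations


def replica_sets(num_bricks: int, replica: int) -> list[list[int]]:
    """Split brick indices 0..num_bricks-1 into consecutive replica sets."""
    if num_bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [list(range(start, start + replica))
            for start in range(0, num_bricks, replica)]


def one_brick_per_set(sets: list[list[int]], position: int) -> list[int]:
    """Pick the brick at `position` inside every replica set."""
    return [s[position] for s in sets]


if __name__ == "__main__":
    sets = replica_sets(12, 3)
    print(sets)                        # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    print(one_brick_per_set(sets, 0))  # [0, 3, 6, 9]
    print(one_brick_per_set(sets, 1))  # [1, 4, 7, 10]
```

Taking either list offline leaves every replica set with two of its three bricks, which is why the volume stays fully available during the test.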

Have you checked whether a self-heal is in progress to resync data after
all the bricks come back online? Healing can impact the performance of user
applications owing to contention; once the system reaches a steady state,
performance should improve.
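One way to check is `gluster volume heal <VOLNAME> info`, which lists, per brick, the entries still pending heal. The Python sketch below tallies those counts from the command's text output. It assumes each brick appears as a `Brick host:/path` line followed by a `Number of entries: N` line; the function name, the `VOLNAME` placeholder and any hostnames are hypothetical.

```python
from __future__ import annotations

import re

BRICK_RE = re.compile(r"^Brick\s+(\S+)")
ENTRIES_RE = re.compile(r"^Number of entries:\s*(\d+)")


def pending_heals(heal_info: str) -> dict[str, int]:
    """Map each brick to the number of entries it still reports as needing heal.

    Bricks whose count is not a number (e.g. "-" for a disconnected brick)
    are left out rather than guessed.
    """
    counts: dict[str, int] = {}
    brick = None
    for raw in heal_info.splitlines():
        line = raw.strip()
        m = BRICK_RE.match(line)
        if m:
            brick = m.group(1)
            continue
        m = ENTRIES_RE.match(line)
        if m and brick is not None:
            counts[brick] = int(m.group(1))
    return counts


if __name__ == "__main__":
    import subprocess

    # VOLNAME is a placeholder for the real volume name.
    out = subprocess.run(
        ["gluster", "volume", "heal", "VOLNAME", "info"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = pending_heals(out)
    print(f"{sum(counts.values())} entries pending heal across {len(counts)} bricks")
```

If the total keeps shrinking while the slowness persists, healing is a plausible cause; if it stays at zero, the contention likely comes from somewhere else.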

>
>  %-latency   Avg-latency   Min-Latency    Max-Latency   No. of calls   Fop
>  ---------   -----------   -----------    -----------   ------------   ----
>      11.60     752.64 us      10.00 us   2647757.00 us      272476323   LOOKUP
>      15.83    6884.12 us      29.00 us   2190470.00 us       40626259   WRITE
>      27.84   80480.22 us      40.00 us  11731910.00 us        6114072   FXATTROP
>      37.83  105125.18 us      12.00 us 276088722.00 us        6359515   FINODELK
>
> We have about one or two months before we need to decide whether to keep
> Gluster, and so far it has been a lot of headache.
>
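The profile rows quoted above are easier to read once converted into milliseconds and seconds. A minimal Python sketch that parses those rows (copied verbatim from the thread) and ranks fops by their share of latency; the dataclass and function names are hypothetical, and the parser assumes the six-column `gluster volume profile` row layout shown above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rows copied from the profile output quoted above (values in microseconds).
SAMPLE = """\
 %-latency   Avg-latency   Min-Latency    Max-Latency   No. of calls   Fop
 ---------   -----------   -----------    -----------   ------------   ----
     11.60     752.64 us      10.00 us   2647757.00 us      272476323   LOOKUP
     15.83    6884.12 us      29.00 us   2190470.00 us       40626259   WRITE
     27.84   80480.22 us      40.00 us  11731910.00 us        6114072   FXATTROP
     37.83  105125.18 us      12.00 us 276088722.00 us        6359515   FINODELK
"""


@dataclass
class FopStat:
    fop: str
    pct_latency: float  # share of total latency, in percent
    avg_us: float
    min_us: float
    max_us: float
    calls: int


def parse_profile_rows(text: str) -> list[FopStat]:
    """Parse per-fop rows; header, separator and other lines are skipped."""
    stats = []
    for line in text.splitlines():
        parts = line.replace(" us", "").split()
        if len(parts) != 6:
            continue
        try:
            pct, avg, low, high = (float(p) for p in parts[:4])
            calls = int(parts[4])
        except ValueError:
            continue  # e.g. the "-----" separator row
        stats.append(FopStat(parts[5], pct, avg, low, high, calls))
    return stats


def latency_share(stats: list[FopStat], fops: set[str]) -> float:
    """Sum of %-latency for the given fop names."""
    return sum(s.pct_latency for s in stats if s.fop in fops)


if __name__ == "__main__":
    for s in sorted(parse_profile_rows(SAMPLE), key=lambda s: s.pct_latency, reverse=True):
        print(f"{s.fop:<9} {s.pct_latency:6.2f}%  avg {s.avg_us / 1e3:9.2f} ms"
              f"  max {s.max_us / 1e6:8.2f} s")
```

For these numbers FINODELK averages about 105 ms per call, its slowest call took roughly 276 s, and FINODELK plus FXATTROP (the lock and changelog steps of replicated writes) account for about 65.7% of the measured latency, which fits the lock-contention suspicion.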

Detailed bug reports, RFEs in github and/or patches that can help Gluster
work better for your use case are welcome!

Thanks,
Vijay

> On Thu, Jun 14, 2018 at 10:18 AM, Anh Vo <vtqanh at gmail.com> wrote:
>
>> Our gluster keeps getting into a state where it becomes painfully slow and
>> many of our applications time out on read/write calls. When this happens,
>> a simple ls of the top-level directory on the mount takes between 8 and
>> 25 s (normally it is very fast, at most 1-2 s). The top-level directory
>> only has about 10 folders.
>>
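To catch these episodes early, the listing time can be sampled periodically. A minimal Python sketch, assuming a hypothetical mount point `/mnt/gluster` and using the 2 s upper bound for a normal listing mentioned above as the alert threshold; it stats every entry, roughly like `ls -l`, so it exercises lookups rather than just reading the directory.

```python
from __future__ import annotations

import os
import time

# Hypothetical mount point; replace with the real FUSE mount of the volume.
MOUNT = "/mnt/gluster"
# The thread reports that a normal listing takes at most 1-2 s.
SLOW_THRESHOLD_S = 2.0


def time_listing(path: str) -> tuple[float, int]:
    """Time one directory listing plus an lstat of every entry (roughly `ls -l`)."""
    start = time.perf_counter()
    names = os.listdir(path)
    for name in names:
        # On a Gluster FUSE mount, an uncached stat may reach the bricks as LOOKUP/STAT.
        os.stat(os.path.join(path, name), follow_symlinks=False)
    return time.perf_counter() - start, len(names)


def is_slow(elapsed_s: float, threshold_s: float = SLOW_THRESHOLD_S) -> bool:
    """True when a listing took longer than the threshold."""
    return elapsed_s > threshold_s


if __name__ == "__main__":
    elapsed, count = time_listing(MOUNT)
    status = "SLOW" if is_slow(elapsed) else "ok"
    print(f"{MOUNT}: {count} entries in {elapsed:.2f} s [{status}]")
```

Logging the timestamp of each slow sample makes it possible to line the episodes up against heal activity or a `gluster volume profile` window.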
>> The two ways we have mitigated this problem are 1) restarting all GFS
>> servers or 2) stopping and starting the volume. With 2), it takes between
>> half an hour and an hour for gluster to get back to its normal performance.
>>
>> So far the logs don't show anything unusual, but perhaps I don't know what
>> I should be looking for. Even when gluster is fully functional we see lots
>> of log messages, and it is hard to tell which errors are harmless and which
>> are not.
>>
>> This issue does not seem to happen with our 3-replica glusters, only with
>> 2-replica-1-arbiter and 2-replica. However, our 3-replica glusters are only
>> 30% full, while the 2-replica ones are about 80% full.
>> We're running 3.12.9 on the servers. The clients are 3.8.15, but we
>> notice the same slowness with 3.12.9 clients as well.
>>
>> Configuration: 12 GFS servers, one brick per server, replica 2, 80 TB per
>> brick. We used to have arbiters but thought they were causing the slowdown,
>> so we took them out. Apparently that was not the case.
>>
>
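To put the 80% figure in absolute terms for the configuration above, the usable capacity of a plain distributed-replicate volume is the raw brick capacity divided by the replica count. A small Python sketch of that arithmetic, assuming the 80% figure refers to this 12-brick replica-2 volume and that data is spread evenly across replica sets (DHT hashing does not guarantee this); the helper names are hypothetical.

```python
from __future__ import annotations


def usable_tb(num_bricks: int, brick_tb: float, replica: int) -> float:
    """Usable capacity of a plain distributed-replicate volume (no arbiter):
    each file is written to `replica` bricks, so raw capacity is divided by it."""
    return num_bricks * brick_tb / replica


def per_brick_free_tb(brick_tb: float, fill_ratio: float) -> float:
    """Free space left on one brick, assuming data is spread evenly."""
    return brick_tb * (1.0 - fill_ratio)


if __name__ == "__main__":
    total = usable_tb(num_bricks=12, brick_tb=80.0, replica=2)
    print(f"usable: {total:.0f} TB, used at 80%: {0.8 * total:.0f} TB")
    print(f"free per brick at 80%: {per_brick_free_tb(80.0, 0.8):.0f} TB")
```

With these inputs the volume offers 480 TB usable, about 384 TB is in use at 80%, and each brick has roughly 16 TB left; the same 12 bricks at replica 3 would offer 320 TB.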
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users