[Gluster-users] How to check running transactions in gluster?

Mon Nov 26 12:44:19 UTC 2018

Hi Jeevan,

You might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1635820

Were any of the volumes in the "Created" state when the peer reject issue was
seen?
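
A quick way to check both things, as a minimal sketch. It assumes the usual
"Volume Name:" / "Status:" lines in `gluster volume info` output and the
"Hostname:" / "State:" lines in `gluster peer status` output. The helper names
are hypothetical, not part of the gluster CLI.

```shell
# Print the names of volumes whose Status line reads "Created".
# Reads `gluster volume info`-style text on stdin.
created_volumes() {
    awk -F': ' '
        /^Volume Name:/                { name = $2 }
        /^Status:/ && $2 == "Created"  { print name }
    '
}

# Print the hostnames of peers whose State line starts with "Peer Rejected".
# Reads `gluster peer status`-style text on stdin.
rejected_peers() {
    awk -F': ' '
        /^Hostname:/             { host = $2 }
        /^State: Peer Rejected/  { print host }
    '
}
```

Usage would be `gluster volume info | created_volumes` and
`gluster peer status | rejected_peers` on any node in the pool.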

Thanks,
Sanju

On Mon, Nov 26, 2018 at 9:35 AM Jeevan Patnaik <g1patnaik at gmail.com> wrote:

> Hi Atin,
>
> Thanks for the details. I think the issue is with a few of the nodes that
> aren't serving any bricks and are in the rejected state. When I remove them
> from the pool and stop glusterfs on those nodes, everything seems normal.
>
> We keep those nodes as spares but have glusterd running, because in our
> configuration the servers are also clients. We use gluster NFS without
> failover for mounts, and to localize the impact if a node goes down we use
> localhost as the NFS server on each node, i.e.:
> mount -t nfs localhost:/volume /mountpoint
>
> So glusterfs has to be running on these spare nodes. Is it okay to keep
> those nodes in the pool? Will they go to the rejected state again and cause
> transaction locks? Why aren't they in sync even though they're part of the
> pool?
>
> Regards,
> Jeevan.
>
> On Mon, Nov 26, 2018, 8:22 AM Atin Mukherjee <amukherj at redhat.com> wrote:
>
>>
>>
>> On Mon, Nov 26, 2018 at 8:21 AM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Sun, Nov 25, 2018 at 8:40 PM Jeevan Patnaik <g1patnaik at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am getting the output "Another transaction is in progress" with a few
>>>> gluster volume commands, including the stop command. And the gluster
>>>> volume status command just hangs and fails with a timeout error.
>>>>
>>>
>>> This is primarily because glusterd is not allowed to complete its
>>> handshake with the other peers when the glusterd services are restarted
>>> concurrently (as I could understand from your previous email on the list).
>>> With GlusterD (read as GD1) this is a current challenge w.r.t. its design:
>>> because of its N x N handshaking mechanism during the restart sequence,
>>> which brings all the configuration data into a consistent state, the
>>> overall recovery time of the cluster can be very long if N is on the
>>> higher side (in your case N = 72, which is certainly high). Hence the
>>> recommendation is not to restart the glusterd services concurrently, and
>>> to wait for the handshaking to complete.
>>>
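
One way to follow that recommendation is to restart one node at a time and
wait until every peer is back in "Peer in Cluster (Connected)" before moving
on. This is only a sketch: passwordless ssh, systemd, the 5-second poll and the
60-try limit are all assumptions, and the function names are hypothetical.

```shell
# Succeeds only if every "State:" line on stdin reports a healthy, connected peer.
# Reads `gluster peer status`-style text on stdin.
all_peers_connected() {
    awk '
        /^State:/ {
            seen = 1
            if ($0 !~ /^State: Peer in Cluster \(Connected\)/) bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Restart glusterd on one host at a time, waiting for the pool to settle
# in between. Assumes ssh access and systemd on every host (assumptions).
rolling_restart() {
    for host in "$@"; do
        ssh "$host" systemctl restart glusterd || return 1
        tries=0
        until ssh "$host" gluster peer status | all_peers_connected; do
            tries=$((tries + 1))
            [ "$tries" -ge 60 ] && return 1   # give up after about 5 minutes
            sleep 5
        done
    done
}
```

It would be called as `rolling_restart node1 node2 node3`, with the host names
replaced by the real ones.
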
>>
>> Forgot to mention that GlusterD2 (https://github.com/gluster/glusterd2),
>> which is in the development phase, addresses this design problem.
>>
>>
>>>
>>>> So I want to find out which transaction is hung. How can I find this out?
>>>> I ran the volume statedump command, but it was also taking time, so I
>>>> didn't wait for it to complete to check whether it hangs or gives any
>>>> result.
>>>>
>>>
>>> kill -SIGUSR1 $(pidof glusterd) should get you a glusterd statedump file
>>> in /var/run/gluster, with a backtrace dump at the bottom that can help you
>>> understand which transaction is currently in progress. In case this
>>> transaction has been queued up for more than 180 seconds (which is not
>>> usual), the unlock timer kicks out such locks.
>>>
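
The statedump step above can be wrapped roughly like this. It is a sketch: the
`glusterdump.*` file-name pattern, the 2-second wait and the 40-line tail are
assumptions, and both function names are hypothetical.

```shell
# Print the most recently modified statedump file in the given directory
# (defaults to /var/run/gluster). Assumes dump files match glusterdump.*
latest_statedump() {
    dir=${1:-/var/run/gluster}
    ls -t "$dir"/glusterdump.* 2>/dev/null | head -n 1
}

# Ask glusterd for a statedump, then show the end of the newest dump file,
# which is where the backtrace section is expected to be.
dump_glusterd_state() {
    kill -USR1 $(pidof glusterd) || return 1
    sleep 2                       # give glusterd a moment to write the file
    f=$(latest_statedump "$1")
    [ -n "$f" ] && tail -n 40 "$f"
}
```

Run `dump_glusterd_state` as root on the node where the command hangs.
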
>>>
>>>> Please help me with this.. I'm struggling with these gluster timeout
>>>> errors :(
>>>>
>>>> Besides, I have also tuned gluster's transport.listen-backlog to 200 and
>>>> set the following kernel parameters to avoid SYN overflow rejects:
>>>> net.core.somaxconn = 1024
>>>> net.ipv4.tcp_max_syn_backlog = 20480
>>>>
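
On Linux the backlog passed to listen() is silently capped at
net.core.somaxconn, so the smaller of the two values is what takes effect. A
small sketch of that check (the function name is hypothetical):

```shell
# Print the backlog that actually takes effect: the smaller of the
# requested listen backlog and the kernel's net.core.somaxconn.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -lt "$somaxconn" ]; then
        echo "$requested"
    else
        echo "$somaxconn"
    fi
}
```

For example, `effective_backlog 200 "$(sysctl -n net.core.somaxconn)"` prints
200 with the values quoted above, since 200 is below 1024.
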
>>>> Regards,
>>>> Jeevan.
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
