[Gluster-users] How to check running transactions in gluster?
srakonde at redhat.com
Mon Nov 26 12:44:19 UTC 2018
You might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1635820
Were any of the volumes in "Created" state, when the peer reject issue is
On Mon, Nov 26, 2018 at 9:35 AM Jeevan Patnaik <g1patnaik at gmail.com> wrote:
> Hi Atin,
> Thanks for the details. I think the issue is with a few of the nodes that
> aren't serving any bricks and are in rejected state. When I remove them from
> the pool and stop glusterfs on those nodes, everything seems normal.
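To see which peers are in the rejected state described above, the output of `gluster peer status` can be filtered. The following is a minimal sketch, not an official tool: the function name `rejected_peers` is made up for illustration, and it assumes the usual `Hostname:` / `State:` layout of the peer status output.

```shell
# Print the hostname of every peer whose state is "Peer Rejected".
# Reads `gluster peer status` output on stdin.
# rejected_peers is a hypothetical helper name, not part of gluster.
rejected_peers() {
    awk '/^Hostname:/ { host = $2 }
         /^State:/ && /Peer Rejected/ { print host }'
}

# Usage (on a node where glusterd is running):
#   gluster peer status | rejected_peers
```

Running this on more than one node may be worthwhile, since each node reports its own view of its peers.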
> We keep those nodes as spares, but have glusterd running, because in our
> configuration servers are also clients. We use gluster NFS without failover
> for mounts and, to localize the impact if a node goes down, we use localhost
> as the NFS server on each node:
> mount -t nfs localhost:/volume /mountpoint
> So, glusterfs should be running on these spare nodes. Now, is it okay to
> keep those nodes in the pool? Will they go to rejected state again and
> cause transaction locks? Why aren't they in sync though they're part of the
> On Mon, Nov 26, 2018, 8:22 AM Atin Mukherjee <amukherj at redhat.com> wrote:
>> On Mon, Nov 26, 2018 at 8:21 AM Atin Mukherjee <amukherj at redhat.com>
>>> On Sun, Nov 25, 2018 at 8:40 PM Jeevan Patnaik <g1patnaik at gmail.com>
>>>> I am getting the output "Another transaction is in progress" with a few
>>>> gluster volume commands, including the stop command. The gluster volume
>>>> status command just hangs and fails with a timeout error.
>>> This is primarily because glusterd was not allowed to complete its
>>> handshake with the other peers when the glusterd services were restarted
>>> concurrently (as I understand from your previous email to the list). With
>>> GlusterD (read as GD1) this is a current challenge with its design: because
>>> of the N x N handshaking mechanism that runs at restart to bring all the
>>> configuration data into a consistent state, the overall recovery time of
>>> the cluster can be very long if N is on the higher side (in your case
>>> N = 72, which is certainly high). Hence the recommendation is not to
>>> restart the glusterd services concurrently, and to wait for the
>>> handshaking to complete.
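The recommendation above (restart one node at a time and wait for the handshake) could be scripted roughly as follows. This is a sketch under several assumptions: the helper names `unsettled_peers` and `rolling_restart` are invented here, the nodes are reachable over ssh, glusterd is managed by systemd, and "every peer reports `Peer in Cluster (Connected)`" is used only as a rough proxy for a completed handshake.

```shell
# Count peers whose state is anything other than "Peer in Cluster (Connected)".
# Reads `gluster peer status` output on stdin.
unsettled_peers() {
    awk '/^State:/ && !/Peer in Cluster \(Connected\)/ { n++ }
         END { print n + 0 }'
}

# Restart glusterd on one node at a time and poll until that node sees all
# of its peers as connected before moving on to the next node.
# Assumptions: ssh access, systemd, and a 10 second poll interval.
rolling_restart() {
    for node in "$@"; do
        ssh "$node" systemctl restart glusterd
        # A failing peer status call (glusterd not up yet) counts as unsettled.
        until status=$(ssh "$node" gluster peer status) &&
              [ "$(printf '%s\n' "$status" | unsettled_peers)" -eq 0 ]; do
            sleep 10
        done
    done
}

# Usage (node names are placeholders):
#   rolling_restart node1 node2 node3
```

The loop never gives up on its own, so a peer that stays rejected will block it; adding a retry limit would be a reasonable extension.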
>> Forgot to mention that GlusterD2 (https://github.com/gluster/glusterd2),
>> which is in the development phase, addresses this design problem.
>>>> So, I want to find out which transaction is hung. How can I find this out?
>>>> I ran the volume statedump command, but didn't wait for it to complete to
>>>> check whether it hung or gave any result, as it was also taking time.
>>> kill -SIGUSR1 $(pidof glusterd) should get you a glusterd statedump file
>>> in /var/run/gluster, which contains a backtrace dump at the bottom that
>>> can help you understand which transaction is currently in progress. In
>>> case this transaction is queued up for more than 180 seconds (which is not
>>> usual), the unlock timer kicks out such locks.
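The statedump step above can be wrapped in a small helper that picks the newest dump file. This is a sketch: `latest_statedump` is a hypothetical name, and the `glusterdump.*` file name pattern is an assumption that may need adjusting to what actually appears in /var/run/gluster on your version.

```shell
# Print the most recently modified glusterd statedump in a directory.
# The directory defaults to /var/run/gluster, as mentioned in the thread.
# The "glusterdump.*" pattern is an assumption about the dump file names.
latest_statedump() {
    dir=${1:-/var/run/gluster}
    ls -t "$dir"/glusterdump.* 2>/dev/null | head -n 1
}

# Usage (as root on the affected node):
#   kill -SIGUSR1 "$(pidof glusterd)"
#   sleep 2
#   tail -n 50 "$(latest_statedump)"
```

The short sleep only gives glusterd a moment to finish writing the file; the value is arbitrary.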
>>>> Please help me with this.. I'm struggling with these gluster timeout
>>>> errors :(
>>>> Besides, I have also tuned the gluster option
>>>> transport.listen-backlog to 200 and the following kernel parameters
>>>> to avoid SYN overflow rejects:
>>>> net.core.somaxconn = 1024
>>>> net.ipv4.tcp_max_syn_backlog = 20480
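To confirm that the kernel parameters quoted above are actually in effect on a node, the output of `sysctl` can be compared with the intended values. A minimal sketch, assuming the `key = value` format that `sysctl` prints; the function name `check_backlog_sysctls` is invented for this example and the thresholds are simply the values from this thread.

```shell
# Read "key = value" lines on stdin (the format printed by `sysctl`) and
# report any of the two backlog settings that is lower than the value
# quoted in this thread.
check_backlog_sysctls() {
    awk -F' *= *' '
        BEGIN {
            want["net.core.somaxconn"] = 1024
            want["net.ipv4.tcp_max_syn_backlog"] = 20480
        }
        ($1 in want) && ($2 + 0 < want[$1]) {
            printf "%s is %d, expected at least %d\n", $1, $2, want[$1]
        }'
}

# Usage:
#   sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog | check_backlog_sysctls
```

No output means both settings are at or above the intended values.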
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org