[Gluster-users] never ending logging

Mon Apr 27 08:58:47 UTC 2020

@Nikhil Ladha <nladha at redhat.com> Please look at this issue.

On Thu, Apr 23, 2020 at 9:02 PM <nico at furyweb.fr> wrote:

> Did you find any clue in log files ?
>
> I can try an update to 7.5 in case some recent bug were solved, what's
> your opinion ?
>
> ------------------------------
> *De: *"Sanju Rakonde" <srakonde at redhat.com>
> *À: *nico at furyweb.fr
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Envoyé: *Mercredi 22 Avril 2020 08:39:42
> *Objet: *Re: [Gluster-users] never ending logging
>
> Thanks for all the information.
> For pstack output, gluster-debuginfo package has to be installed. I will
> check out the provided information and get back to you.
>
> On Wed, Apr 22, 2020 at 11:54 AM <nico at furyweb.fr> wrote:
>
>> I think all issues are linked with the same underlying problem.
>>
>> 1. all peers were in Connected state from every node yesterday but node 2
>> is "semi-connected" at now:
>> root at glusterDevVM1:/var/log/glusterfs# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM3
>> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
>> State: Peer in Cluster (Connected)
>>
>> Hostname: glusterDevVM2
>> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
>> State: Peer in Cluster (Connected)
>> root at glusterDevVM2:~# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM1
>> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: glusterDevVM3
>> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
>> State: Peer in Cluster (Connected)
>> root at glusterDevVM3:~# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: glusterDevVM2
>> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
>> State: Peer in Cluster (Connected)
>>
>> Hostname: glusterDevVM1
>> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
>> State: Peer in Cluster (Connected)
>> 2, 3, 4. a simple gluster volume status show multiple errors on each node:
>> root at glusterDevVM1:~# gluster volume status tmp
>> Locking failed on glusterDevVM2. Please check log file for details.
>> root at glusterDevVM2:~# gluster volume status tmp
>> Another transaction is in progress for tmp. Please try again after some
>> time.
>> root at glusterDevVM3:~# gluster volume status tmp
>> Error : Request timed out
>>
>> Logs for each node (except SSL errors):
>> root at glusterDevVM1:~# egrep -v '0-socket.management'
>> /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:38:32.278618] E [rpc-clnt.c:346:saved_frames_unwind] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7fd28d99fda8]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7fd28d745d97] (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7fd28d745ebe] (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7fd28d746e93]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7fd28d747a18] )))))
>> 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at
>> 2020-04-22 05:38:32.243087 (xid=0x8d)
>> [2020-04-22 05:38:32.278638] E [MSGID: 106157]
>> [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC
>> Error
>> [2020-04-22 05:38:32.278651] I [MSGID: 106493]
>> [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management:
>> Received RJT from uuid: 00000000-0000-0000-0000-000000000000
>> [2020-04-22 05:38:33.256401] I [MSGID: 106498]
>> [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management:
>> not connected yet
>> [2020-04-22 05:38:35.279149] I
>> [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on
>> the I/O path is ENABLED
>> [2020-04-22 05:38:35.279169] I
>> [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for
>> glusterd is ENABLED
>> [2020-04-22 05:38:35.279178] I
>> [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate
>> depth 1
>> The message "I [MSGID: 106004]
>> [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer
>> <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in state <Peer in
>> Cluster>, has disconnected from glusterd." repeated 3 times between
>> [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
>> [2020-04-22 05:38:35.667255] W
>> [glusterd-locks.c:796:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119)
>> [0x7fd287da7119]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae)
>> [0x7fd287db1aae]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3)
>> [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
>> [2020-04-22 05:38:35.667275] W [MSGID: 106117]
>> [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not
>> released for <vol>
>> 2 last lines repeated for each volume
>>
>> root at glusterDevVM2:~# egrep -v '0-socket.management'
>> /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:51:57.493574] E [rpc-clnt.c:346:saved_frames_unwind] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f30411dbda8]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7f3040f81d97] (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7f3040f81ebe] (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7f3040f82e93]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7f3040f83a18] )))))
>> 0-management: forced unwinding frame type(Gluster MGMT Handshake)
>> op(MGMT-VERS(1)) called at 2020-04-22 05:51:57.483579 (xid=0x563)
>> [2020-04-22 05:51:57.493623] E [MSGID: 106167]
>> [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management:
>> Error through RPC layer, retry again later
>> [2020-04-22 05:52:00.501474] I
>> [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on
>> the I/O path is ENABLED
>> [2020-04-22 05:52:00.501542] I
>> [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for
>> glusterd is ENABLED
>> [2020-04-22 05:52:00.501569] I
>> [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate
>> depth 1
>> [2020-04-22 05:52:00.983720] I [MSGID: 106004]
>> [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer
>> <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in state <Peer in
>> Cluster>, has disconnected from glusterd.
>> [2020-04-22 05:52:00.983886] W
>> [glusterd-locks.c:796:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119)
>> [0x7f303b5e3119]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae)
>> [0x7f303b5edaae]
>> -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3)
>> [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
>> [2020-04-22 05:52:00.983909] W [MSGID: 106117]
>> [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not
>> released for <vol>
>> 2 last lines repeated for each volume
>>
>> root at glusterDevVM3:~# egrep -v '0-socket.management'
>> /var/log/glusterfs/glusterd.log
>> [2020-04-22 05:38:33.229959] I [MSGID: 106499]
>> [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume tmp
>> [2020-04-22 05:41:33.230170] I
>> [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock
>> timer is cancelled for volume_type tmp_vol
>> [2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management:
>> bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique =
>> 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
>> [2020-04-22 05:48:34.908339] E [MSGID: 106115]
>> [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking
>> failed on glusterDevVM1. Please check log file for details.
>> [2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management:
>> bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique =
>> 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
>> [2020-04-22 05:48:40.288568] E [MSGID: 106115]
>> [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking
>> failed on glusterDevVM2. Please check log file for details.
>> [2020-04-22 05:48:40.288631] E [MSGID: 106150]
>> [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers
>> Failed.
>>
>> I'm not familiar with pstack, when running on node 3 (arbiter) I get only
>> these few lines :
>> root at glusterDevVM3:~# pstack 13700
>>
>> 13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> (No symbols found)
>> 0x7fafd747a6cd: ????
>>
>> Which debian stretch package should I install ?
>>
>> To be more explicit, I stopped glusterd on all 3 nodes then restart
>> sequentially with this order : node1, node3 (arbiter) then node2.
>> Log files can be dl at
>> https://www.dropbox.com/s/rcgcw7jrud2wkj1/glusterd-logs.tar.bz2?dl=0
>>
>> Thanks for your help.
>>
>> ------------------------------
>> *De: *"Sanju Rakonde" <srakonde at redhat.com>
>> *À: *nico at furyweb.fr
>> *Cc: *"gluster-users" <gluster-users at gluster.org>
>> *Envoyé: *Mercredi 22 Avril 2020 07:23:52
>> *Objet: *Re: [Gluster-users] never ending logging
>>
>> Hi,
>> The email is talking about many issues. Let me ask a few questions to get
>> a whole picture.
>> 1. are the peers are in the connected state now? or they still in the
>> rejected state?
>> 2. What led you to see "locking failed" messages? We would like to if
>> there is a reproducer and fix the issue if any.
>> 3. Another transaction in progress message appears when there is already
>> a operation going on. Are you seeing this when there is no such transaction
>> going on?
>> 4. When did you hit the timedouts? Did you tried to look at the pstack
>> output of glusterd process? If so, please share the pstack output.
>>
>> On Tue, Apr 21, 2020 at 7:08 PM <nico at furyweb.fr> wrote:
>>
>>> Hi all.
>>>
>>> We're using 3 nodes Gluster 7.3 (2 + 1 arbiter), yesterday node 2 was
>>> rejected from cluster and I applied following steps to fix :
>>> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
>>> I also saw
>>> https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>>> but solution isn't compatible as cluster.max-op-version doesn't exist and
>>> all op-version are the same on all 3 nodes.
>>>
>>> After renewing SSL certs and several restart all volumes came back
>>> online but glusterd log file on all 3 nodes is filled with nothing else
>>> than following 3 lines :
>>>
>>> [2020-04-21 13:05:19.478913] I
>>> [socket.c:4347:ssl_setup_connection_params] 0-socket.management: SSL
>>> support on the I/O path is ENABLED
>>> [2020-04-21 13:05:19.478972] I
>>> [socket.c:4350:ssl_setup_connection_params] 0-socket.management: SSL
>>> support for glusterd is ENABLED
>>> [2020-04-21 13:05:19.478986] I
>>> [socket.c:4360:ssl_setup_connection_params] 0-socket.management: using
>>> certificate depth 1
>>>
>>> Moreover, I have "Locking failed", "Another transaction is in progress"
>>> and "Error : Request timed out" on gluster volume status volxxx command.
>>> All SSL certs on clients have also been renewed and all volumes were
>>> remounted. All 3 nodes were alternatively restarted (glusterd) and rebooted.
>>>
>>> The cluster is not in production environment but there's about ~250
>>> clients for ~75 volumes, I don't know how to troubleshoot and fix this
>>> problem, if anyone has an idea.
>>> ________
>>>
>>>
>>>
>>> Community Meeting Calendar:
>>>
>>> Schedule -
>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> Bridge: https://bluejeans.com/441850968
>>>
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>
>> --
>> Thanks,
>> Sanju
>>
>>
>
> --
> Thanks,
> Sanju
>
>

-- 
Thanks,
Sanju
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200427/f28b5171/attachment.html>