[Gluster-users] gluster peer probe error (v3.6.2)

Atin Mukherjee amukherj at redhat.com
Mon Mar 23 10:10:26 UTC 2015



On 03/23/2015 03:28 PM, Andreas Hollaus wrote:
> Hi,
> 
> This network problem is persistent. However, I can ping the server, so I guess it
> depends on the port number, right?
> I tried to telnet to port 24007, but I was not sure how to interpret the result, as I
> got no response and no timeout (it just seemed to be waiting for something).
> That's why I decided to install nmap, but according to that tool the port was
> accessible. Are there any other ports that are vital to gluster peer probe?
> 
> When you say 'deprobe', I guess you mean 'gluster peer detach'? That command shows
> similar behaviour to gluster peer probe.
Yes, I meant peer detach. How about 'gluster peer detach 10.32.1.144 force'?
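
On the port question: a telnet session that connects and then just sits there usually means the TCP connection itself succeeded, since glusterd does not send a greeting. As far as I know, peer probe only needs the glusterd management port (24007/tcp) to be reachable in both directions. The following is only a minimal sketch for checking that from each node; the function name can_connect and the 5-second timeout are my own assumptions, not anything gluster provides.

```python
import socket


def can_connect(host, port=24007, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    This only proves that the port accepts connections; it says nothing
    about whether the glusterd handshake on top of it succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical usage, run on each node against the other one:
#   can_connect("10.32.1.144")   # from 10.32.0.48
#   can_connect("10.32.0.48")    # from 10.32.1.144
```

If this returns True in both directions, the problem is more likely in the handshake than in basic connectivity.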
> 
> 
> Regards
> Andreas
> 
> On 03/23/15 05:34, Atin Mukherjee wrote:
>>
>> On 03/22/2015 07:11 PM, Andreas Hollaus wrote:
>>> Hi,
>>>
>>> I hope that these are the logs that you requested.
>>>
>>> Logs from 10.32.0.48:
>>> ------------------------------
>>> # more /var/log/glusterfs/.cmd_log_history
>>> [2015-03-19 13:52:03.277438]  : peer probe 10.32.1.144 : FAILED : Probe returned with unknown errno -1
>>>
>>> # more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>> [2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
>>> [2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
>>> [2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
>>> [2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
>>> [2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
>>> Final graph:
>>> +------------------------------------------------------------------------------+
>>>   1: volume management
>>>   2:     type mgmt/glusterd
>>>   3:     option rpc-auth.auth-glusterfs on
>>>   4:     option rpc-auth.auth-unix on
>>>   5:     option rpc-auth.auth-null on
>>>   6:     option transport.socket.listen-backlog 128
>>>   7:     option ping-timeout 30
>>>   8:     option transport.socket.read-fail-log off
>>>   9:     option transport.socket.keepalive-interval 2
>>>  10:     option transport.socket.keepalive-time 10
>>>  11:     option transport-type socket
>>>  12:     option working-directory /var/lib/glusterd
>>>  13: end-volume
>>>  14: 
>>> +------------------------------------------------------------------------------+
>>> [2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
>>> [2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
>>> [2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>> [2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
>>> [2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
>>> [2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429, host: 10.32.1.144
>>> [2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
>>> [2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
>>> [2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482. timeout = 600 for 10.32.1.144:24007
>> Here is the issue: there was some problem in the network at the time the
>> peer probe was issued, which is why the call bail is seen. Could you
>> try to deprobe the peer and then probe it again?
>>> [2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
>>> [2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 1) to rpc-transport (socket.management)
>>> [2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-: Reply submission failed
>>>
>>>
>>>
>>> Logs from 10.32.1.144:
>>> ---------------------------------
>>> # more ./.cmd_log_history
>>>
>>> # more ./etc-glusterfs-glusterd.vol.log
>>> [1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
>>> [1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
>>> [1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
>>> [1970-01-01 00:00:53.231653] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
>>> [1970-01-01 00:00:53.231730] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
>>> Final graph:
>>> +------------------------------------------------------------------------------+
>>>   1: volume management
>>>   2:     type mgmt/glusterd
>>>   3:     option rpc-auth.auth-glusterfs on
>>>   4:     option rpc-auth.auth-unix on
>>>   5:     option rpc-auth.auth-null on
>>>   6:     option transport.socket.listen-backlog 128
>>>   7:     option ping-timeout 30
>>>   8:     option transport.socket.read-fail-log off
>>>   9:     option transport.socket.keepalive-interval 2
>>>  10:     option transport.socket.keepalive-time 10
>>>  11:     option transport-type socket
>>>  12:     option working-directory /var/lib/glusterd
>>>  13: end-volume
>>>  14: 
>>> +------------------------------------------------------------------------------+
>>> [1970-01-01 00:01:24.417689] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
>>> [1970-01-01 00:01:24.417736] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 82cdb873-28cc-4ed0-8cfe-2b6275770429
>>> [1970-01-01 00:01:24.420067] I [glusterd-handler.c:2523:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17ecb953b58
>>> [1970-01-01 00:01:24.420158] I [glusterd-handler.c:2551:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.32.0.48 (24007)
>>> [1970-01-01 00:01:24.420379] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>> [1970-01-01 00:01:24.421140] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
>>> [1970-01-01 00:01:24.421167] I [glusterd-handler.c:2575:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.32.0.48, op_ret: 0, op_errno: 0, ret: 0
>>> [1970-01-01 00:01:24.422991] I [glusterd-handler.c:2216:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17ecb953b58
>>> [1970-01-01 00:01:24.423024] E [glusterd-utils.c:5760:glusterd_compare_friend_data] 0-management: Importing global options failed
>>> [1970-01-01 00:01:24.423036] E [glusterd-sm.c:1078:glusterd_friend_sm] 0-glusterd: handler returned: -2
>>>  
>>>
>>> Regards
>>> Andreas
>>>
>>>
>>> On 03/22/15 07:33, Atin Mukherjee wrote:
>>>> On 03/22/2015 12:09 AM, Andreas Hollaus wrote:
>>>>> Hi,
>>>>>
>>>>> I get a strange result when I execute 'gluster peer probe'. The command hangs and
>>>>> seems to time out without any message (I can ping the address):
>>>>> # gluster peer probe 10.32.1.144
>>>>> # echo $?
>>>>> 146
>>>> Could you provide the glusterd log and .cmd_log_history for all the
>>>> nodes in the cluster?
>>>>> The status looks promising, but there's a difference between this output and what
>>>>> you normally get from a successful call:
>>>>> # gluster peer status
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: 10.32.1.144
>>>>> Uuid: 0b008d3e-c51b-4243-ad19-c79c869ba9f2
>>>>> State: Probe Sent to Peer (Connected)
>>>>>
>>>>> (instead of 'State: Peer in Cluster (Connected)')
>>>>>
>>>>> Running the command again will tell you that it is connected:
>>>>>
>>>>> # gluster peer probe 10.32.1.144
>>>>> peer probe: success. Host 10.32.1.144 port 24007 already in peer list
>>>> This means that the peer was added locally, but the peer handshake was not
>>>> completed for the previous peer probe transaction. I would be interested to
>>>> see the logs; then I can comment on what went wrong.
>>>>> But when you try to add a brick from that server it fails:
>>>>>
>>>>> # gluster volume add-brick c_test replica 2 10.32.1.144:/opt/lvmdir/c2 force
>>>>> volume add-brick: failed: Host 10.32.1.144 is not in 'Peer in Cluster' state
>>>>>
>>>>> The volume was previously created using the following commands:
>>>>> # gluster volume create c_test 10.32.0.48:/opt/lvmdir/c2 force
>>>>> volume create: c_test: success: please start the volume to access data
>>>>> # gluster volume start c_test
>>>>> volume start: c_test: success
>>>>>
>>>>> What could be the reason for this problem?
>>>>>
>>>>>
>>>>> Regards
>>>>> Andreas
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>>>
> 
> 
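
Since add-brick refuses any host that is not in 'Peer in Cluster' state, it may help to check the state of every peer before retrying. Below is a rough sketch that parses the text printed by 'gluster peer status' in the format shown earlier in this thread. The helper names parse_peer_status and peers_not_in_cluster are made up for illustration, and the parser assumes the 'Hostname:' / 'Uuid:' / 'State:' layout seen above.

```python
def parse_peer_status(text):
    """Turn 'gluster peer status' output into a list of dicts, one per peer."""
    peers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a 'key: value' pair
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # Each peer entry starts with its Hostname line.
            current = {"Hostname": value}
            peers.append(current)
        elif current is not None and key in ("Uuid", "State"):
            current[key] = value
    return peers


def peers_not_in_cluster(text):
    """Return hostnames whose state does not start with 'Peer in Cluster'."""
    return [
        peer["Hostname"]
        for peer in parse_peer_status(text)
        if not peer.get("State", "").startswith("Peer in Cluster")
    ]
```

Fed the status output quoted above, peers_not_in_cluster would list 10.32.1.144, because its state is 'Probe Sent to Peer (Connected)'. An empty list would suggest the handshake completed for all peers.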

-- 
~Atin

