[Gluster-users] Problems with SLES 11

Phil Bayfield phil at techlightenment.com
Wed Sep 7 09:15:43 UTC 2011


Hi there,

I compiled and installed the latest version of Gluster on a couple of SLES
11 SP1 boxes, and everything up to this point seemed OK.

I started the daemon on both boxes, and both are listening on port 24007.

When I issue a "gluster peer probe" command on one of the boxes, the daemon
instantly dies. After I restart it, it shows:

# gluster peer status
Number of Peers: 1

Hostname: mckalcpap02
Uuid: 00000000-0000-0000-0000-000000000000
State: Establishing Connection (Connected)

I then ran the probe on the other box, and the daemon crashed there too. Now,
whenever I start the daemon on either box, the daemon on the other box crashes.

The log output immediately prior to the crash is as follows:

[2011-06-07 08:05:10.700710] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req mckalcpap02 24007
[2011-06-07 08:05:10.701058] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: mckalcpap02
[2011-06-07 08:05:10.701086] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: mckalcpap02 (24007)
[2011-06-07 08:05:10.702832] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
[2011-06-07 08:05:10.703110] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)

If I use the IP address instead, the same thing happens:

[2011-06-07 08:07:12.873075] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.9.54.2 24007
[2011-06-07 08:07:12.873410] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: 10.9.54.2
[2011-06-07 08:07:12.873438] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.9.54.2 (24007)
[2011-06-07 08:07:12.875046] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
[2011-06-07 08:07:12.875280] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)

There is no firewall issue; the port is reachable from the other box:

# telnet mckalcpap02 24007
Trying 10.9.54.2...
Connected to mckalcpap02.
Escape character is '^]'.
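For anyone wanting to repeat the same reachability check without an interactive telnet session, here is a minimal sketch (not something from the original setup). It assumes bash built with /dev/tcp support and the coreutils `timeout` command; the function name check_port is hypothetical. A successful connect only shows the port is open; it says nothing about whether the glusterd handshake itself will succeed.

```shell
# Hypothetical helper: non-interactive equivalent of the telnet test above.
# Assumes bash (for the /dev/tcp pseudo-device) and coreutils `timeout`.
# Usage: check_port HOST [PORT]   (PORT defaults to 24007, the glusterd port)
check_port() {
    local host="$1" port="${2:-24007}"
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
    # the inner bash exits 0 only if the connection succeeds within 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
        return 0
    else
        echo "${host}:${port} NOT reachable"
        return 1
    fi
}
```

For example, "check_port mckalcpap02 24007" should print "mckalcpap02:24007 reachable" whenever the telnet test above connects.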

Following a restart (which crashes the other node), the log output is as
follows:

[2011-06-07 08:10:09.616486] I [glusterd.c:564:init] 0-management: Using /etc/glusterd as working directory
[2011-06-07 08:10:09.617619] C [rdma.c:3933:rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2011-06-07 08:10:09.617676] E [rdma.c:4812:init] 0-rdma.management: Failed to initialize IB Device
[2011-06-07 08:10:09.617700] E [rpc-transport.c:741:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2011-06-07 08:10:09.617724] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2011-06-07 08:10:09.617830] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 1e344f5d-6904-4d14-9be2-8f0f44b97dd7
[2011-06-07 08:10:11.258098] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:

+------------------------------------------------------------------------------+
[2011-06-07 08:10:11.258431] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
[2011-06-07 08:10:11.280533] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:1023)
[2011-06-07 08:10:11.280595] W [socket.c:1494:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:24007)
[2011-06-07 08:10:17.256235] E [socket.c:1685:socket_connect_finish] 0-management: connection to 10.9.54.2:24007 failed (Connection refused)

Nothing is logged on the node that crashes.

I've tried various possible solutions found by searching the net but am not
getting anywhere. Can anyone advise how to proceed?

Thanks,
Phil.

-- 
Phil Bayfield
Development Manager
Alchemy Social, part of Techlightenment, an Experian company

Office 202 | 89 Worship Street | London | EC2A 2BF

t:   +44 (0) 207 392 2618
m: +44 (0) 7825 561 091
e:  phil at techlightenment.com
skype: phil.tl

www.techlightenment.com