[Gluster-users] socket.c:2161:socket_connect_finish (Connection refused)

Olav Peeters opeeters at gmail.com
Wed Jun 11 07:26:48 UTC 2014


Pranith,
how could I temporarily move all data off the two problem bricks until 
3.5.1 is released?
Like this?
# gluster volume replace-brick VOLNAME BRICK NEW-BRICK start
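If I read the docs correctly, the full sequence would be to start the 
migration, watch its status, and commit once it has finished, i.e. 
(BRICK and NEW-BRICK being host:/path pairs):

# gluster volume replace-brick VOLNAME BRICK NEW-BRICK start
# gluster volume replace-brick VOLNAME BRICK NEW-BRICK status
# gluster volume replace-brick VOLNAME BRICK NEW-BRICK commit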
Will this work if the bricks are offline?
Or is there some other way to get the bricks back online manually?
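(I came across "gluster volume start VOLNAME force", which as far as I 
understand respawns any brick processes that are down without touching 
the running ones, e.g.:

# gluster volume start sr_vol01 force

Would that be safe to try here?)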
Would it help to switch all FUSE clients to NFS mounts until after the fix?
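(I.e. remounting the clients with something like:

# mount -t nfs -o vers=3,mountproto=tcp gluster-node:/sr_vol01 /mnt/sr_vol01

assuming the built-in Gluster NFS server, which only speaks NFSv3 over 
TCP; the hostname and mount point above are placeholders.)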
Cheers,
Olav

On 11/06/14 08:44, Olav Peeters wrote:
> OK, thanks for the info!
> Regards,
> Olav
>
> On 11/06/14 08:38, Pranith Kumar Karampuri wrote:
>>
>> On 06/11/2014 12:03 PM, Olav Peeters wrote:
>>> Thanks Pranith!
>>>
>>> I see this at the end of the log file of one of the problem bricks 
>>> (the first two errors are repeated several times):
>>>
>>> [2014-06-10 09:55:28.354659] E [rpcsvc.c:1206:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x103c59, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.sr_vol01-server)
>>> [2014-06-10 09:55:28.354683] E [server.c:190:server_submit_reply] (-->/usr/lib64/glusterfs/3.5.0/xlator/performance/io-threads.so(iot_finodelk_cbk+0xb9) [0x7f8c8e82f189] (-->/usr/lib64/glusterfs/3.5.0/xlator/debug/io-stats.so(io_stats_finodelk_cbk+0xed) [0x7f8c8e1f22ed] (-->/usr/lib64/glusterfs/3.5.0/xlator/protocol/server.so(server_finodelk_cbk+0xad) [0x7f8c8dfc555d]))) 0-: Reply submission failed
>>> pending frames:
>>> frame : type(0) op(30)
>>> frame : type(0) op(30)
>>> frame : type(0) op(30)
>>> frame : type(0) op(30)
>>> ...
>>> ...
>>>
>>> frame : type(0) op(30)
>>> frame : type(0) op(30)
>>> frame : type(0) op(30)
>>>
>>> patchset: git://git.gluster.com/glusterfs.git
>>> signal received: 11
>>> time of crash: 2014-06-10 09:55:28
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> fdatasync 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 3.5.0
>>> /lib64/libc.so.6(+0x329a0)[0x7f8c94aac9a0]
>>> /usr/lib64/glusterfs/3.5.0/xlator/features/locks.so(grant_blocked_inode_locks+0xc1)[0x7f8c8ea54061]
>>> /usr/lib64/glusterfs/3.5.0/xlator/features/locks.so(pl_inodelk_client_cleanup+0x249)[0x7f8c8ea54569]
>>> /usr/lib64/glusterfs/3.5.0/xlator/features/locks.so(+0x6f0a)[0x7f8c8ea49f0a]
>>> /usr/lib64/libglusterfs.so.0(gf_client_disconnect+0x5d)[0x7f8c964d701d]
>>> /usr/lib64/glusterfs/3.5.0/xlator/protocol/server.so(server_connection_cleanup+0x458)[0x7f8c8dfbda48]
>>> /usr/lib64/glusterfs/3.5.0/xlator/protocol/server.so(server_rpc_notify+0x183)[0x7f8c8dfb9713]
>>> /usr/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x105)[0x7f8c96261d35]
>>> /usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x1a0)[0x7f8c96263880]
>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f8c96264f98]
>>> /usr/lib64/glusterfs/3.5.0/rpc-transport/socket.so(+0xa9a1)[0x7f8c914c39a1]
>>> /usr/lib64/libglusterfs.so.0(+0x672f7)[0x7f8c964d92f7]
>>> /usr/sbin/glusterfsd(main+0x564)[0x4075e4]
>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8c94a98d1d]
>>> /usr/sbin/glusterfsd[0x404679]
>>> ---------
>>>
>>> Again, I can't find any information about this error online.
>>> Any idea?
>>
>> This is because of bug 1089470, which is fixed in 3.5.1. That release 
>> should be out shortly.
>>
>> Pranith
>>> Olav
>>>
>>> On 11/06/14 04:42, Pranith Kumar Karampuri wrote:
>>>> Olav,
>>>>      Check the brick logs to see why the bricks went down.
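>>>>      They are under /var/log/glusterfs/bricks/ on the node hosting 
>>>> the brick, one log per brick named after the brick path, e.g. (the 
>>>> file name here is only an example):
>>>>
>>>>      # less /var/log/glusterfs/bricks/export-sr_vol01-brick.log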
>>>>
>>>> Pranith
>>>>
>>>> On 06/11/2014 04:02 AM, Olav Peeters wrote:
>>>>> Hi,
>>>>> I upgraded from glusterfs 3.4 to 3.5 about 8 days ago. Everything 
>>>>> was running fine until this morning, when we started having write 
>>>>> issues on a FUSE mount: creating and deleting files suddenly began 
>>>>> to fail, without any changes to the cluster.
>>>>>
>>>>> In /var/log/glusterfs/glustershd.log every couple of seconds I'm 
>>>>> getting this:
>>>>>
>>>>> [2014-06-10 22:23:52.055128] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-sr_vol01-client-13: changing port to 49156 (from 0)
>>>>> [2014-06-10 22:23:52.060153] E [socket.c:2161:socket_connect_finish] 0-sr_vol01-client-13: connection to ip-of-one-of-the-gluster-nodes:49156 failed (Connection refused)
>>>>>
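>>>>> Is checking like this (on the node hosting the brick, with the port 
>>>>> from the log above) the right way to tell a network problem from a 
>>>>> dead brick process?
>>>>>
>>>>> # ps aux | grep glusterfsd
>>>>> # netstat -tlnp | grep 49156
>>>>>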
>>>>> # gluster volume status sr_vol01
>>>>> shows that two of the 18 bricks are offline.
>>>>>
>>>>> A rebalance fails as well.
>>>>>
>>>>> Iptables was stopped on all nodes, so a firewall should not be 
>>>>> blocking the port.
>>>>>
>>>>> If I cd into the two bricks that are offline according to gluster 
>>>>> v status, I can read and write without any problems... The disks 
>>>>> are clearly fine: they are mounted and available.
>>>>>
>>>>> I cannot find much info online about the error.
>>>>> Does anyone have an idea what could be wrong?
>>>>> How can I get the two bricks back online?
>>>>>
>>>>> Cheers,
>>>>> Olav
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>
>