[Gluster-users] GlusterFS cluster of 2 nodes is disconnected after nodes reboot

Артём Конвалюк artret at gmail.com
Wed Mar 26 08:40:31 UTC 2014


> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and
> gluster is starting before network is ready ?
>
> Can you completely disable iptables and test again ?

Both nodes are CentOS 6.5 VMs running on VMware ESXi 5.5.0. There is
nothing special about the network configuration, just static IPs. Ping and
ssh work fine. I added "iptables -F" to /etc/rc.local. After a simultaneous
reboot, "gluster peer status" on both nodes shows the peer as connected and
replication works fine. But "gluster volume status" reports that the NFS
server and the self-heal daemon on one of the nodes are not running, so I
have to restart glusterd to bring them up.
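
For reference, instead of flushing the whole firewall I could probably just
open the default GlusterFS 3.4 ports. A rough sketch of the rules I have in
mind (port numbers taken from the documented defaults for 3.4 with two
bricks, not tested here yet):

# 24007 = glusterd management, 49152-49153 = brick ports (one per brick),
# 111 + 2049 + 38465-38467 = portmapper and the built-in NFS server
iptables -I INPUT -p tcp --dport 24007 -j ACCEPT
iptables -I INPUT -p tcp --dport 49152:49153 -j ACCEPT
iptables -I INPUT -p tcp -m multiport --dports 111,2049,38465:38467 -j ACCEPT
iptables -I INPUT -p udp --dport 111 -j ACCEPT
service iptables save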

Another issue: when everything is OK after "service glusterd restart" on
both nodes, I reboot one node and then can see on the rebooted node
(ipset02):

[root at ipset02 etc]# gluster peer status
Number of Peers: 1

Hostname: ipset01
Uuid: 6313a4dd-f736-46ff-9836-bdf05c886ffd
State: Peer in Cluster (Connected)
[root at ipset02 etc]# gluster volume status
Status of volume: ipset-gv
Gluster process                             Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset          49152   Y       1615
Brick ipset02:/usr/local/etc/ipset          49152   Y       2282
NFS Server on localhost                     2049    Y       2289
Self-heal Daemon on localhost               N/A     Y       2296
NFS Server on ipset01                       2049    Y       2258
Self-heal Daemon on ipset01                 N/A     Y       2262

There are no active volume tasks

[root at ipset02 etc]# tail -17 /var/log/glusterfs/glustershd.log
[2014-03-26 07:55:48.982456] E
[client-handshake.c:1742:client_query_portmap_cbk] 0-ipset-gv-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2014-03-26 07:55:48.982532] W [socket.c:514:__socket_rwv]
0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:48.982555] I [client.c:2097:client_rpc_notify]
0-ipset-gv-client-1: disconnected
[2014-03-26 07:55:48.982572] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
0-ipset-gv-client-0: changing port to 49152 (from 0)
[2014-03-26 07:55:48.982627] W [socket.c:514:__socket_rwv]
0-ipset-gv-client-0: readv failed (No data available)
[2014-03-26 07:55:48.986252] I
[client-handshake.c:1659:select_server_supported_programs]
0-ipset-gv-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2014-03-26 07:55:48.986551] I
[client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-0:
Connected to 192.168.1.180:49152, attached to remote volume
'/usr/local/etc/ipset'.
[2014-03-26 07:55:48.986566] I
[client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-0: Server
and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:48.986628] I [afr-common.c:3698:afr_notify]
0-ipset-gv-replicate-0: Subvolume 'ipset-gv-client-0' came back up; going
online.
[2014-03-26 07:55:48.986743] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-0:
Server lk version = 1
[2014-03-26 07:55:52.975670] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
0-ipset-gv-client-1: changing port to 49152 (from 0)
[2014-03-26 07:55:52.975717] W [socket.c:514:__socket_rwv]
0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:52.978961] I
[client-handshake.c:1659:select_server_supported_programs]
0-ipset-gv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2014-03-26 07:55:52.979128] I
[client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-1:
Connected to 192.168.1.181:49152, attached to remote volume
'/usr/local/etc/ipset'.
[2014-03-26 07:55:52.979143] I
[client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-1: Server
and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:52.979269] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-1:
Server lk version = 1
[2014-03-26 07:55:52.980284] I
[afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-ipset-gv-replicate-0:
Another crawl is in progress for ipset-gv-client-1


And on the node that wasn't rebooted:

[root at ipset01 ~]# gluster peer status
Number of Peers: 1

Hostname: ipset02
Uuid: ff14ab0e-53cf-4015-9e49-fb60698c56db
State: Peer in Cluster (Disconnected)
[root at ipset01 ~]# gluster volume status
Status of volume: ipset-gv
Gluster process                             Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset          49152   Y       1615
NFS Server on localhost                     2049    Y       2258
Self-heal Daemon on localhost               N/A     Y       2262

There are no active volume tasks

[root at ipset01 ~]# tail -3 /var/log/glusterfs/glustershd.log
[2014-03-26 07:50:28.881369] W [socket.c:514:__socket_rwv]
0-ipset-gv-client-1: readv failed (Connection reset by peer)
[2014-03-26 07:50:28.881421] W [socket.c:1962:__socket_proto_state_machine]
0-ipset-gv-client-1: reading from socket failed. Error (Connection reset by
peer), peer (192.168.1.181:49152)
[2014-03-26 07:50:28.881463] I [client.c:2097:client_rpc_notify]
0-ipset-gv-client-1: disconnected

However, files seem to replicate fine on both nodes. After "service
glusterd restart" on the first node (ipset01), "gluster peer status" shows
the peer as connected again. This behavior is strange.
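
As a temporary workaround I am thinking of restarting glusterd from
/etc/rc.local once the network is definitely up, something like this
appended to the file (only a sketch; the delay value is a guess):

sleep 15                     # guessed delay for the static interfaces to come up
service glusterd restart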

> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown

I also have no /var/lock/subsys/glusterfsd file, but /var/lock/subsys/glusterd
is there. As far as I know, newer versions of GlusterFS use the glusterd
init script instead of glusterfsd.
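
Just to double-check which init scripts are actually enabled at boot on my
nodes, I am looking at (nothing gluster-specific, plain chkconfig):

chkconfig --list | grep -i gluster        # shows glusterd / glusterfsd runlevels
ls /var/lock/subsys/ | grep -i gluster    # shows which subsys lock files exist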

[root at ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...
[root at ipset01 etc]# service glusterd stop                  [  OK  ]
[root at ipset01 etc]# service glusterd status
glusterd dead but subsys locked
[root at ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...

Is it OK that glusterfsd is still running?
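
To see exactly which gluster processes survive "service glusterd stop" I am
checking with plain ps (the bracket trick just keeps grep out of its own
output):

ps -ef | grep -E '[g]luster(d|fsd|fs)'    # glusterd, brick glusterfsd, client glusterfs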

2014-03-26 2:16 GMT+04:00 Viktor Villafuerte <viktor.villafuerte at optusnet.com.au>:

> Also see this bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1073217
>
> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown
>
> v
>
>
>
> On Tue 25 Mar 2014 22:24:22, Carlos Capriotti wrote:
> > Let's go with the data collection first.
> >
> > What linux distro ?
> >
> > Anything special about your network configuration ?
> >
> > Any chance your server is taking too long to release networking and
> > gluster is starting before network is ready ?
> >
> > Can you completely disable iptables and test again ?
> >
> > I am afraid quorum will not help you if you cannot get this issue
> > corrected.
> >
> >
> >
> >
> > On Tue, Mar 25, 2014 at 3:14 PM, Артём Конвалюк <artret at gmail.com> wrote:
> >
> > > Hello!
> > >
> > > I have 2 nodes with GlusterFS 3.4.2. I created one replica volume
> > > using 2 bricks and enabled glusterd autostart. The firewall is also
> > > configured, and I have to run "iptables -F" on the nodes after reboot.
> > > It is clear that the firewall could simply be disabled inside the LAN,
> > > but I'm interested in making it work for my case.
> > >
> > > Problem: When I reboot both nodes and run "iptables -F", peer status is
> > > still disconnected. I wonder why. After "service glusterd restart" peer
> > > status is connected. But I have to run "gluster volume heal
> > > <volume-name>" to make both servers consistent and able to replicate
> > > files. Is there any way to eliminate this problem?
> > >
> > > I read about server-quorum, but it needs 3 or more nodes. Am I right?
> > >
> > > Best Regards,
> > > Artem Konvalyuk
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >
>
>
>
> --
> Regards
>
> Viktor Villafuerte
> Optus Internet Engineering
> t: 02 808-25265
>