[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
Joe Julian
joe at julianfamily.org
Sun Jul 14 00:37:36 UTC 2013
These logs show different results. The results you reported and pasted
earlier included, "[2013-07-09 00:59:04.706390] I
[afr-common.c:3856:afr_local_init] 0-firewall-scripts-replicate-0: no
subvolumes up", which would produce the "Transport endpoint not
connected" error you reported at first. These results look normal and
should have produced the behavior I described.
42 is The Answer to Life, The Universe, and Everything.
Re-establishing file descriptors (FDs) and locks is an expensive
operation. The ping-timeout is long because a timeout should not normally
happen, and if there is temporary network congestion you'd (normally)
rather have your volume pause and stay up than have to re-establish
everything. Typically, unless you expect your servers to crash often,
leaving ping-timeout at the default is best. YMMV, and it's configurable
in case you know what you're doing and why.
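
For reference, a minimal sketch of how the timeout could be adjusted,
assuming the volume name firewall-scripts used elsewhere in this thread.
The option is network.ping-timeout, in seconds. The value 10 below is
only an illustrative assumption, not a recommendation, and very low
values risk the expensive reconnects described above.

```shell
# Assumptions: the volume name comes from this thread; 10 is an
# illustrative value only, not a recommended setting.
VOLUME=firewall-scripts
TIMEOUT=10

# Lower the ping-timeout from its 42 second default.
gluster volume set "$VOLUME" network.ping-timeout "$TIMEOUT"

# Check the result; changed options are listed in the volume info output.
gluster volume info "$VOLUME"

# Go back to the default later if the lower value causes trouble.
gluster volume reset "$VOLUME" network.ping-timeout
```

Run these on one of the servers. Test a change like this against your
own failover scenario before relying on it.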
On 07/13/2013 04:58 PM, Greg Scott wrote:
>
> Log files sent privately to Joe. If others from the community want to
> look at them, I’m OK with posting them here. I don’t think they have
> anything confidential. Now that I know about that 42 second timeout,
> the behavior makes more sense. Why 42? What’s special about 42?
> Is there a way I can adjust that down for my application to, say, 1 or 2
> seconds?
>
> -Greg
>
> *From:*Joe Julian [mailto:joe at julianfamily.org]
> *Sent:* Saturday, July 13, 2013 4:28 PM
> *To:* Greg Scott; 'gluster-users at gluster.org'
> *Subject:* Re: [Gluster-users] One node goes offline, the other node
> can't see the replicated volume anymore
>
> Huh.. this was in my sent folder... let's try again.
>
> There's something missing from this picture. The logs show that the
> client is connecting to both servers, but it only shows the
> disconnection from one and claims that it's not connected to any
> bricks after that.
>
> Here's the data I'd like to have you generate:
>
> unmount the clients
> gluster volume set firewall-scripts diagnostics.client-log-level DEBUG
> gluster volume set firewall-scripts diagnostics.brick-log-level DEBUG
> systemctl stop glusterd.service
> truncate the client, glusterd, and server logs
> systemctl start glusterd
> mount /firewall-scripts
> Do your iptables disconnect
> telnet $this_host_ip 24007 # report whether or not it establishes a
> connection
> ls /firewall-scripts
> wait 42 seconds
> ls /firewall-scripts
> Remove the iptables rule
> ls /firewall-scripts
> tar up the logs and email them to me.
>
> You can reset the log-level:
>
> gluster volume reset firewall-scripts diagnostics.client-log-level
> gluster volume reset firewall-scripts diagnostics.brick-log-level
>
> Lastly, do you have a loopback interface (lo) on 127.0.0.1 and is
> localhost defined in /etc/hosts?
>