[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
Ben Turner
bturner at redhat.com
Mon Jul 15 21:45:03 UTC 2013
I have seen managed switches take forever to establish a link as well. I was using a Cisco Catalyst at the time and IIRC I needed to enable portfast and disable spanning tree.
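For reference, on a Catalyst running IOS the per-port version of that is something like the following (the interface name here is just a placeholder):

    switch(config)# interface GigabitEthernet0/1
    switch(config-if)# spanning-tree portfast

With portfast the port skips the STP listening/learning delay and starts forwarding as soon as the link comes up, which matters when a host tries to mount network filesystems right at boot.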
-b
----- Original Message -----
> From: "Joe Julian" <joe at julianfamily.org>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: "Greg Scott" <GregScott at infrasupport.com>, gluster-users at gluster.org
> Sent: Monday, July 15, 2013 5:36:06 PM
> Subject: Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
>
> Ben may be on to something. If you're using NetworkManager, there may be
> a chicken-and-egg problem due to the lack of a hardware link being
> established (no switch). What if you mount from localhost?
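>
> For example, an fstab entry along these lines (the volume name is taken
> from the df output later in this thread) would avoid depending on the
> peer being reachable at mount time:
>
>     localhost:/firewall-scripts  /firewall-scripts  glusterfs  defaults,_netdev  0 0
>
> The client still needs its peer for replication, but the initial mount
> can complete against the local brick's glusterd.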
>
> On 07/15/2013 02:32 PM, Ben Turner wrote:
> > Hi Greg. I don't know if this is the thread I replied to before, but it
> > still sounds to me like your NICs aren't fully up when the gluster volume
> > is getting mounted. The _netdev handling (at least the version in RHEL 6;
> > I haven't looked at others) doesn't check whether the NIC is fully up; it
> > only checks whether the NetworkManager lock file exists. When I saw this
> > happen in my tests the lock file existed, but the NIC was still
> > initializing and couldn't send or receive the traffic needed to mount the
> > FS. I was able to put a sleep in the initscript to work around this:
> >
> > # diff -pruN /etc/rc.d/init.d/netfs /tmp/initrd/netfs
> > --- /etc/rc.d/init.d/netfs 2013-04-26 14:32:28.759283055 -0400
> > +++ /tmp/initrd/netfs 2013-04-26 14:31:38.320059175 -0400
> > @@ -32,8 +32,6 @@ NETDEVMTAB=$(LC_ALL=C awk '$4 ~ /_netdev
> > # See how we were called.
> > case "$1" in
> > start)
> > - echo "Sleeping 30 seconds for NW init workaround -benT"
> > - sleep 30
> > [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0
> > [ "$EUID" != "0" ] && exit 4
> > [ -n "$NFSFSTAB" ] &&
> >
> > I just used the sleep for testing (the modified initscript is the first
> > argument to diff above, which is why the added lines show with a leading
> > '-'). The preferred way of dealing with this is probably the LINKDELAY
> > option in your /etc/sysconfig/network-scripts/ifcfg-* file: it makes the
> > network scripts wait up to $LINKDELAY seconds for the link to come up
> > before continuing. Can you try testing with either of those options to
> > see if you are able to mount at boot?
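> >
> > Something like this, for example (the device name here is just a
> > placeholder):
> >
> >     # /etc/sysconfig/network-scripts/ifcfg-eth0
> >     DEVICE=eth0
> >     ONBOOT=yes
> >     # wait up to 10 seconds for carrier before ifup continues
> >     LINKDELAY=10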
> >
> > -b
> >
> > ----- Original Message -----
> >> From: "Greg Scott" <GregScott at infrasupport.com>
> >> To: "Joe Julian" <joe at julianfamily.org>
> >> Cc: gluster-users at gluster.org
> >> Sent: Monday, July 15, 2013 4:44:10 PM
> >> Subject: Re: [Gluster-users] One node goes offline, the other node can't
> >> see the replicated volume anymore
> >>
> >> This time after rebooting both nodes, neither one shows /firewall-scripts
> >> mounted after a login, but mount -av by hand succeeds on both nodes.
> >> fw1 and fw2 behave identically; here is what fw1 looks like. This aspect
> >> of the problem is screaming timing glitch.
> >>
> >> login as: root
root@10.10.10.71's password:
> >> Last login: Mon Jul 15 15:19:41 2013 from tinahp100b.infrasupport.local
> >> [root@chicago-fw1 ~]# df -h
> >> Filesystem Size Used Avail Use% Mounted on
> >> /dev/mapper/fedora-root 14G 3.9G 8.7G 31% /
> >> devtmpfs 990M 0 990M 0% /dev
> >> tmpfs 996M 0 996M 0% /dev/shm
> >> tmpfs 996M 888K 996M 1% /run
> >> tmpfs 996M 0 996M 0% /sys/fs/cgroup
> >> tmpfs 996M 0 996M 0% /tmp
> >> /dev/sda2 477M 87M 365M 20% /boot
> >> /dev/sda1 200M 9.4M 191M 5% /boot/efi
> >> /dev/mapper/fedora-gluster--fw1 7.9G 33M 7.8G 1% /gluster-fw1
> >> [root@chicago-fw1 ~]# mount -av
> >> / : ignored
> >> /boot : already mounted
> >> /boot/efi : already mounted
> >> /gluster-fw1 : already mounted
> >> swap : ignored
> >> extra arguments at end (ignored)
> >> /firewall-scripts : successfully mounted
> >> [root@chicago-fw1 ~]# df -h
> >> Filesystem Size Used Avail Use% Mounted on
> >> /dev/mapper/fedora-root 14G 3.9G 8.7G 31% /
> >> devtmpfs 990M 0 990M 0% /dev
> >> tmpfs 996M 0 996M 0% /dev/shm
> >> tmpfs 996M 888K 996M 1% /run
> >> tmpfs 996M 0 996M 0% /sys/fs/cgroup
> >> tmpfs 996M 0 996M 0% /tmp
> >> /dev/sda2 477M 87M 365M 20% /boot
> >> /dev/sda1 200M 9.4M 191M 5% /boot/efi
> >> /dev/mapper/fedora-gluster--fw1 7.9G 33M 7.8G 1% /gluster-fw1
> >> 192.168.253.1:/firewall-scripts 7.6G 19M 7.2G 1% /firewall-scripts
> >> [root@chicago-fw1 ~]#
> >>
> >> - Greg
> >>