[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

Joe Julian joe at julianfamily.org
Tue Jul 16 16:16:05 UTC 2013


Get rid of every other mount attempt. No custom systemd script, no 
rc.local (I know you start your own app from there, but let's get one 
thing working first) and make sure the fstab entry still has the _netdev 
option.
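
A quick way to confirm the _netdev part is to scan fstab for glusterfs entries that lack it. This is only a sketch: missing_netdev is a made-up helper name, and it assumes the usual whitespace-separated fstab layout (device, mount point, type, options).

```shell
# Sketch: print the mount point of every glusterfs fstab entry whose
# options field does not contain _netdev.
# missing_netdev is a hypothetical helper name, not part of any package.
missing_netdev() {
    awk '$1 !~ /^#/ && $3 == "glusterfs" && $4 !~ /(^|,)_netdev(,|$)/ { print $2 }' \
        "${1:-/etc/fstab}"
}
```

Running "missing_netdev /etc/fstab" should print nothing if every glusterfs entry, /firewall-scripts included, carries the option.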

Even the guy that wrote systemd (Lennart Poettering) says that we're 
correct.

Assuming we are, and you still don't get a mounted filesystem, let's 
take another look at the client, brick, and glusterd logs using this 
service definition.
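
For locating the client log: as far as I know, the FUSE client names its log after the mount point, with the leading slash dropped and the remaining slashes turned into dashes, under /var/log/glusterfs. Treat that convention and the helper name client_log_path as assumptions to verify on your install. Brick logs normally sit in the bricks/ subdirectory of the same place.

```shell
# Sketch: derive the expected client log path from a mount point.
# Assumes the default log directory /var/log/glusterfs and the
# "slashes become dashes" naming convention described above.
client_log_path() {
    # Drop the leading slash, then replace any remaining slashes with dashes.
    name=$(printf '%s' "$1" | sed -e 's|^/||' -e 's|/|-|g')
    printf '/var/log/glusterfs/%s.log\n' "$name"
}
```

With that, "tail -n 50 $(client_log_path /firewall-scripts)" would show the end of the client log for the mount in question.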

On 07/16/2013 08:30 AM, Greg Scott wrote:
>
> Didn’t seem to make a difference.   Not mounted right after logging 
> in.  Looks like the same behavior.  The mount fails, then my rc.local 
> kicks in and says it succeeded, but doesn’t show it mounted later when 
> I do my “after” df -h.
>
> [root at chicago-fw1 ~]# df -h
>
> Filesystem Size  Used Avail Use% Mounted on
>
> /dev/mapper/fedora-root 14G  3.9G  8.7G  31% /
>
> devtmpfs 990M     0  990M   0% /dev
>
> tmpfs 996M     0  996M   0% /dev/shm
>
> tmpfs 996M  888K  996M   1% /run
>
> tmpfs 996M     0  996M   0% /sys/fs/cgroup
>
> tmpfs 996M     0  996M   0% /tmp
>
> /dev/sda2 477M   87M  365M  20% /boot
>
> /dev/sda1 200M  9.4M  191M   5% /boot/efi
>
> /dev/mapper/fedora-gluster--fw1 7.9G   33M  7.8G   1% /gluster-fw1
>
> [root at chicago-fw1 ~]#
>
> [root at chicago-fw1 ~]# tail /var/log/messages -c 50000 | more
>
> 0.10.71.
>
> Jul 16 10:21:23 chicago-fw1 avahi-daemon[446]: New relevant interface 
> enp5s7.IPv4 for mDNS.
>
> Jul 16 10:21:23 chicago-fw1 avahi-daemon[446]: Registering new address 
> record for 10.10.10.71 on enp5s7.IPv4.
>
> Jul 16 10:21:23 chicago-fw1 kernel: [   22.284616] r8169 0000:05:04.0 
> enp5s4: link up
>
> Jul 16 10:21:24 chicago-fw1 kernel: [   22.996223] r8169 0000:05:07.0 
> enp5s7: link up
>
> Jul 16 10:21:24 chicago-fw1 kernel: [   22.996240] IPv6: 
> ADDRCONF(NETDEV_CHANGE): enp5s7: link becomes ready
>
> Jul 16 10:21:24 chicago-fw1 network[464]: Bringing up interface 
> enp5s7:  [  OK  ]
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Started LSB: Bring up/down 
> networking.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting Network.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Reached target Network.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Started Login and scanning of 
> iSCSI devices.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting Vsftpd ftp daemon...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting RPC bind service...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting OpenSSH server daemon...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting /etc/rc.d/rc.local 
> Compatibility...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Started RPC bind service.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting GlusterFS an 
> clustered file-system server...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Started Vsftpd ftp daemon.
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: Tue Jul 16 10:21:25 CDT 2013
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: Sleeping 30 seconds.
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: Tue Jul 16 10:21:25 CDT 2013
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: Making sure the Gluster 
> stuff is mounted
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: Mounted before mount -av
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> Filesystem                       Size  Used Avail Use% Mounted on
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> /dev/mapper/fedora-root           14G  3.9G  8.7G  31% /
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> devtmpfs                         990M     0  990M   0% /dev
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /dev/shm
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M  2.1M  994M   1% /run
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /sys/fs/cgroup
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /tmp
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> /dev/sda2                        477M   87M  365M  20% /boot
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> /dev/sda1                        200M  9.4M  191M   5% /boot/efi
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: 
> /dev/mapper/fedora-gluster--fw1  7.9G   33M  7.8G   1% /gluster-fw1
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Started OpenSSH server daemon.
>
> Jul 16 10:21:25 chicago-fw1 rc.local[1005]: extra arguments at end 
> (ignored)
>
> Jul 16 10:21:25 chicago-fw1 dbus-daemon[465]: dbus[465]: [system] 
> Activating service name='org.fedoraproject.Setroubleshootd' (using 
> servicehelper)
>
> Jul 16 10:21:25 chicago-fw1 dbus[465]: [system] Activating service 
> name='org.fedoraproject.Setroubleshootd' (using servicehelper)
>
> Jul 16 10:21:25 chicago-fw1 kernel: [   23.918403] fuse init (API 
> version 7.21)
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Mounted /firewall-scripts.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting Remote File Systems.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Reached target Remote File 
> Systems.
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Starting Trigger Flushing of 
> Journal to Persistent Storage...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Mounting FUSE Control File 
> System...
>
> Jul 16 10:21:25 chicago-fw1 systemd[1]: Mounted FUSE Control File System.
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Started Trigger Flushing of 
> Journal to Persistent Storage.
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Starting Permit User Sessions...
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Started Permit User Sessions.
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Starting Command Scheduler...
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Started Command Scheduler.
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Starting Job spooling tools...
>
> Jul 16 10:21:28 chicago-fw1 systemd[1]: Started Job spooling tools.
>
> Jul 16 10:21:28 chicago-fw1 avahi-daemon[446]: Registering new address 
> record for fe80::230:18ff:fea2:a340 on enp5s7.*.
>
> Jul 16 10:21:28 chicago-fw1 dbus[465]: [system] Successfully activated 
> service 'org.fedoraproject.Setroubleshootd'
>
> Jul 16 10:21:28 chicago-fw1 dbus-daemon[465]: dbus[465]: [system] 
> Successfully activated service 'org.fedoraproject.Setroubleshootd'
>
> Jul 16 10:21:31 chicago-fw1 audispd: queue is full - dropping event
>
> Jul 16 10:21:31 chicago-fw1 audispd: queue is full - dropping event
>
> Jul 16 10:21:31 chicago-fw1 audispd: queue is full - dropping event
>
> .
>
> .
>
> .
>
> Jul 16 10:21:33 chicago-fw1 audispd: queue is full - dropping event
>
> Jul 16 10:21:33 chicago-fw1 audispd: queue is full - dropping event
>
> Jul 16 10:21:33 chicago-fw1 audispd: queue is full - dropping event
>
> Jul 16 10:21:34 chicago-fw1 systemd[1]: Started GlusterFS an clustered 
> file-system server.
>
> Jul 16 10:21:34 chicago-fw1 systemd[1]: Starting Network is Online.
>
> Jul 16 10:21:34 chicago-fw1 systemd[1]: Reached target Network is Online.
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: Mount failed. Please check 
> the log file for more details.
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: /                        : 
> ignored
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: /boot                    : 
> already mounted
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: /boot/efi       : already 
> mounted
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: /gluster-fw1             : 
> already mounted
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: swap                     : 
> ignored
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: /firewall-scripts        : 
> successfully mounted
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: Mounted after mount -av
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> Filesystem                       Size  Used Avail Use% Mounted on
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> /dev/mapper/fedora-root           14G  3.9G  8.7G  31% /
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> devtmpfs                         990M     0  990M   0% /dev
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /dev/shm
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M  880K  996M   1% /run
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /sys/fs/cgroup
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> tmpfs                            996M     0  996M   0% /tmp
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> /dev/sda2                        477M   87M  365M  20% /boot
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> /dev/sda1                        200M  9.4M  191M   5% /boot/efi
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: 
> /dev/mapper/fedora-gluster--fw1  7.9G   33M  7.8G   1% /gluster-fw1
>
> Jul 16 10:21:38 chicago-fw1 rc.local[1005]: Starting up firewall 
> common items
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started /etc/rc.d/rc.local 
> Compatibility.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Terminate Plymouth 
> Boot Screen...
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Wait for Plymouth 
> Boot Screen to Quit...
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started Terminate Plymouth 
> Boot Screen.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started Wait for Plymouth Boot 
> Screen to Quit.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Getty on tty1...
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started Getty on tty1.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Login Prompts.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Reached target Login Prompts.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Reached target Multi-User System.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Update UTMP about 
> System Runlevel Changes...
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Starting Stop Read-Ahead Data 
> Collection 10s After Completed Startup.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started Stop Read-Ahead Data 
> Collection 10s After Completed Startup.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Started Update UTMP about 
> System Runlevel Changes.
>
> Jul 16 10:21:38 chicago-fw1 systemd[1]: Startup finished in 1.474s 
> (kernel) + 2.210s (initrd) + 33.180s (userspace) = 36.866s.
>
> [root at chicago-fw1 ~]# more /usr/lib/systemd/system/glusterd.service
>
> [Unit]
>
> Description=GlusterFS an clustered file-system server
>
> After=network.target rpcbind.service
>
> Before=network-online.target
>
> [Service]
>
> Type=forking
>
> PIDFile=/run/glusterd.pid
>
> LimitNOFILE=65536
>
> ExecStart=/usr/sbin/glusterd -p /run/glusterd.pid
>
> KillMode=process
>
> [Install]
>
> WantedBy=multi-user.target
>
> [root at chicago-fw1 ~]#
>
> -Greg
>
> *From:*Joe Julian [mailto:joe at julianfamily.org]
> *Sent:* Tuesday, July 16, 2013 10:09 AM
> *To:* Greg Scott
> *Cc:* gluster-users at gluster.org
> *Subject:* Re: [Gluster-users] One node goes offline, the other node 
> can't see the replicated volume anymore
>
> Try this: https://gist.github.com/joejulian/6009570 and see if it 
> works any better. We're looking for "GlusterFS an clustered 
> file-system server" to appear earlier than mounting.
>
> On 07/15/2013 02:59 PM, Greg Scott wrote:
>
>     Hmmm - I turn off NetworkManager for my application but I can easily sleep a while in rc.local before doing mount -av and see what happens.  And I will fix up glusterd.system.  I'll report back here shortly.
>
>       
>
>     - Greg
>
>       
>
