[Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
Greg Scott
GregScott at infrasupport.com
Mon Jul 15 22:23:37 UTC 2013
I think we're making progress. I put in a sleep 30 in my rc.local, rebooted, and my filesystem is now mounted after my first logon.
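For reference, the relevant chunk of my rc.local now looks roughly like this (reconstructed from memory, so treat it as a sketch):

#!/bin/sh
sleep 30                                  # crude delay so glusterd has time to come up
date
echo "Making sure the Gluster stuff is mounted"
echo "Mounted before mount -av"
df -h
mount -av
echo "Mounted after mount -av"
df -h
echo "Starting up firewall common items"
# ... firewall startup continues from here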
There's still some stuff in /var/log/messages I don't understand, but my before-and-after mounts look much better. Notice how messages from all kinds of services get mixed in together - systemd must fire up a bunch of things concurrently, which is why F19 boots so fast. The tradeoff is you can't count on things happening in sequence.
I wonder if I can set up one of those systemd service doo-dad files so that glusterd starts first and then a script runs to mount my stuff? That would be more deterministic than sleeping 30 seconds in rc.local. I have to go out for a couple of hours; I'll see what I can put together and report results here.
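Something along these lines is what I'm picturing - just a rough sketch I haven't tested, and I'm assuming the Fedora unit for the Gluster daemon is named glusterd.service:

[Unit]
Description=Mount Gluster volumes once glusterd is running
Requires=glusterd.service
After=glusterd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/mount -av

[Install]
WantedBy=multi-user.target

Then enable it with systemctl enable, under whatever name I end up giving it.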
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:20 chicago-fw1 systemd[1]: Started GlusterFS an clustered file-system server.
Jul 15 17:13:22 chicago-fw1 mount[1001]: Mount failed. Please check the log file for more details.
Jul 15 17:13:22 chicago-fw1 systemd[1]: firewall\x2dscripts.mount mount process exited, code=exited status=1
Jul 15 17:13:22 chicago-fw1 systemd[1]: Unit firewall\x2dscripts.mount entered failed state.
.
.
. a bazillion meaningless selinux warnings (because selinux=permissive here)
.
.
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket .
For complete SELinux messages. run sealert -l 221b72d0-d5d8-4a70-bedd-697a6b9e0f03
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket .
For complete SELinux messages. run sealert -l 22b9b899-3fe2-47fc-8c5d-7bd5ed0e1f17
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mon Jul 15 17:13:40 CDT 2013
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Making sure the Gluster stuff is mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mounted before mount -av
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Filesystem Size Used Avail Use% Mounted on
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-root 14G 3.9G 8.7G 31% /
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: devtmpfs 990M 0 990M 0% /dev
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /dev/shm
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 884K 996M 1% /run
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /sys/fs/cgroup
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /tmp
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda2 477M 87M 365M 20% /boot
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda1 200M 9.4M 191M 5% /boot/efi
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-gluster--fw1 7.9G 33M 7.8G 1% /gluster-fw1
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: extra arguments at end (ignored)
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket .
For complete SELinux messages. run sealert -l 225efbe9-0ea3-4f5b-8791-c325d2f0eed6
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: / : ignored
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /boot : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /boot/efi : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /gluster-fw1 : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: swap : ignored
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /firewall-scripts : successfully mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mounted after mount -av
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Filesystem Size Used Avail Use% Mounted on
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-root 14G 3.9G 8.7G 31% /
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: devtmpfs 990M 0 990M 0% /dev
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /dev/shm
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 884K 996M 1% /run
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /sys/fs/cgroup
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs 996M 0 996M 0% /tmp
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda2 477M 87M 365M 20% /boot
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda1 200M 9.4M 191M 5% /boot/efi
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-gluster--fw1 7.9G 33M 7.8G 1% /gluster-fw1
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: 192.168.253.1:/firewall-scripts 7.6G 19M 7.2G 1% /firewall-scripts
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Starting up firewall common items
Jul 15 17:13:40 chicago-fw1 systemd[1]: Started /etc/rc.d/rc.local Compatibility.
Jul 15 17:13:40 chicago-fw1 systemd[1]: Starting Terminate Plymouth Boot Screen...
Greg Scott
Infrasupport Corporation
GregScott at Infrasupport.com
Direct 1-651-260-1051
-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Greg Scott
Sent: Monday, July 15, 2013 5:12 PM
To: 'Marcus Bointon'; gluster-users at gluster.org List
Subject: Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
And for what it's worth, I just looked and noticed that rc.local doesn't really run last in the startup sequence anymore. According to the unit file below, it only depends on the network being up, so I could easily be trying my mounts before gluster ever gets fired up.
[root at chicago-fw1 system]# pwd
/usr/lib/systemd/system
[root at chicago-fw1 system]# more rc-local.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# This unit gets pulled automatically into multi-user.target by
# systemd-rc-local-generator if /etc/rc.d/rc.local is executable.

[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target
[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes
SysVStartPriority=99
[root at chicago-fw1 system]#
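Looking at that, maybe the simplest fix is a drop-in so rc.local itself waits for gluster - again untested, and glusterd.service is my guess at the daemon's unit name. Something like a file at /etc/systemd/system/rc-local.service.d/wait-for-gluster.conf containing:

[Unit]
Requires=glusterd.service
After=glusterd.service

followed by a systemctl daemon-reload.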
- Greg
-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Marcus Bointon
Sent: Monday, July 15, 2013 5:05 PM
To: gluster-users at gluster.org List
Subject: Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore
On 15 Jul 2013, at 21:22, Greg Scott <GregScott at infrasupport.com> wrote:
> # The fstab mounts happen early in startup, then Gluster starts up later.
> # By now, Gluster should be up and running and the mounts should work.
# That _netdev option is supposed to account for the delay but doesn't seem
# to work right.
It's interesting to see that script - that's what happens to me with 3.3.0. If I set gluster mounts to mount from fstab with _netdev, it hangs the boot completely and I have to go into single-user mode (and edit them out of fstab) to recover, though gluster logs nothing at all. Autofs fails too (though I think that's autofs not understanding how to mount an NFS volume from localhost), yet it all works with a manual mount.
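For clarity, the kind of fstab entry I mean looks like this (the volume and mount point names are just examples):

localhost:/myvolume  /mnt/gluster  glusterfs  defaults,_netdev  0 0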
Sad to see you're having trouble with 3.4. I hope you can make it work!
Marcus
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users