[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Soumya Koduri
skoduri at redhat.com
Fri May 5 19:10:54 UTC 2017
On 05/05/2017 08:04 PM, Adam Ru wrote:
> Hi Soumya,
>
> Thank you for the answer.
>
> Enabling Pacemaker? Yes, you’re completely right, I didn’t do it. Thank you.
>
> I spent some time testing and have some results. This is what I did:
>
> - Clean installation of CentOS 7.3 with all updates, 3x node,
> resolvable IPs and VIPs
> - Stopped firewalld (just for testing)
> - Installed "centos-release-gluster" to get the "centos-gluster310" repo and
> installed the following (nothing else):
> --- glusterfs-server
> --- glusterfs-ganesha
> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem
> and secret.pem.pub on all nodes)
> - systemctl enable and start glusterd
> - gluster peer probe <other nodes>
> - gluster volume set all cluster.enable-shared-storage enable
> - systemctl enable and start pcsd.service
> - systemctl enable pacemaker.service (cannot be started at this moment)
> - Set password for hacluster user on all nodes
> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not
> sure if needed)
> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and
> insert configuration
> - Try listing files on other nodes: ls
> /var/run/gluster/shared_storage/nfs-ganesha/
> - gluster nfs-ganesha enable
> - Check on other nodes that nfs-ganesha.service is running and "pcs
> status" shows started resources
> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir>
> node2:/<dir> node3:/<dir>
> - gluster volume start mynewshare
> - gluster vol set mynewshare ganesha.enable on
>
> After these steps, all VIPs are pingable and I can mount node1:/mynewshare
>
> Funny thing is that pacemaker.service is disabled again (something
> disabled it). This is the status of the (I think) important services:
Yes, we observed this recently too. Our guess is that the "pcs cluster
setup" command first destroys any existing cluster, which may also
disable pacemaker.
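Until that is understood, one option is to check the unit on each node
after running "gluster nfs-ganesha enable" and re-enable it if needed. A
minimal sketch follows; the function name ensure_enabled and the SYSTEMCTL
override variable are only illustrative and are not part of any Gluster or
pcs tooling.

```shell
#!/bin/sh
# Hypothetical helper: re-enable a unit if something has disabled it.
# SYSTEMCTL can be overridden for a dry run; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ensure_enabled() {
    unit="$1"
    # "systemctl is-enabled" prints the state (enabled, disabled, static, ...)
    # and exits non-zero when the unit is not enabled, hence the "|| true".
    state="$($SYSTEMCTL is-enabled "$unit" 2>/dev/null || true)"
    if [ "$state" = "enabled" ]; then
        echo "$unit: already enabled"
    else
        echo "$unit: state is '${state:-unknown}', enabling"
        $SYSTEMCTL enable "$unit"
    fi
}

# Usage on each node, after "gluster nfs-ganesha enable":
#   ensure_enabled pacemaker.service
```

This only changes whether the unit starts at boot; it does not start
pacemaker or touch the cluster configuration.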
>
> systemctl list-units --all
> # corosync.service loaded active running
> # glusterd.service loaded active running
> # nfs-config.service loaded inactive dead
> # nfs-ganesha-config.service loaded inactive dead
> # nfs-ganesha-lock.service loaded active running
> # nfs-ganesha.service loaded active running
> # nfs-idmapd.service loaded inactive dead
> # nfs-mountd.service loaded inactive dead
> # nfs-server.service loaded inactive dead
> # nfs-utils.service loaded inactive dead
> # pacemaker.service loaded active running
> # pcsd.service loaded active running
>
> systemctl list-unit-files --all
> # corosync-notifyd.service disabled
> # corosync.service disabled
> # glusterd.service enabled
> # glusterfsd.service disabled
> # nfs-blkmap.service disabled
> # nfs-config.service static
> # nfs-ganesha-config.service static
> # nfs-ganesha-lock.service static
> # nfs-ganesha.service disabled
> # nfs-idmap.service static
> # nfs-idmapd.service static
> # nfs-lock.service static
> # nfs-mountd.service static
> # nfs-rquotad.service disabled
> # nfs-secure-server.service static
> # nfs-secure.service static
> # nfs-server.service disabled
> # nfs-utils.service static
> # nfs.service disabled
> # nfslock.service static
> # pacemaker.service disabled
> # pcsd.service enabled
>
> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>
> After reboot all VIPs are gone and I can see that nfs-ganesha.service
> isn’t running. When I start it on at least two nodes then VIPs are
> pingable again and I can mount NFS again. But there is still some issue
> in the setup because when I check nfs-ganesha-lock.service I get:
>
> systemctl -l status nfs-ganesha-lock.service
> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
> Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
> static; vendor preset: disabled)
> Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC;
> 31min ago
> Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
> (code=exited, status=1/FAILURE)
>
> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status
> monitor for NFSv2/3 locking....
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> directory sm: Permission denied
Okay, this issue has been fixed, and the fix should be present in 3.10 too:
https://review.gluster.org/#/c/16433/
Please check /var/log/messages for statd-related errors and cross-check
the permissions of the directory named in the error. For now, you could
manually fix the owner:group of the /var/lib/nfs/statd/sm directory with
chown and then restart the nfs-ganesha* services.
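As a rough sketch of that workaround: the rpcuser:rpcuser owner below is an
assumption (it is the usual rpc.statd user on CentOS 7), so please verify it
on your system first. The function names and the STATD_DIR / STATD_OWNER
variables are only illustrative.

```shell
#!/bin/sh
# Sketch of the manual workaround. The expected owner is an ASSUMPTION
# (rpcuser is the usual rpc.statd user on CentOS 7); adjust if yours differs.
STATD_DIR="${STATD_DIR:-/var/lib/nfs/statd}"
STATD_OWNER="${STATD_OWNER:-rpcuser:rpcuser}"

# Print "owner:group" of a path (uses GNU stat).
owner_of() {
    stat -c '%U:%G' "$1"
}

# Report the statd paths that are not owned by the expected user.
# Returns 0 if everything matches, 1 otherwise.
check_statd_owner() {
    bad=0
    for p in "$STATD_DIR" "$STATD_DIR/sm" "$STATD_DIR/sm.bak" "$STATD_DIR/state"; do
        [ -e "$p" ] || continue
        cur="$(owner_of "$p")"
        if [ "$cur" != "$STATD_OWNER" ]; then
            echo "$p is owned by $cur, expected $STATD_OWNER"
            bad=1
        fi
    done
    return "$bad"
}

# Typical use as root:
#   check_statd_owner || chown -R "$STATD_OWNER" "$STATD_DIR"
#   systemctl restart nfs-ganesha-lock.service nfs-ganesha.service
```

The check itself only reads ownership; the chown and the restart are left
as commented usage so nothing is changed until you have confirmed the
expected owner.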
Thanks,
Soumya
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> /var/lib/nfs/statd/state: Permission denied
> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service:
> control process exited, code=exited status=1
> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status
> monitor for NFSv2/3 locking..
> May 05 13:43:37 node0.localdomain systemd[1]: Unit
> nfs-ganesha-lock.service entered failed state.
> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service
> failed.
>
> Thank you,
>
> Kind regards,
>
> Adam
>
> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Hi,
>
>
> Same here: when I reboot the node I have to manually execute "pcs
> cluster start gluster01", even though pcsd is already enabled and started.
>
> Gluster 3.8.11
>
> Centos 7.3 latest
>
> Installed using CentOS Storage SIG repository
>
>
>
> --
>
> Respectfully,
> Mahdi A. Mahdi
>
> ------------------------------------------------------------------------
> From: gluster-users-bounces at gluster.org on behalf of Adam Ru
> <ad.ruckel at gmail.com>
> Sent: Wednesday, May 3, 2017 12:09:58 PM
> To: Soumya Koduri
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is
> down after reboot
>
> Hi Soumya,
>
> thank you very much for your reply.
>
> I enabled pcsd during setup and after reboot during troubleshooting
> I manually started it and checked resources (pcs status). They were
> not running. I didn’t find what was wrong but I’m going to try it again.
>
> I’ve thoroughly checked
> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
> and I can confirm that I followed all steps with one exception. I
> installed the following RPMs:
> glusterfs-server
> glusterfs-fuse
> glusterfs-cli
> glusterfs-ganesha
> nfs-ganesha-xfs
>
> and the guide referenced above specifies:
> glusterfs-server
> glusterfs-api
> glusterfs-ganesha
>
> glusterfs-api is a dependency of one of the RPMs that I installed, so
> this is not a problem. But I cannot find any mention of installing
> nfs-ganesha-xfs.
>
> I’ll try to set up the whole environment again without installing
> nfs-ganesha-xfs (I assume glusterfs-ganesha has all required binaries).
>
> Again, thank you for taking the time to answer my previous message.
>
> Kind regards,
> Adam
>
> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>
> Hi,
>
> On 05/02/2017 01:34 AM, Rudolf wrote:
>
> Hi Gluster users,
>
> First, I'd like to thank you all for this amazing
> open-source software! Thank you!
>
> I'm working on a home project: three servers with Gluster and
> NFS-Ganesha. My goal is to create an HA NFS share with three
> copies of each file, one on each server.
>
> My systems are CentOS 7.3 Minimal install with the latest
> updates and
> the most current RPMs from "centos-gluster310" repository.
>
> I followed this tutorial:
> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
> (second half that describes multi-node HA setup)
>
> with a few exceptions:
>
> 1. All RPMs are from "centos-gluster310" repo that is
> installed by "yum
> -y install centos-release-gluster"
> 2. I have three nodes (not four) with "replica 3" volume.
> 3. I created an empty ganesha.conf and a non-empty ganesha-ha.conf in
> "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced
> blog post is outdated; this is now a requirement)
> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this
> isn't needed anymore.
>
>
> Please refer to
> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>
> It is being updated with the latest changes to the setup procedure.
>
> When I finish configuration, all is good.
> nfs-ganesha.service is active
> and running and from client I can ping all three VIPs and I
> can mount
> NFS. Copied files are replicated to all nodes.
>
> But when I restart nodes (one by one, with 5 min. delay
> between) then I
> cannot ping or mount (I assume that all VIPs are down). So
> my setup
> definitely isn't HA.
>
> I found that:
> # pcs status
> Error: cluster is not currently running on this node
>
>
> This means the pcsd service is not up. Did you enable the pcsd
> service (systemctl enable pcsd) so that it comes up automatically
> after a reboot? If not, please start it manually.
>
>
> and nfs-ganesha.service is in an inactive state. By the way, I didn't
> run "systemctl enable nfs-ganesha" since I assume that this is
> something that Gluster does.
>
>
> Please check /var/log/ganesha.log for any errors/warnings.
>
> We recommend not enabling nfs-ganesha.service to start by default
> at boot, because the shared storage (where the ganesha.conf file
> now resides) must be up and running before nfs-ganesha is started.
> If the service is enabled, the shared_storage mount point may not
> be up yet at boot, which causes the nfs-ganesha service to fail.
> If you would like to address this, you could have a cron job that
> keeps checking the health of the mount point and then starts the
> nfs-ganesha service.
>
> Thanks,
> Soumya
>
>
> I assume that my issue is that I followed instructions in a
> blog post from 2015/10 that are outdated. Unfortunately I cannot find
> anything better; I spent a whole day googling.
>
> Would you be so kind as to check the instructions in the blog post
> and let me know which steps are wrong or outdated? Or do you have
> more current instructions for a Gluster+Ganesha setup?
>
> Thank you.
>
> Kind regards,
> Adam