[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Adam Ru
ad.ruckel at gmail.com
Sun May 28 13:37:11 UTC 2017
Hi Soumya,
Again, I apologize for the delay in responding. I'll try to file a bug.
In the meantime, I'm sending the AVCs and version information. The AVCs
were collected across two reboots; in both cases I manually started
nfs-ganesha.service, and nfs-ganesha-lock.service failed to start.
uname -r
3.10.0-514.21.1.el7.x86_64
sestatus -v
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
Process contexts:
Current context:                unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Init context:                   system_u:system_r:init_t:s0
File contexts:
Controlling terminal:           unconfined_u:object_r:user_tty_device_t:s0
/etc/passwd                     system_u:object_r:passwd_file_t:s0
/etc/shadow                     system_u:object_r:shadow_t:s0
/bin/bash                       system_u:object_r:shell_exec_t:s0
/bin/login                      system_u:object_r:login_exec_t:s0
/bin/sh                         system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
/sbin/agetty                    system_u:object_r:getty_exec_t:s0
/sbin/init                      system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
/usr/sbin/sshd                  system_u:object_r:sshd_exec_t:s0
sudo systemctl start nfs-ganesha.service
systemctl status -l nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
static; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2017-05-28 14:12:48 UTC; 9s ago
Process: 1991 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
(code=exited, status=1/FAILURE)
mynode0.localdomain systemd[1]: Starting NFS status monitor for
NFSv2/3 locking....
mynode0.localdomain rpc.statd[1992]: Version 1.3.0 starting
mynode0.localdomain rpc.statd[1992]: Flags: TI-RPC
mynode0.localdomain rpc.statd[1992]: Failed to open directory sm:
Permission denied
mynode0.localdomain systemd[1]: nfs-ganesha-lock.service: control
process exited, code=exited status=1
mynode0.localdomain systemd[1]: Failed to start NFS status monitor for
NFSv2/3 locking..
mynode0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered
failed state.
mynode0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
sudo ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -i
----
type=SYSCALL msg=audit(05/28/2017 14:04:32.160:25) : arch=x86_64
syscall=bind success=yes exit=0 a0=0xf a1=0x7ffc757feb60 a2=0x10
a3=0x22 items=0 ppid=1149 pid=1157 auid=unset uid=root gid=root
euid=root suid=root fsuid=root egid=root sgid=root fsgid=root
tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd
subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:04:32.160:25) : avc: denied {
name_bind } for pid=1157 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:11:16.141:26) : arch=x86_64
syscall=bind success=no exit=EACCES(Permission denied) a0=0xf
a1=0x7ffffbf92620 a2=0x10 a3=0x22 items=0 ppid=1139 pid=1146
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:11:16.141:26) : avc: denied {
name_bind } for pid=1146 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:12:48.068:75) : arch=x86_64
syscall=openat success=no exit=EACCES(Permission denied)
a0=0xffffffffffffff9c a1=0x7efdc1ec3e10
a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1991
pid=1992 auid=unset uid=root gid=root euid=root suid=root fsuid=root
egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd
exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:12:48.068:75) : avc: denied { read
} for pid=1992 comm=rpc.statd name=sm dev="fuse"
ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
----
type=SYSCALL msg=audit(05/28/2017 14:12:48.080:76) : arch=x86_64
syscall=open success=no exit=EACCES(Permission denied)
a0=0x7efdc1ec3dd0 a1=O_RDONLY a2=0x7efdc1ec3de8 a3=0x5 items=0
ppid=1991 pid=1992 auid=unset uid=root gid=root euid=root suid=root
fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset
comm=rpc.statd exe=/usr/sbin/rpc.statd
subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:12:48.080:76) : avc: denied { read
} for pid=1992 comm=rpc.statd name=state dev="fuse"
ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=file
----
type=SYSCALL msg=audit(05/28/2017 14:17:37.177:26) : arch=x86_64
syscall=bind success=no exit=EACCES(Permission denied) a0=0xf
a1=0x7ffdfa768c70 a2=0x10 a3=0x22 items=0 ppid=1155 pid=1162
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:17:37.177:26) : avc: denied {
name_bind } for pid=1162 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:17:46.401:56) : arch=x86_64
syscall=kill success=no exit=EACCES(Permission denied) a0=0x560
a1=SIGKILL a2=0x7fd684000078 a3=0x0 items=0 ppid=1 pid=1167 auid=unset
uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root
fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd
subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:17:46.401:56) : avc: denied {
sigkill } for pid=1167 comm=glusterd
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:system_r:cluster_t:s0 tclass=process
----
type=SYSCALL msg=audit(05/28/2017 14:17:45.400:55) : arch=x86_64
syscall=kill success=no exit=EACCES(Permission denied) a0=0x560
a1=SIGTERM a2=0x7fd684000038 a3=0x99 items=0 ppid=1 pid=1167
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:17:45.400:55) : avc: denied {
signal } for pid=1167 comm=glusterd
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:system_r:cluster_t:s0 tclass=process
----
type=SYSCALL msg=audit(05/28/2017 14:18:56.024:67) : arch=x86_64
syscall=openat success=no exit=EACCES(Permission denied)
a0=0xffffffffffffff9c a1=0x7ff662e9be10
a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1949
pid=1950 auid=unset uid=root gid=root euid=root suid=root fsuid=root
egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd
exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:18:56.024:67) : avc: denied { read
} for pid=1950 comm=rpc.statd name=sm dev="fuse"
ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
----
type=SYSCALL msg=audit(05/28/2017 14:18:56.034:68) : arch=x86_64
syscall=open success=no exit=EACCES(Permission denied)
a0=0x7ff662e9bdd0 a1=O_RDONLY a2=0x7ff662e9bde8 a3=0x5 items=0
ppid=1949 pid=1950 auid=unset uid=root gid=root euid=root suid=root
fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset
comm=rpc.statd exe=/usr/sbin/rpc.statd
subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:18:56.034:68) : avc: denied { read
} for pid=1950 comm=rpc.statd name=state dev="fuse"
ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=file
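To make the denials above easier to read, they can be grouped by source type, permission, target type and object class. The following is only a minimal sketch: it assumes "ausearch -i" output in the format shown above (possibly line-wrapped, as in this mail), and the names summarize_avc, selinux_type and SAMPLE are hypothetical helpers, not part of any SELinux tool.

```python
import re
from collections import Counter

# Matches one interpreted AVC record after whitespace has been collapsed.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)

# Two records copied from the output above, wrapped the way the mail wraps them.
SAMPLE = """----
type=AVC msg=audit(05/28/2017 14:11:16.141:26) : avc: denied {
name_bind } for pid=1146 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=AVC msg=audit(05/28/2017 14:12:48.068:75) : avc: denied { read
} for pid=1992 comm=rpc.statd name=sm dev="fuse"
ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
"""


def selinux_type(context):
    """Return the type field of a user:role:type:level context."""
    return context.split(":")[2]


def summarize_avc(text):
    """Count denials keyed by (source type, permission, target type, class)."""
    counts = Counter()
    # ausearch separates records with '----'; collapse all whitespace inside
    # each record so that wrapped lines are matched as a single line.
    for record in text.split("----"):
        flat = " ".join(record.split())
        for match in AVC_RE.finditer(flat):
            for perm in match.group("perms").split():
                key = (
                    selinux_type(match.group("scontext")),
                    perm,
                    selinux_type(match.group("tcontext")),
                    match.group("tclass"),
                )
                counts[key] += 1
    return counts
```

Calling summarize_avc() on the full ausearch output gives one counter entry per distinct denial, which may be a convenient summary to attach to a bug report.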
On Mon, May 15, 2017 at 11:56 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>
>
> On 05/12/2017 06:27 PM, Adam Ru wrote:
>>
>> Hi Soumya,
>>
>> Thank you very much for your last response – very useful.
>>
>> I apologize for the delay; I had to find time for another round of
>> testing.
>>
>> I updated the instructions that I provided in my previous e-mail.
>> *** means that the step was added.
>>
>> Instructions:
>> - Clean installation of CentOS 7.3 with all updates, 3x node,
>> resolvable IPs and VIPs
>> - Stopped firewalld (just for testing)
>> - *** SELinux in permissive mode (I had to, will explain below)
>> - Install "centos-release-gluster" to get "centos-gluster310" repo
>> and install following (nothing else):
>> --- glusterfs-server
>> --- glusterfs-ganesha
>> - Passwordless SSH between all nodes
>> (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>> - systemctl enable and start glusterd
>> - gluster peer probe <other nodes>
>> - gluster volume set all cluster.enable-shared-storage enable
>> - systemctl enable and start pcsd.service
>> - systemctl enable pacemaker.service (cannot be started at this moment)
>> - Set password for hacluster user on all nodes
>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not
>> sure if needed)
>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and
>> insert configuration
>> - Try list files on other nodes: ls
>> /var/run/gluster/shared_storage/nfs-ganesha/
>> - gluster nfs-ganesha enable
>> - *** systemctl enable pacemaker.service (again, since pacemaker was
>> disabled at this point)
>> - *** Check owner of "state", "statd", "sm" and "sm.bak" in
>> /var/lib/nfs/ (I had to: chown rpcuser:rpcuser
>> /var/lib/nfs/statd/state)
>> - Check on other nodes that nfs-ganesha.service is running and "pcs
>> status" shows started resources
>> - gluster volume create mynewshare replica 3 transport tcp
>> node1:/<dir> node2:/<dir> node3:/<dir>
>> - gluster volume start mynewshare
>> - gluster vol set mynewshare ganesha.enable on
>>
>> At this moment, this is status of important (I think) services:
>>
>> -- corosync.service disabled
>> -- corosync-notifyd.service disabled
>> -- glusterd.service enabled
>> -- glusterfsd.service disabled
>> -- pacemaker.service enabled
>> -- pcsd.service enabled
>> -- nfs-ganesha.service disabled
>> -- nfs-ganesha-config.service static
>> -- nfs-ganesha-lock.service static
>>
>> -- corosync.service active (running)
>> -- corosync-notifyd.service inactive (dead)
>> -- glusterd.service active (running)
>> -- glusterfsd.service inactive (dead)
>> -- pacemaker.service active (running)
>> -- pcsd.service active (running)
>> -- nfs-ganesha.service active (running)
>> -- nfs-ganesha-config.service inactive (dead)
>> -- nfs-ganesha-lock.service active (running)
>>
>> May I ask you a few questions please?
>>
>> 1. Could you please confirm that the services above have the correct
>> status/state?
>
>
> Looks good to the best of my knowledge.
>
>>
>> 2. When I restart a node, nfs-ganesha is not running. Of course I
>> cannot enable it, since it needs to be started after the shared
>> storage is mounted. What is the best practice for starting it
>> automatically so I don’t have to worry about restarting a node?
>> Should I create a script that checks whether the shared storage is
>> mounted and then starts nfs-ganesha? How do you do this in production?
>
>
> That's right. We plan to address this in the near future (probably by
> having a new .service which mounts shared_storage before starting
> nfs-ganesha). But until then, yes, a custom script is the only way
> to automate it.
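A custom script of the kind described above could look roughly like the following. This is only a sketch under stated assumptions: it polls /proc/mounts for the shared storage path used in this thread and then calls "systemctl start nfs-ganesha.service". The function names, the timeout and the polling interval are hypothetical choices, not something Gluster ships.

```python
import os
import subprocess
import time

SHARED_STORAGE = "/var/run/gluster/shared_storage"  # path used in this thread


def is_mounted(mount_point, mounts_text):
    """Return True if mount_point is a mount target listed in mounts_text.

    mounts_text is expected in /proc/mounts format: device, mount point,
    filesystem type, options, ... separated by whitespace.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def wait_and_start(timeout=300, interval=5):
    """Poll until the shared storage is mounted, then start nfs-ganesha.

    Returns True if systemctl reported success, False on timeout or failure.
    """
    # /var/run is usually a symlink to /run and /proc/mounts lists the
    # resolved path, so resolve symlinks before comparing.
    target = os.path.realpath(SHARED_STORAGE)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open("/proc/mounts") as mounts:
            if is_mounted(target, mounts.read()):
                result = subprocess.run(
                    ["systemctl", "start", "nfs-ganesha.service"]
                )
                return result.returncode == 0
        time.sleep(interval)
    return False
```

wait_and_start() would have to be invoked as root from something that runs after boot (for example a cron job or a small unit of your own). The script only covers the ordering problem and does nothing about the SELinux denials.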
>
>
>>
>> 3. SELinux is an issue, is that a known bug?
>>
>> When I restart a node and start nfs-ganesha.service with SELinux in
>> permissive mode:
>>
>> sudo grep 'statd' /var/log/messages
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read
>> /var/lib/nfs/statd/state: Success
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
>> May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request
>> from mynode1.localdomain while not monitoring any hosts
>>
>> systemctl status nfs-ganesha-lock.service --full
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>> Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>> static; vendor preset: disabled)
>> Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s
>> ago
>> Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>> (code=exited, status=0/SUCCESS)
>> Main PID: 2415 (rpc.statd)
>> CGroup: /system.slice/nfs-ganesha-lock.service
>> └─2415 /usr/sbin/rpc.statd --no-notify
>>
>> May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status
>> monitor for NFSv2/3 locking....
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0
>> starting
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read
>> /var/lib/nfs/statd/state: Success
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM
>> state
>> May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status
>> monitor for NFSv2/3 locking..
>> May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received
>> SM_UNMON_ALL request from mynode1.localdomain while not monitoring any
>> hosts
>>
>>
>> When I restart a node and start nfs-ganesha.service with SELinux in
>> enforcing mode:
>>
>>
>> sudo grep 'statd' /var/log/messages
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm:
>> Permission denied
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open
>> /var/lib/nfs/statd/state: Permission denied
>>
>> systemctl status nfs-ganesha-lock.service --full
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>> Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>> static; vendor preset: disabled)
>> Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01
>> UTC; 1min 21s ago
>> Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>> (code=exited, status=1/FAILURE)
>>
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status
>> monitor for NFSv2/3 locking....
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0
>> starting
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open
>> directory sm: Permission denied
>> May 12 12:14:01 mynode1.localdomain systemd[1]:
>> nfs-ganesha-lock.service: control process exited, code=exited status=1
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS
>> status monitor for NFSv2/3 locking..
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Unit
>> nfs-ganesha-lock.service entered failed state.
>> May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service
>> failed.
>
>
> I can't remember right now. Could you please paste the AVCs you get and
> the SELinux package versions? Or preferably, please file a bug. We can
> get the details verified by the SELinux team members.
>
> Thanks,
> Soumya
>
>
>>
>> On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>>>
>>>
>>>
>>> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>>>
>>>>
>>>> Hi Soumya,
>>>>
>>>> Thank you for the answer.
>>>>
>>>> Enabling Pacemaker? Yes, you’re completely right, I didn’t do it. Thank
>>>> you.
>>>>
>>>> I spent some time testing and I have some results. This is what I
>>>> did:
>>>>
>>>> - Clean installation of CentOS 7.3 with all updates, 3x node,
>>>> resolvable IPs and VIPs
>>>> - Stopped firewalld (just for testing)
>>>> - Install "centos-release-gluster" to get "centos-gluster310" repo and
>>>> install following (nothing else):
>>>> --- glusterfs-server
>>>> --- glusterfs-ganesha
>>>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem
>>>> and secret.pem.pub on all nodes)
>>>> - systemctl enable and start glusterd
>>>> - gluster peer probe <other nodes>
>>>> - gluster volume set all cluster.enable-shared-storage enable
>>>> - systemctl enable and start pcsd.service
>>>> - systemctl enable pacemaker.service (cannot be started at this moment)
>>>> - Set password for hacluster user on all nodes
>>>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not
>>>> sure if needed)
>>>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and
>>>> insert configuration
>>>> - Try list files on other nodes: ls
>>>> /var/run/gluster/shared_storage/nfs-ganesha/
>>>> - gluster nfs-ganesha enable
>>>> - Check on other nodes that nfs-ganesha.service is running and "pcs
>>>> status" shows started resources
>>>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir>
>>>> node2:/<dir> node3:/<dir>
>>>> - gluster volume start mynewshare
>>>> - gluster vol set mynewshare ganesha.enable on
>>>>
>>>> After these steps, all VIPs are pingable and I can mount
>>>> node1:/mynewshare
>>>>
>>>> Funny thing is that pacemaker.service is disabled again (something
>>>> disabled it). This is status of important (I think) services:
>>>
>>>
>>>
>>> Yeah, we observed this recently too. Our guess is that the pcs
>>> cluster setup command first destroys the existing cluster (if
>>> any), which may be disabling pacemaker too.
>>>
>>>>
>>>> systemctl list-units --all
>>>> # corosync.service loaded active running
>>>> # glusterd.service loaded active running
>>>> # nfs-config.service loaded inactive dead
>>>> # nfs-ganesha-config.service loaded inactive dead
>>>> # nfs-ganesha-lock.service loaded active running
>>>> # nfs-ganesha.service loaded active running
>>>> # nfs-idmapd.service loaded inactive dead
>>>> # nfs-mountd.service loaded inactive dead
>>>> # nfs-server.service loaded inactive dead
>>>> # nfs-utils.service loaded inactive dead
>>>> # pacemaker.service loaded active running
>>>> # pcsd.service loaded active running
>>>>
>>>> systemctl list-unit-files --all
>>>> # corosync-notifyd.service disabled
>>>> # corosync.service disabled
>>>> # glusterd.service enabled
>>>> # glusterfsd.service disabled
>>>> # nfs-blkmap.service disabled
>>>> # nfs-config.service static
>>>> # nfs-ganesha-config.service static
>>>> # nfs-ganesha-lock.service static
>>>> # nfs-ganesha.service disabled
>>>> # nfs-idmap.service static
>>>> # nfs-idmapd.service static
>>>> # nfs-lock.service static
>>>> # nfs-mountd.service static
>>>> # nfs-rquotad.service disabled
>>>> # nfs-secure-server.service static
>>>> # nfs-secure.service static
>>>> # nfs-server.service disabled
>>>> # nfs-utils.service static
>>>> # nfs.service disabled
>>>> # nfslock.service static
>>>> # pacemaker.service disabled
>>>> # pcsd.service enabled
>>>>
>>>> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>>>>
>>>> After reboot all VIPs are gone and I can see that nfs-ganesha.service
>>>> isn’t running. When I start it on at least two nodes then VIPs are
>>>> pingable again and I can mount NFS again. But there is still some issue
>>>> in the setup because when I check nfs-ganesha-lock.service I get:
>>>>
>>>> systemctl -l status nfs-ganesha-lock.service
>>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>> Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>>>> static; vendor preset: disabled)
>>>> Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC;
>>>> 31min ago
>>>> Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>>>> (code=exited, status=1/FAILURE)
>>>>
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status
>>>> monitor for NFSv2/3 locking....
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0
>>>> starting
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
>>>> directory sm: Permission denied
>>>
>>>
>>>
>>> Okay this issue was fixed and the fix should be present in 3.10 too -
>>> https://review.gluster.org/#/c/16433/
>>>
>>> Please check '/var/log/messages' for statd related errors and cross-check
>>> permissions of that directory. You could manually chown owner:group of
>>> /var/lib/nfs/statd/sm directory for now and then restart nfs-ganesha*
>>> services.
>>>
>>> Thanks,
>>> Soumya
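The ownership cross-check suggested above can be scripted. The sketch below is an illustration only: it reports the owner and group of the statd files mentioned in this thread so they can be compared with the expected owner (rpcuser:rpcuser was used later in this thread). The names owner_of and report_owners are hypothetical, and the directory may differ on other setups.

```python
import grp
import os
import pwd

STATD_DIR = "/var/lib/nfs/statd"  # directory named in this thread
NAMES = ("state", "sm", "sm.bak")


def owner_of(path):
    """Return (user, group) owning path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    # Fall back to numeric IDs when no passwd/group entry exists.
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def report_owners(base=STATD_DIR, names=NAMES):
    """Map each name under base to its (user, group) owner, or None."""
    return {name: owner_of(os.path.join(base, name)) for name in names}
```

Running report_owners() as root on each node and comparing the results shows quickly whether one node differs. It checks ownership only, so a "Permission denied" caused by SELinux would not show up here.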
>>>
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
>>>> /var/lib/nfs/statd/state: Permission denied
>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service:
>>>> control process exited, code=exited status=1
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status
>>>> monitor for NFSv2/3 locking..
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Unit
>>>> nfs-ganesha-lock.service entered failed state.
>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service
>>>> failed.
>>>>
>>>> Thank you,
>>>>
>>>> Kind regards,
>>>>
>>>> Adam
>>>>
>>>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> Same here: when I reboot the node I have to manually execute "pcs
>>>> cluster start gluster01", even though pcsd is already enabled and
>>>> started.
>>>>
>>>> Gluster 3.8.11
>>>>
>>>> CentOS 7.3 latest
>>>>
>>>> Installed using CentOS Storage SIG repository
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Respectfully*
>>>> **Mahdi A. Mahdi*
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* gluster-users-bounces at gluster.org on behalf of Adam Ru
>>>> <ad.ruckel at gmail.com>
>>>> *Sent:* Wednesday, May 3, 2017 12:09:58 PM
>>>> *To:* Soumya Koduri
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is
>>>> down after reboot
>>>>
>>>> Hi Soumya,
>>>>
>>>> thank you very much for your reply.
>>>>
>>>> I enabled pcsd during setup and after reboot during troubleshooting
>>>> I manually started it and checked resources (pcs status). They were
>>>> not running. I didn’t find what was wrong but I’m going to try it
>>>> again.
>>>>
>>>> I’ve thoroughly checked
>>>>
>>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>
>>>> and I can confirm that I followed all steps with one exception. I
>>>> installed following RPMs:
>>>> glusterfs-server
>>>> glusterfs-fuse
>>>> glusterfs-cli
>>>> glusterfs-ganesha
>>>> nfs-ganesha-xfs
>>>>
>>>> and the guide referenced above specifies:
>>>> glusterfs-server
>>>> glusterfs-api
>>>> glusterfs-ganesha
>>>>
>>>> glusterfs-api is a dependency of one of the RPMs that I installed, so
>>>> this is not a problem. But I cannot find any mention of installing
>>>> nfs-ganesha-xfs.
>>>>
>>>> I’ll try to set up the whole environment again without installing
>>>> nfs-ganesha-xfs (I assume glusterfs-ganesha has all required
>>>> binaries).
>>>>
>>>> Again, thank you for taking the time to answer my previous message.
>>>>
>>>> Kind regards,
>>>> Adam
>>>>
>>>> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>>>
>>>> Hi Gluster users,
>>>>
>>>> First, I'd like to thank you all for this amazing open-source
>>>> project! Thank you!
>>>>
>>>> I'm working on a home project – three servers with Gluster and
>>>> NFS-Ganesha. My goal is to create an HA NFS share with a copy of
>>>> each file on each of the three servers.
>>>>
>>>> My systems are CentOS 7.3 Minimal installs with the latest updates
>>>> and the most current RPMs from the "centos-gluster310" repository.
>>>>
>>>> I followed this tutorial:
>>>>
>>>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>>>
>>>> (the second half, which describes the multi-node HA setup)
>>>>
>>>> with a few exceptions:
>>>>
>>>> 1. All RPMs are from the "centos-gluster310" repo that is
>>>> installed by "yum -y install centos-release-gluster"
>>>> 2. I have three nodes (not four) with a "replica 3" volume.
>>>> 3. I created an empty ganesha.conf and a non-empty
>>>> ganesha-ha.conf in
>>>> "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced
>>>> blog post is outdated; this is now a requirement)
>>>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't
>>>> needed anymore.
>>>>
>>>>
>>>> Please refer to
>>>>
>>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>
>>>> It is being updated with the latest changes to the setup.
>>>>
>>>> When I finish configuration, all is good. nfs-ganesha.service is
>>>> active and running, and from a client I can ping all three VIPs
>>>> and mount NFS. Copied files are replicated to all nodes.
>>>>
>>>> But when I restart the nodes (one by one, with a 5 min. delay in
>>>> between), I cannot ping or mount (I assume that all VIPs are
>>>> down). So my setup definitely isn't HA.
>>>>
>>>> I found that:
>>>> # pcs status
>>>> Error: cluster is not currently running on this node
>>>>
>>>>
>>>> This means the pcsd service is not up. Did you enable the pcsd
>>>> service (systemctl enable pcsd) so that it comes up automatically
>>>> after reboot? If not, please start it manually.
>>>>
>>>>
>>>> and nfs-ganesha.service is in inactive state. Btw., I didn't run
>>>> "systemctl enable nfs-ganesha" since I assume that this is
>>>> something that Gluster does.
>>>>
>>>>
>>>> Please check /var/log/ganesha.log for any errors/warnings.
>>>>
>>>> We recommend not enabling nfs-ganesha.service (by default), as
>>>> the shared storage (where the ganesha.conf file now resides)
>>>> should be up and running before nfs-ganesha is started.
>>>> If it is enabled by default, the shared_storage mount point may
>>>> not be up yet, resulting in nfs-ganesha service failure. If you
>>>> would like to address this, you could have a cron job that keeps
>>>> checking the mount point health and then starts the nfs-ganesha
>>>> service.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>
>>>> I assume that my issue is that I followed instructions in a blog
>>>> post from 2015/10 that are outdated. Unfortunately I cannot find
>>>> anything better – I spent a whole day googling.
>>>>
>>>> Would you be so kind as to check the instructions in the blog
>>>> post and let me know which steps are wrong / outdated? Or do you
>>>> have more current instructions for a Gluster+Ganesha setup?
>>>>
>>>> Thank you.
>>>>
>>>> Kind regards,
>>>> Adam
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Adam
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Adam
>>
>>
>>
>>
>
--
Adam