[Gluster-users] NFS not start on localhost

Mon Oct 20 07:41:08 UTC 2014

On Mon, Oct 20, 2014 at 09:04:28AM +0200, Demeter Tibor wrote:
> Hi,
> 
> This is the full nfs.log after delete & reboot.
> It is refers to portmap registering problem.
> 
> [root at node0 glusterfs]# cat nfs.log
> [2014-10-20 06:48:43.221136] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/567e0bba7ad7102eae3049e2ad6c3ed7.socket)
> [2014-10-20 06:48:43.224444] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
> [2014-10-20 06:48:43.224475] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
> [2014-10-20 06:48:43.224654] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
> [2014-10-20 06:48:43.224667] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
> [2014-10-20 06:48:43.235876] I [rpcsvc.c:2127:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
> [2014-10-20 06:48:43.254087] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> [2014-10-20 06:48:43.254116] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> [2014-10-20 06:48:43.255241] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> [2014-10-20 06:48:43.255264] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> [2014-10-20 06:48:43.257279] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> [2014-10-20 06:48:43.257315] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> [2014-10-20 06:48:43.258135] I [socket.c:3561:socket_init] 0-socket.NLM: SSL support is NOT enabled
> [2014-10-20 06:48:43.258157] I [socket.c:3576:socket_init] 0-socket.NLM: using system polling thread
> [2014-10-20 06:48:43.293724] E [rpcsvc.c:1314:rpcsvc_program_register_portmap] 0-rpc-service: Could not register with portmap
> [2014-10-20 06:48:43.293760] E [nfs.c:332:nfs_init_versions] 0-nfs: Program  NLM4 registration failed

The above line suggests that there already is a service registered at
portmapper for the NLM4 program/service. This happens when the kernel
module 'lockd' is loaded. The kernel NFS-client and NFS-server depend on
this, but unfortunately it conflicts with the Gluster/nfs server.

Could you verify that the module is loaded?
 - use 'lsmod | grep lockd' to check the modules
 - use 'rpcinfo | grep nlockmgr' to check the rpcbind registrations

Make sure that you do not mount any NFS exports on the Gluster server.
Unmount all NFS mounts.

You mentioned you are running CentOS-7, which is systemd based. You
should be able to stop any conflicting NFS services like this:

 # systemctl stop nfs-lock.service
 # systemctl stop nfs.target
 # systemctl disable nfs.target

If all these services cleanup themselves, you should be able to start
the Gluster/nfs service:

  # systemctl restart glusterd.service

In case some bits are still lingering around, it might be easier to
reboot after disabling the 'nfs.target'.

> [2014-10-20 06:48:43.293771] E [nfs.c:1312:init] 0-nfs: Failed to initialize protocols
> [2014-10-20 06:48:43.293777] E [xlator.c:403:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
> [2014-10-20 06:48:43.293783] E [graph.c:307:glusterfs_graph_init] 0-nfs-server: initializing translator failed
> [2014-10-20 06:48:43.293789] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
> pending frames:
> frame : type(0) op(0)
> 
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11
> time of crash: 2014-10-20 06:48:43configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.5.2
> [root at node0 glusterfs]# systemctl status portma
> portma.service
>    Loaded: not-found (Reason: No such file or directory)
>    Active: inactive (dead)
> 
> 
> 
> Also I have checked the rpcbind service.
> 
> [root at node0 glusterfs]# systemctl status rpcbind.service 
> rpcbind.service - RPC bind service
>    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
>    Active: active (running) since h 2014-10-20 08:48:39 CEST; 2min 52s ago
>   Process: 1940 ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS} (code=exited, status=0/SUCCESS)
>  Main PID: 1946 (rpcbind)
>    CGroup: /system.slice/rpcbind.service
>            └─1946 /sbin/rpcbind -w
> 
> okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Starting RPC bind service...
> okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Started RPC bind service.
> 
> The restart does not solve this problem.
> 
> 
> I think this is the problem. Why are "exited" the portmap status?

The 'portmap' service has been replaced with 'rpcbind' since RHEL-6.
They have the same functionality, 'rpcbind' just happens to be the newer
version.

Did you file a bug for this already? As Vijay mentions, this crash seems
to happen because the Gluster/nfs service fails to initialize correctly
and then fails to cleanup correctly. The cleanup should get fixed, and
we should also give an easier to understand error message.

Thanks,
Niels

> 
> 
> On node1 is ok:
> 
> [root at node1 ~]# systemctl status rpcbind.service 
> rpcbind.service - RPC bind service
>    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
>    Active: active (running) since p 2014-10-17 19:15:21 CEST; 2 days ago
>  Main PID: 1963 (rpcbind)
>    CGroup: /system.slice/rpcbind.service
>            └─1963 /sbin/rpcbind -w
> 
> okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Starting RPC bind service...
> okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Started RPC bind service.
> 
> 
> 
> Thanks in advance
> 
> Tibor
> 
> 
> 
> ----- Eredeti üzenet -----
> > On 10/19/2014 06:56 PM, Niels de Vos wrote:
> > > On Sat, Oct 18, 2014 at 01:24:12PM +0200, Demeter Tibor wrote:
> > >> Hi,
> > >>
> > >> [root at node0 ~]# tail -n 20 /var/log/glusterfs/nfs.log
> > >> [2014-10-18 07:41:06.136035] E [graph.c:307:glusterfs_graph_init]
> > >> 0-nfs-server: initializing translator failed
> > >> [2014-10-18 07:41:06.136040] E [graph.c:502:glusterfs_graph_activate]
> > >> 0-graph: init failed
> > >> pending frames:
> > >> frame : type(0) op(0)
> > >>
> > >> patchset: git://git.gluster.com/glusterfs.git
> > >> signal received: 11
> > >> time of crash: 2014-10-18 07:41:06configuration details:
> > >> argp 1
> > >> backtrace 1
> > >> dlfcn 1
> > >> fdatasync 1
> > >> libpthread 1
> > >> llistxattr 1
> > >> setfsid 1
> > >> spinlock 1
> > >> epoll.h 1
> > >> xattr.h 1
> > >> st_atim.tv_nsec 1
> > >> package-string: glusterfs 3.5.2
> > >
> > > This definitely is a gluster/nfs issue. For whatever reasone, the
> > > gluster/nfs server crashes :-/ The log does not show enough details,
> > > some more lines before this are needed.
> > >
> > 
> > I wonder if the crash is due to a cleanup after the translator
> > initialization failure. The complete logs might help in understanding
> > why the initialization failed.
> > 
> > -Vijay
> > 
> >