[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Alessandro De Salvo
Alessandro.DeSalvo at roma1.infn.it
Mon Jun 15 17:09:30 UTC 2015
OK, thanks, so, any hint on what I could check now?
I have tried even without any FSAL (VFS included), i.e. with just the nfs-ganesha rpm installed and an empty ganesha.conf, but I still see the same problem. The same configuration with ganesha 2.1.0 was working, on the same server.
Any idea? I have sent you the logs but please tell me if you need more.
Thanks,
Alessandro
> On 15 Jun 2015, at 18:47, Malahal Naineni <malahal at us.ibm.com> wrote:
>
> We do run ganesha on RHEL7.0 (same as CentOS7.0), and I don't think 7.1
> would be much different. We do run GPFS FSAL only (no VFS_FSAL).
>
> Regards, Malahal.
>
> Alessandro De Salvo [Alessandro.DeSalvo at roma1.infn.it] wrote:
>> Hi,
>> any news on this? Did you have the chance to look into that?
>> I'd also be curious to know whether anyone has tried nfs-ganesha on CentOS 7.1
>> and whether it actually works, as I also tried on a standalone, clean
>> machine and I see the very same behavior, even without gluster.
>> Thanks,
>>
>> Alessandro
>>
>> On Fri, 2015-06-12 at 14:34 +0200, Alessandro De Salvo wrote:
>>> Hi,
>>> looking at the code and having recompiled with some extra debug, I
>>> might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c,
>>> function nfs_rpc_dequeue_req, the threads enter the while (!(wqe->flags &
>>> Wqe_LFlag_SyncDone)) loop and never exit from it.
>>> I do not know whether that is normal or not, as I should read the code more carefully.
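>>> For what it's worth, that wait looks like the usual flag-plus-condition-variable
>>> idiom, roughly as in the sketch below (made-up names, not the actual Ganesha
>>> source):
>>>
>>> #include <pthread.h>
>>>
>>> static pthread_mutex_t wqe_mtx   = PTHREAD_MUTEX_INITIALIZER;
>>> static pthread_cond_t  wqe_cv    = PTHREAD_COND_INITIALIZER;
>>> static unsigned int    wqe_flags = 0;           /* hypothetical stand-in */
>>> #define WQE_FLAG_SYNC_DONE 0x1
>>>
>>> /* consumer side: sleeps until the producer sets the flag */
>>> static void wait_for_work(void)
>>> {
>>>     pthread_mutex_lock(&wqe_mtx);
>>>     while (!(wqe_flags & WQE_FLAG_SYNC_DONE))
>>>         pthread_cond_wait(&wqe_cv, &wqe_mtx);   /* stays here forever if */
>>>                                                 /* the flag is never set */
>>>     pthread_mutex_unlock(&wqe_mtx);
>>> }
>>>
>>> /* producer side: sets the flag and wakes the sleeper */
>>> static void post_work(void)
>>> {
>>>     pthread_mutex_lock(&wqe_mtx);
>>>     wqe_flags |= WQE_FLAG_SYNC_DONE;
>>>     pthread_cond_signal(&wqe_cv);
>>>     pthread_mutex_unlock(&wqe_mtx);
>>> }
>>>
>>> If that is all the dispatcher does, staying inside the while loop would be
>>> normal for an idle thread; it would only be a bug if the wake-up side never runs.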
>>> Cheers,
>>>
>>> Alessandro
>>>
>>> On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote:
>>>> Hi Malahal,
>>>>
>>>>
>>>>> On 12 Jun 2015, at 01:23, Malahal Naineni <malahal at us.ibm.com> wrote:
>>>>>
>>>>> The logs indicate that ganesha was started successfully without any
>>>>> exports. gstack output seemed normal as well -- threads were waiting to
>>>>> serve requests.
>>>>
>>>> Yes, there are no exports, since this was the default config before enabling Ganesha on any gluster volume.
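>>>> For reference, once a gluster volume is exported, the corresponding block
>>>> in ganesha.conf looks roughly like the sketch below (volume name, export
>>>> id and options are just placeholders):
>>>>
>>>> EXPORT {
>>>>     Export_Id = 1;
>>>>     Path = "/testvol";            # hypothetical gluster volume
>>>>     Pseudo = "/testvol";
>>>>     Access_Type = RW;
>>>>     FSAL {
>>>>         Name = GLUSTER;
>>>>         Hostname = "localhost";
>>>>         Volume = "testvol";
>>>>     }
>>>> }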
>>>>
>>>>>
>>>>> Assuming that you are running "showmount -e" on the same system, there
>>>>> shouldn't be any firewall coming into the picture.
>>>>
>>>> Yes, that was the case in my last attempt: I ran it from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's CentOS 7.1) is disabled anyway.
>>>>
>>>>> If you are running
>>>>> "showmount" from some other system, make sure there is no firewall
>>>>> dropping the packets.
>>>>>
>>>>> I think you need a tcpdump trace to figure out the problem. My wireshark
>>>>> trace showed two requests from the client to complete the "showmount -e"
>>>>> command (roughly sketched in code after the two steps below):
>>>>>
>>>>> 1. Client sent "GETPORT" call to port 111 (rpcbind) to get the port number
>>>>> of MOUNT.
>>>>> 2. Then it sent "EXPORT" call to mountd port (port it got in response to #1).
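>>>>> For reference, the two steps map onto the classic Sun RPC client calls
>>>>> roughly as below. This is only a sketch of what showmount does, not its
>>>>> actual source; it assumes the rpcgen mount definitions in <rpcsvc/mount.h>
>>>>> and linking against librpcsvc or libtirpc, as on a typical EL7 box:
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <rpc/rpc.h>
>>>>> #include <rpcsvc/mount.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     const char *host = argc > 1 ? argv[1] : "localhost";
>>>>>     struct timeval tv = { 10, 0 };
>>>>>     exports ex = NULL;
>>>>>
>>>>>     /* Step 1: clnt_create() sends the GETPORT call to rpcbind (port 111)
>>>>>      * to look up the port registered for the MOUNT program. */
>>>>>     CLIENT *clnt = clnt_create(host, MOUNTPROG, MOUNTVERS, "tcp");
>>>>>     if (clnt == NULL) {
>>>>>         clnt_pcreateerror("clnt_create (GETPORT step)");
>>>>>         return 1;
>>>>>     }
>>>>>
>>>>>     /* Step 2: send the EXPORT procedure to the mountd port obtained in
>>>>>      * step 1; this is the call that hangs in your case. */
>>>>>     if (clnt_call(clnt, MOUNTPROC_EXPORT,
>>>>>                   (xdrproc_t) xdr_void, (caddr_t) NULL,
>>>>>                   (xdrproc_t) xdr_exports, (caddr_t) &ex, tv) != RPC_SUCCESS) {
>>>>>         clnt_perror(clnt, "MOUNTPROC_EXPORT");
>>>>>         clnt_destroy(clnt);
>>>>>         return 1;
>>>>>     }
>>>>>
>>>>>     for (exports e = ex; e != NULL; e = e->ex_next)
>>>>>         printf("%s\n", e->ex_dir);
>>>>>
>>>>>     clnt_destroy(clnt);
>>>>>     return 0;
>>>>> }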
>>>>
>>>> Yes, I already did that, and indeed the trace showed the two requests, so the portmapper works fine, but it hangs on the second request.
>>>> Also "rpcinfo -t localhost portmapper" returns successfully, while "rpcinfo -t localhost nfs" hangs.
>>>> The output of rpcinfo -p is the following:
>>>>
>>>>    program vers proto   port  service
>>>>     100000    4   tcp    111  portmapper
>>>>     100000    3   tcp    111  portmapper
>>>>     100000    2   tcp    111  portmapper
>>>>     100000    4   udp    111  portmapper
>>>>     100000    3   udp    111  portmapper
>>>>     100000    2   udp    111  portmapper
>>>>     100024    1   udp  56082  status
>>>>     100024    1   tcp  41858  status
>>>>     100003    3   udp   2049  nfs
>>>>     100003    3   tcp   2049  nfs
>>>>     100003    4   udp   2049  nfs
>>>>     100003    4   tcp   2049  nfs
>>>>     100005    1   udp  45611  mountd
>>>>     100005    1   tcp  55915  mountd
>>>>     100005    3   udp  45611  mountd
>>>>     100005    3   tcp  55915  mountd
>>>>     100021    4   udp  48775  nlockmgr
>>>>     100021    4   tcp  51621  nlockmgr
>>>>     100011    1   udp   4501  rquotad
>>>>     100011    1   tcp   4501  rquotad
>>>>     100011    2   udp   4501  rquotad
>>>>     100011    2   tcp   4501  rquotad
>>>>
>>>>>
>>>>> What does "rpcinfo -p <server-ip>" show?
>>>>>
>>>>> Do you have selinux enabled? I am not sure if that is playing any role
>>>>> here...
>>>>
>>>> Nope, SELinux is disabled. For reference, the kernel is:
>>>>
>>>> # uname -a
>>>> Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>>
>>>> Thanks for the help,
>>>>
>>>> Alessandro
>>>>
>>>>>
>>>>> Regards, Malahal.
>>>>>
>>>>> Alessandro De Salvo [Alessandro.DeSalvo at roma1.infn.it] wrote:
>>>>>> Hi,
>>>>>> this was an extract from the old logs, before Soumya's suggestion of
>>>>>> changing the rquota port in the conf file. The new logs are attached
>>>>>> (ganesha-20150611.log.gz) as well as the gstack of the ganesha process
>>>>>> while I was executing the hanging showmount
>>>>>> (ganesha-20150611.gstack.gz).
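>>>>>> For reference, the rquota port change (following Soumya's suggestion) is
>>>>>> just an NFS_Core_Param setting in ganesha.conf, roughly as in the sketch
>>>>>> below; 4501 simply matches what rpcinfo reports on this node:
>>>>>>
>>>>>> NFS_Core_Param {
>>>>>>     # move rquotad off the default port 875
>>>>>>     Rquota_Port = 4501;
>>>>>> }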
>>>>>> Thanks,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
>>>>>>> Soumya Koduri [skoduri at redhat.com] wrote:
>>>>>>>> CCing ganesha-devel to get more input.
>>>>>>>>
>>>>>>>> When IPv6 is enabled, NFS-Ganesha uses only the v6 interfaces.
>>>>>>>
>>>>>>> I am not a network expert, but I have seen IPv4 traffic over an IPv6
>>>>>>> interface while fixing a few things before. This may be normal.
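>>>>>>> For what it's worth, a v6 listener created with IPV6_V6ONLY turned off
>>>>>>> will happily accept v4 clients, which then show up as ::ffff:a.b.c.d
>>>>>>> mapped addresses. A minimal sketch of such a socket (not Ganesha's
>>>>>>> actual setup code):
>>>>>>>
>>>>>>> #include <netinet/in.h>
>>>>>>> #include <string.h>
>>>>>>> #include <sys/socket.h>
>>>>>>> #include <unistd.h>
>>>>>>>
>>>>>>> /* Dual-stack TCP listener: an IPv6 socket with IPV6_V6ONLY disabled,
>>>>>>>  * so IPv4 clients connect as ::ffff:x.x.x.x mapped addresses. */
>>>>>>> static int make_dual_stack_listener(unsigned short port)
>>>>>>> {
>>>>>>>     int fd = socket(AF_INET6, SOCK_STREAM, 0);
>>>>>>>     int off = 0;
>>>>>>>     struct sockaddr_in6 sa;
>>>>>>>
>>>>>>>     if (fd < 0)
>>>>>>>         return -1;
>>>>>>>     setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off));
>>>>>>>
>>>>>>>     memset(&sa, 0, sizeof(sa));
>>>>>>>     sa.sin6_family = AF_INET6;
>>>>>>>     sa.sin6_addr   = in6addr_any;      /* netstat shows this as :::port */
>>>>>>>     sa.sin6_port   = htons(port);
>>>>>>>
>>>>>>>     if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
>>>>>>>         listen(fd, 64) < 0) {
>>>>>>>         close(fd);
>>>>>>>         return -1;
>>>>>>>     }
>>>>>>>     return fd;
>>>>>>> }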
>>>>>>>
>>>>>>>> Commit d7e8f255 (git show 'd7e8f255'), which was added in v2.2, has more details.
>>>>>>>>
>>>>>>>>> # netstat -ltaupn | grep 2049
>>>>>>>>> tcp6   4  0 :::2049         :::*             LISTEN      32080/ganesha.nfsd
>>>>>>>>> tcp6   1  0 x.x.x.2:2049    x.x.x.2:33285    CLOSE_WAIT  -
>>>>>>>>> tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555  CLOSE_WAIT  -
>>>>>>>>> udp6   0  0 :::2049         :::*                         32080/ganesha.nfsd
>>>>>>>>
>>>>>>>>>>> I have enabled full debug already, but I see nothing special. Before
>>>>>>>>>>> exporting any volume the log shows no error, even when I run showmount
>>>>>>>>>>> (the log is attached, ganesha.log.gz). If I do the same after exporting
>>>>>>>>>>> a volume, nfs-ganesha does not even start, complaining that it cannot
>>>>>>>>>>> bind the IPv6 rquota socket; but in fact there is nothing listening on
>>>>>>>>>>> that port over IPv6, so this should not happen:
>>>>>>>>>>>
>>>>>>>>>>> tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind
>>>>>>>>>>> tcp6 0 0 :::2224 :::* LISTEN 9054/ruby
>>>>>>>>>>> tcp6 0 0 :::22 :::* LISTEN 1248/sshd
>>>>>>>>>>> udp6 0 0 :::111 :::* 7433/rpcbind
>>>>>>>>>>> udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 ::1:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 :::123 :::* 31238/ntpd
>>>>>>>>>>> udp6 0 0 :::824 :::* 7433/rpcbind
>>>>>>>>>>>
>>>>>>>>>>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>>>>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>>>>>>>>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>>>>>
>>>>>>> The above messages indicate that someone tried to restart ganesha, but
>>>>>>> ganesha failed to come up because the RQUOTA port (default 875) was
>>>>>>> already in use, either by an old ganesha instance or by some other
>>>>>>> program holding it. The new instance of ganesha will die, but if you
>>>>>>> are using systemd, it will try to restart it automatically. We have
>>>>>>> disabled the systemd auto-restart in our environment, as it was causing
>>>>>>> issues for debugging.
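>>>>>>> In case it helps, disabling the auto-restart is just a systemd drop-in,
>>>>>>> roughly as below (assuming the unit is named nfs-ganesha.service on your
>>>>>>> distro):
>>>>>>>
>>>>>>> # /etc/systemd/system/nfs-ganesha.service.d/no-restart.conf
>>>>>>> [Service]
>>>>>>> Restart=no
>>>>>>>
>>>>>>> Followed by "systemctl daemon-reload".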
>>>>>>>
>>>>>>> What version of ganesha is this?
>>>>>>>
>>>>>>> Regards, Malahal.
>>>>>
>>>>>
>>>>>