[Gluster-users] Questions on ganesha HA and shared storage size
Alessandro De Salvo
Alessandro.DeSalvo at roma1.infn.it
Thu Jun 11 15:48:19 UTC 2015
Soumya, do you have any other idea of what to check on my side?
Many thanks,
Alessandro
> Il giorno 10/giu/2015, alle ore 21:07, Alessandro De Salvo <alessandro.desalvo at roma1.infn.it> ha scritto:
>
> Hi,
> by looking at the connections I also see a strange problem:
>
> # netstat -ltaupn | grep 2049
> tcp6   4  0 :::2049         :::*              LISTEN      32080/ganesha.nfsd
> tcp6   1  0 x.x.x.2:2049    x.x.x.2:33285     CLOSE_WAIT  -
> tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555   CLOSE_WAIT  -
> udp6   0  0 :::2049         :::*                          32080/ganesha.nfsd
>
>
> Why is tcp6 used with an IPv4 address?
> On another machine, where ganesha 2.1.0 is running, I see tcp being used,
> not tcp6.
> Could it be that the RPCs are always trying to use IPv6? That would be
> wrong.
> Thanks,
>
> Alessandro
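A note on the tcp6 listeners above: on Linux an AF_INET6 socket is dual-stack by default (IPV6_V6ONLY off), so a `tcp6 :::2049` listener also accepts IPv4 connections, and tools may show IPv4 peers with or without the `::ffff:` v4-mapped prefix. A quick check of the kernel default (sketch; the sysctl path is the standard Linux one):

```shell
# 0 here means AF_INET6 sockets are dual-stack by default, so a tcp6
# listener on :::2049 serves IPv4 clients too; 1 would force v6-only.
bindv6only=$(cat /proc/sys/net/ipv6/bindv6only 2>/dev/null || echo 0)
echo "net.ipv6.bindv6only=${bindv6only}"
```

So the tcp6 lines are not, by themselves, evidence that RPC is attempting IPv6 only.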
>
> On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
>>
>> On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
>>> Hi,
>>> I have already enabled full debug, but I see nothing special. Before
>>> exporting any volume the log shows no error, even when I run a
>>> showmount (the log is attached, ganesha.log.gz). If I do the same
>>> after exporting a volume, nfs-ganesha does not even start, complaining
>>> that it cannot bind the IPv6 rquota socket; but nothing is actually
>>> listening on that IPv6 port, so this should not happen:
>>>
>>> tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind
>>> tcp6 0 0 :::2224 :::* LISTEN 9054/ruby
>>> tcp6 0 0 :::22 :::* LISTEN 1248/sshd
>>> udp6 0 0 :::111 :::* 7433/rpcbind
>>> udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd
>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd
>>> udp6 0 0 ::1:123 :::* 31238/ntpd
>>> udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd
>>> udp6 0 0 :::123 :::* 31238/ntpd
>>> udp6 0 0 :::824 :::* 7433/rpcbind
>>>
>>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:
>>>
>>>
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>
>>
>> We have seen such issues with rpcbind a few times. The NFS-Ganesha setup
>> first disables Gluster-NFS and then brings up the NFS-Ganesha service.
>> Sometimes there is a delay or a problem with Gluster-NFS un-registering
>> those services, and when NFS-Ganesha tries to register on the same port
>> it hits this error. Please try registering RQUOTA on a non-privileged
>> port using the config option below in "/etc/ganesha/ganesha.conf"
>>
>> NFS_Core_Param {
>>     # Use a non-privileged port for RQuota
>>     Rquota_Port = 4501;
>> }
>>
>> and cleanup '/var/cache/rpcbind/' directory before the setup.
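The two steps Soumya describes can be sketched as follows. This is a hedged outline: the EL7 service names are assumptions, and a temp file stands in for the real /etc/ganesha/ganesha.conf so the snippet is safe to run; the real restart steps are shown as comments.

```shell
# 1) Append an NFS_Core_Param block moving RQUOTA off privileged port 875.
conf=$(mktemp)                  # stand-in for /etc/ganesha/ganesha.conf
cat >> "$conf" <<'EOF'
NFS_Core_Param {
    # Use a non-privileged port for RQuota
    Rquota_Port = 4501;
}
EOF
grep -c 'Rquota_Port' "$conf"   # verify the block landed

# 2) On the real system, clear stale rpcbind registrations and restart:
#   systemctl stop nfs-ganesha
#   rm -f /var/cache/rpcbind/*      # drop stale port registrations
#   systemctl restart rpcbind
#   systemctl start nfs-ganesha
```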
>>
>> Thanks,
>> Soumya
>>
>>>
>>> Thanks,
>>>
>>> Alessandro
>>>
>>>
>>>
>>>
>>>> Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri <skoduri at redhat.com> ha scritto:
>>>>
>>>>
>>>>
>>>> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
>>>>> Another update: the fact that I was unable to use vol set ganesha.enable
>>>>> was due to another bug in the ganesha scripts. In short, they are all
>>>>> using the following line to get the location of the conf file:
>>>>>
>>>>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>>>>>
>>>>> First, by default there is no CONFFILE line in /etc/sysconfig/ganesha;
>>>>> second, that directive is fragile, as it works if I add to
>>>>> /etc/sysconfig/ganesha
>>>>>
>>>>> CONFFILE=/etc/ganesha/ganesha.conf
>>>>>
>>>>> but it fails if the same value is quoted:
>>>>>
>>>>> CONFFILE="/etc/ganesha/ganesha.conf"
>>>>>
>>>>> It would be much better to use the following, which also provides a
>>>>> default (note the ":-" in the expansion):
>>>>>
>>>>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
>>>>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
>>>>>
>>>>> I'll update the bug report.
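The difference between the two parsing approaches is easy to verify. The sketch below simulates the quoted CONFFILE line and shows that `eval` plus `${CONFFILE:-default}` handles it (note the `:-`; a bare `:` inside `${...}` is not the default-value operator):

```shell
# Simulate /etc/sysconfig/ganesha with the quoted form that broke the
# cut-based parser, then apply the eval + default-expansion approach.
sysconfig=$(mktemp)
echo 'CONFFILE="/etc/ganesha/ganesha.conf"' > "$sysconfig"
eval "$(grep -F 'CONFFILE=' "$sysconfig")"   # eval strips the quotes
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}  # default if unset or empty
echo "$CONF"                                 # -> /etc/ganesha/ganesha.conf
rm -f "$sysconfig"
```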
>>>>> Having said this... the last issue to tackle is the real problem with
>>>>> the ganesha.nfsd :-(
>>>>
>>>> Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'?
>>>>
>>>> Thanks,
>>>> Soumya
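For reference, the sysconfig knob might look like the following; the OPTIONS and LOGFILE variable names are assumptions based on the Fedora/EL packaging, not confirmed in this thread:

```shell
# Sketch of /etc/sysconfig/ganesha (variable names assumed)
OPTIONS="-N NIV_FULL_DEBUG"     # -N sets the ganesha.nfsd debug level
LOGFILE="/var/log/ganesha.log"  # where the daemon should write its log
```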
>>>>
>>>>> Cheers,
>>>>>
>>>>> Alessandro
>>>>>
>>>>>
>>>>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>>>>>> OK, I can confirm that the ganesha.nfsd process is actually not
>>>>>> answering the calls. Here is what I see:
>>>>>>
>>>>>> # rpcinfo -p
>>>>>> program vers proto port service
>>>>>> 100000 4 tcp 111 portmapper
>>>>>> 100000 3 tcp 111 portmapper
>>>>>> 100000 2 tcp 111 portmapper
>>>>>> 100000 4 udp 111 portmapper
>>>>>> 100000 3 udp 111 portmapper
>>>>>> 100000 2 udp 111 portmapper
>>>>>> 100024 1 udp 41594 status
>>>>>> 100024 1 tcp 53631 status
>>>>>> 100003 3 udp 2049 nfs
>>>>>> 100003 3 tcp 2049 nfs
>>>>>> 100003 4 udp 2049 nfs
>>>>>> 100003 4 tcp 2049 nfs
>>>>>> 100005 1 udp 58127 mountd
>>>>>> 100005 1 tcp 56301 mountd
>>>>>> 100005 3 udp 58127 mountd
>>>>>> 100005 3 tcp 56301 mountd
>>>>>> 100021 4 udp 46203 nlockmgr
>>>>>> 100021 4 tcp 41798 nlockmgr
>>>>>> 100011 1 udp 875 rquotad
>>>>>> 100011 1 tcp 875 rquotad
>>>>>> 100011 2 udp 875 rquotad
>>>>>> 100011 2 tcp 875 rquotad
>>>>>>
>>>>>> # netstat -lpn | grep ganesha
>>>>>> tcp6  14  0 :::2049   :::*   LISTEN  11937/ganesha.nfsd
>>>>>> tcp6   0  0 :::41798  :::*   LISTEN  11937/ganesha.nfsd
>>>>>> tcp6   0  0 :::875    :::*   LISTEN  11937/ganesha.nfsd
>>>>>> tcp6  10  0 :::56301  :::*   LISTEN  11937/ganesha.nfsd
>>>>>> tcp6   0  0 :::564    :::*   LISTEN  11937/ganesha.nfsd
>>>>>> udp6   0  0 :::2049   :::*           11937/ganesha.nfsd
>>>>>> udp6   0  0 :::46203  :::*           11937/ganesha.nfsd
>>>>>> udp6   0  0 :::58127  :::*           11937/ganesha.nfsd
>>>>>> udp6   0  0 :::875    :::*           11937/ganesha.nfsd
>>>>>>
>>>>>> I'm attaching the strace of a showmount from a node to the other.
>>>>>> This machinery was working with nfs-ganesha 2.1.0, so it must be
>>>>>> something introduced with 2.2.0.
>>>>>> Cheers,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>>>>>
>>>>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>>>>>> Hi,
>>>>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>>>>>> heartbeat script looking for a pid file called
>>>>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
>>>>>>>> creating /var/run/ganesha.pid, this needs to be corrected. The file is
>>>>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>>>>>> For the moment I have created a symlink in this way and it works:
>>>>>>>>
>>>>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>>>>>>>>
>>>>>>> Thanks. Please update this as well in the bug.
>>>>>>>
>>>>>>>> So far so good, the VIPs are up and pingable, but still there is the
>>>>>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>>>>>> Still, I see a lot of errors like this in /var/log/messages:
>>>>>>>>
>>>>>>>> Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished:
>>>>>>>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>>>>>
>>>>>>>> While ganesha.log shows the server is not in grace:
>>>>>>>>
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
>>>>>>>> Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
>>>>>>>> May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>>>>>> <http://buildhw-09.phx2.fedoraproject.org>
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
>>>>>>>> :Configuration file successfully parsed
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
>>>>>>>> :Initializing ID Mapper.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
>>>>>>>> successfully initialized.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
>>>>>>>> found in configuration file !!!
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
>>>>>>>> ((null):0): Empty configuration file
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
>>>>>>>> :CAP_SYS_RESOURCE was successfully removed for proper quota management
>>>>>>>> in FSAL
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
>>>>>>>> capabilities are: =
>>>>>>>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
>>>>>>>> credentials for principal nfs
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
>>>>>>>> thread initialized
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
>>>>>>>> IN GRACE, duration 60
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
>>>>>>>> :Callback creds directory (/var/run/ganesha) already exists
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
>>>>>>>> :gssd_refresh_krb5_machine_credential failed (2:2)
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
>>>>>>>> delayed executor.
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
>>>>>>>> dispatcher thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
>>>>>>>> dispatcher started
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
>>>>>>>> :gsh_dbusthread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
>>>>>>>> was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
>>>>>>>> was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
>>>>>>>> GRACE
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
>>>>>>>> fridge was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>>>>>>>> :-------------------------------------------------
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT : NFS
>>>>>>>> SERVER INITIALIZED
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>>>>>>>> :-------------------------------------------------
>>>>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 :
>>>>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now
>>>>>>>> NOT IN GRACE
>>>>>>>>
>>>>>>>>
>>>>>>> Please check the status of nfs-ganesha
>>>>>>> $service nfs-ganesha status
>>>>>>>
>>>>>>> Could you try taking a packet trace (during showmount or mount) and
>>>>>>> check the server responses.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
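A hedged sketch of taking the trace Soumya asks for while reproducing the failure; the interface, filter, and output path are assumptions, and since mountd/nlockmgr sit on dynamic ports here you may need to add those ports from `rpcinfo -p`:

```shell
# Capture portmapper + NFS traffic during a showmount, then inspect the
# pcap (e.g. with wireshark) to see whether the server replies at all.
filter="port 111 or port 2049"
echo "capture filter: $filter"
# On the real system:
#   tcpdump -i any -s 0 -w /tmp/ganesha-showmount.pcap "$filter" &
#   showmount -e localhost     # reproduce the RPC timeout
#   kill %1                    # stop the capture
```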
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Alessandro
>>>>>>>>
>>>>>>>>
>>>>>>>>> Il giorno 09/giu/2015, alle ore 10:36, Alessandro De Salvo
>>>>>>>>> <alessandro.desalvo at roma1.infn.it
>>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto:
>>>>>>>>>
>>>>>>>>> Hi Soumya,
>>>>>>>>>
>>>>>>>>>> Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri
>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>>>>>> OK, I found at least one of the bugs.
>>>>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>>>>>
>>>>>>>>>>> if [ -e /etc/os-release ]; then
>>>>>>>>>>> RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>> fi
>>>>>>>>>>>
>>>>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>>>>>>>> it to the following, to make it work:
>>>>>>>>>>>
>>>>>>>>>>> if [ -e /etc/os-release ]; then
>>>>>>>>>>> eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>>>>> [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] &&
>>>>>>>>>>> RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>> fi
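A variant of the same fix that leans on the standard os-release fields rather than the RHEL-specific REDHAT_SUPPORT_PRODUCT; this is a sketch only, and the set of distro IDs matched is an assumption:

```shell
# Default used by the HA script for RHEL6-era pcs; cleared on newer stacks.
RHEL6_PCS_CNAME_OPTION="--name"
if [ -f /etc/os-release ]; then
    . /etc/os-release            # defines ID (fedora, centos, rhel, ...)
    case "$ID" in
        fedora|centos|rhel)      # assumed: distros whose pcs no longer
            RHEL6_PCS_CNAME_OPTION=""   # takes the --name option
            ;;
    esac
fi
```

RHEL 6 never ships /etc/os-release, so the option survives there, while CentOS/RHEL 7 and Fedora provide the file and get the empty option.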
>>>>>>>>>>>
>>>>>>>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and
>>>>>>>>>> probably submit your fix as well). We shall have it corrected.
>>>>>>>>>
>>>>>>>>> Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Apart from that, the VIP_<node> entries I was using were wrong: I
>>>>>>>>>>> should have converted all the "-" to underscores. Maybe this could
>>>>>>>>>>> be mentioned in the documentation when you have it ready.
>>>>>>>>>>> Now the cluster starts, but apparently the VIPs do not:
>>>>>>>>>>>
>>>>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>>>>>
>>>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>>
>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>
>>>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
>>>>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
>>>>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>>>
>>>>>>>>>>> PCSD Status:
>>>>>>>>>>> atlas-node1: Online
>>>>>>>>>>> atlas-node2: Online
>>>>>>>>>>>
>>>>>>>>>>> Daemon Status:
>>>>>>>>>>> corosync: active/disabled
>>>>>>>>>>> pacemaker: active/disabled
>>>>>>>>>>> pcsd: active/enabled
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Here corosync and pacemaker show the 'disabled' state. Can you check
>>>>>>>>>> the status of their services? They should be running prior to cluster
>>>>>>>>>> creation. We need to include that step in the document as well.
>>>>>>>>>
>>>>>>>>> Ah, OK, you’re right, I have added it to my puppet modules (we install
>>>>>>>>> and configure ganesha via puppet, I’ll put the module on puppetforge
>>>>>>>>> soon, in case anyone is interested).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> But the issue that is puzzling me more is the following:
>>>>>>>>>>>
>>>>>>>>>>> # showmount -e localhost
>>>>>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>>>>>
>>>>>>>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>>>>>>>> error:
>>>>>>>>>>>
>>>>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>>>>>
>>>>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>>>>>> Still, showmount hangs and times out.
>>>>>>>>>>> Any help?
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>> Hmm, that's strange. We have seen such issues when there was no
>>>>>>>>>> proper cleanup before re-creating the cluster.
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>>>>>
>>>>>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>>>>>
>>>>>>>>>> Can you please unexport all the volumes, teardown the cluster using
>>>>>>>>>> 'gluster vol set <volname> ganesha.enable off’
>>>>>>>>>
>>>>>>>>> OK:
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 'gluster ganesha disable' command.
>>>>>>>>>
>>>>>>>>> I’m assuming you wanted to write nfs-ganesha instead?
>>>>>>>>>
>>>>>>>>> # gluster nfs-ganesha disable
>>>>>>>>> ganesha enable : success
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A side note (not really important): it’s strange that when I do a
>>>>>>>>> disable the message is “ganesha enable” :-)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Verify if the following files have been deleted on all the nodes-
>>>>>>>>>> '/etc/cluster/cluster.conf’
>>>>>>>>>
>>>>>>>>> this file is not present at all, I think it’s not needed in CentOS 7
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/ganesha.conf’,
>>>>>>>>>
>>>>>>>>> it’s still there, but empty, and I guess it should be OK, right?
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/exports/*’
>>>>>>>>>
>>>>>>>>> no more files there
>>>>>>>>>
>>>>>>>>>> '/var/lib/pacemaker/cib’
>>>>>>>>>
>>>>>>>>> it’s empty
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Verify if the ganesha service is stopped on all the nodes.
>>>>>>>>>
>>>>>>>>> nope, it’s still running, I will stop it.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> start/restart the services - corosync, pcs.
>>>>>>>>>
>>>>>>>>> On the node where I issued the nfs-ganesha disable there is no longer
>>>>>>>>> any /etc/corosync/corosync.conf, so corosync won't start. The other
>>>>>>>>> node instead still has the file, which is strange.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And re-try the HA cluster creation
>>>>>>>>>> 'gluster ganesha enable’
>>>>>>>>>
>>>>>>>>> This time (repeated twice) it did not work at all:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun 9 10:13:43 2015
>>>>>>>>> Last change: Tue Jun 9 10:13:22 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 6 Resources configured
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>> atlas-node1: Online
>>>>>>>>> atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>> corosync: active/enabled
>>>>>>>>> pacemaker: active/enabled
>>>>>>>>> pcsd: active/enabled
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I tried then "pcs cluster destroy" on both nodes, and then again
>>>>>>>>> nfs-ganesha enable, but now I’m back to the old problem:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun 9 10:22:27 2015
>>>>>>>>> Last change: Tue Jun 9 10:17:00 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 10 Resources configured
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>> Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
>>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
>>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>> atlas-node1: Online
>>>>>>>>> atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>> corosync: active/enabled
>>>>>>>>> pacemaker: active/enabled
>>>>>>>>> pcsd: active/enabled
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Alessandro
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>>
>>>>>>>>>>> Alessandro
>>>>>>>>>>>
>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo
>>>>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it
>>>>>>>>>>>> <mailto:Alessandro.DeSalvo at roma1.infn.it>> ha scritto:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> indeed, it does not work :-)
>>>>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>>>>>> this was already true since they were in the DNS);
>>>>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>>>>>>>> by default /var/run -> ../run)
>>>>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>>>>>>>> machines;
>>>>>>>>>>>> 6) set the ‘hacluster’ user the same password on all machines;
>>>>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>>>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>>>>>> infrastructure is not ready for IPv6
>>>>>>>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>>>>>> contents, one per machine:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node1
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node2
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
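The VIP_* keys are shell variable names, so a hyphenated hostname must have '-' converted to '_' in the key; the configs above (kept verbatim) still show the hyphenated form that Alessandro later found to fail. A small sketch of the conversion:

```shell
# Build the ganesha-ha.conf VIP key for a node whose hostname contains '-'.
node="atlas-node1"
vip_key="VIP_$(echo "$node" | tr '-' '_')"
echo "$vip_key"     # -> VIP_atlas_node1
```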
>>>>>>>>>>>>
>>>>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>>>>>> message:
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>>>>>> Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the logs I found nothing really special but this:
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132]
>>>>>>>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>>>>>>>>>> already stopped
>>>>>>>>>>>> [2015-06-08 17:57:15.675395] I
>>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.720692] I
>>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.721161] I
>>>>>>>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:16.633048] E
>>>>>>>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
>>>>>>>>>>>> Initial NFS-Ganesha set up failed
>>>>>>>>>>>> [2015-06-08 17:57:16.641563] E
>>>>>>>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
>>>>>>>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA
>>>>>>>>>>>> config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED :
>>>>>>>>>>>> Failed to set up HA config for NFS-Ganesha. Please check the log
>>>>>>>>>>>> file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
>>>>>>>>>>>> with: -1
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>>>>>> tells me the cluster is not running.
>>>>>>>>>>>>
>>>>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running:
>>>>>>>>>>>> /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running:
>>>>>>>>>>>> /usr/sbin/pcs cluster token-nodes
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
>>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68
>>>>>>>>>>>> - -> /remote/check_auth
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> What am I doing wrong?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>
>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri
>>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which version has full support for it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>>>>>>>> but they seem to be no longer available on CentOS 7 and
>>>>>>>>>>>>>> similar; I guess they are not really required anymore, as pcs
>>>>>>>>>>>>>> should do the full job
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>>>>>> Let us know if it doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo
>>>>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it
>>>>>>>>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri
>>>>>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We recommend a distributed replicated volume as the shared
>>>>>>>>>>>>>>>> volume, for better data availability.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The size of the volume depends on your workload. Since it is
>>>>>>>>>>>>>>>> used to maintain the state of NLM/NFSv4 clients, you can
>>>>>>>>>>>>>>>> estimate the minimum size as the aggregate, over all NFS
>>>>>>>>>>>>>>>> servers, of
>>>>>>>>>>>>>>>> (typical_size_of_'/var/lib/nfs'_directory +
>>>>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
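A worked instance of that sizing rule, where all the inputs are assumed example numbers (two servers, ~10 MiB of /var/lib/nfs state each, 200 clients per server):

```shell
# minimum shared-volume size ~= sum over servers of
#   (size of /var/lib/nfs + 4 KiB * connected clients)
var_lib_nfs_kib=10240        # assumed ~10 MiB of /var/lib/nfs per server
servers=2
clients_per_server=200
per_client_kib=4
total_kib=$(( servers * (var_lib_nfs_kib + clients_per_server * per_client_kib) ))
echo "${total_kib} KiB"      # -> 22080 KiB, i.e. ~22 MiB
```

In other words the shared state volume can be tiny; its replication level matters far more than its capacity.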
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We shall document about this feature sooner in the gluster docs
>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>>>>>> well as a documentation on all this? I did not manage to find
>>>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Gluster-users mailing list
>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>