[Gluster-users] Questions on ganesha HA and shared storage size

Soumya Koduri skoduri at redhat.com
Tue Jun 9 16:37:26 UTC 2015



On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
> Another update: the fact that I was unable to use vol set ganesha.enable
> was due to another bug in the ganesha scripts. In short, they are all
> using the following line to get the location of the conf file:
>
> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>
> First of all, by default /etc/sysconfig/ganesha has no CONFFILE line at
> all; second, there is a bug in how that directive is parsed: it works if
> I add to /etc/sysconfig/ganesha
>
> CONFFILE=/etc/ganesha/ganesha.conf
>
> but it fails if the same is quoted
>
> CONFFILE="/etc/ganesha/ganesha.conf"
>
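> To illustrate the failure mode: the cut-based parsing keeps the quotes
> as part of the path, e.g.
>
> # echo 'CONFFILE="/etc/ganesha/ganesha.conf"' | cut -f 2 -d "="
> "/etc/ganesha/ganesha.conf"
>
> and a value with the literal quotes in it is then used by the scripts as
> a (non-existent) file name.
>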
> It would be much better to use the following, which has a default as
> well:
>
> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
>
> I'll update the bug report.
> Having said this... the last issue to tackle is the real problem with
> the ganesha.nfsd :-(

Thanks. Could you try changing the log level to NIV_FULL_DEBUG in 
'/etc/sysconfig/ganesha' and check if anything gets logged in 
'/var/log/ganesha.log' or '/ganesha.log'?
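
For example, the change would typically look something like this (the
exact OPTIONS line depends on how your package populates
/etc/sysconfig/ganesha, so please adapt it to what is already there):

# /etc/sysconfig/ganesha -- assuming the packaged unit file reads an
# OPTIONS variable; the -N level is the only relevant change here
OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_FULL_DEBUG"

and then restart nfs-ganesha so the new log level takes effect.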

Thanks,
Soumya

> Cheers,
>
> 	Alessandro
>
>
> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>> OK, I can confirm that the ganesha.nfsd process is actually not
>> answering the calls. Here is what I see:
>>
>> # rpcinfo -p
>>     program vers proto   port  service
>>      100000    4   tcp    111  portmapper
>>      100000    3   tcp    111  portmapper
>>      100000    2   tcp    111  portmapper
>>      100000    4   udp    111  portmapper
>>      100000    3   udp    111  portmapper
>>      100000    2   udp    111  portmapper
>>      100024    1   udp  41594  status
>>      100024    1   tcp  53631  status
>>      100003    3   udp   2049  nfs
>>      100003    3   tcp   2049  nfs
>>      100003    4   udp   2049  nfs
>>      100003    4   tcp   2049  nfs
>>      100005    1   udp  58127  mountd
>>      100005    1   tcp  56301  mountd
>>      100005    3   udp  58127  mountd
>>      100005    3   tcp  56301  mountd
>>      100021    4   udp  46203  nlockmgr
>>      100021    4   tcp  41798  nlockmgr
>>      100011    1   udp    875  rquotad
>>      100011    1   tcp    875  rquotad
>>      100011    2   udp    875  rquotad
>>      100011    2   tcp    875  rquotad
>>
>> # netstat -lpn | grep ganesha
>> tcp6      14      0 :::2049                 :::*
>> LISTEN      11937/ganesha.nfsd
>> tcp6       0      0 :::41798                :::*
>> LISTEN      11937/ganesha.nfsd
>> tcp6       0      0 :::875                  :::*
>> LISTEN      11937/ganesha.nfsd
>> tcp6      10      0 :::56301                :::*
>> LISTEN      11937/ganesha.nfsd
>> tcp6       0      0 :::564                  :::*
>> LISTEN      11937/ganesha.nfsd
>> udp6       0      0 :::2049                 :::*
>> 11937/ganesha.nfsd
>> udp6       0      0 :::46203                :::*
>> 11937/ganesha.nfsd
>> udp6       0      0 :::58127                :::*
>> 11937/ganesha.nfsd
>> udp6       0      0 :::875                  :::*
>> 11937/ganesha.nfsd
>>
>> I'm attaching the strace of a showmount from one node to the other.
>> This machinery was working with nfs-ganesha 2.1.0, so it must be
>> something introduced with 2.2.0.
>> Cheers,
>>
>> 	Alessandro
>>
>>
>>
>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>
>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>> Hi,
>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>> heartbeat script looking for a pid file called
>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0
>>>> creates /var/run/ganesha.pid; this needs to be corrected. The script
>>>> ships in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>> For the moment I have created a symlink in this way and it works:
>>>>
>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
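>>>>
>>>> (Alternatively, assuming the systemd unit really picks up its options
>>>> from /etc/sysconfig/ganesha, passing the pid file path explicitly,
>>>> e.g. adding "-p /var/run/ganesha.nfsd.pid" to the ganesha.nfsd
>>>> options, should avoid the need for the symlink, but I have not tested
>>>> that and the symlink is enough for now.)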
>>>>
>>> Thanks. Please update this as well in the bug.
>>>
>>>> So far so good, the VIPs are up and pingable, but there is still the
>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>> I also see a lot of errors like this in /var/log/messages:
>>>>
>>>> Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
>>>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>
>>>> While ganesha.log shows the server is not in grace:
>>>>
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
>>>> Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
>>>> May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
>>>> :Configuration file successfully parsed
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
>>>> :Initializing ID Mapper.
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
>>>> successfully initialized.
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
>>>> found in configuration file !!!
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
>>>> ((null):0): Empty configuration file
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
>>>> :CAP_SYS_RESOURCE was successfully removed for proper quota management
>>>> in FSAL
>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
>>>> capabilities are: =
>>>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
>>>> credentials for principal nfs
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
>>>> thread initialized
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
>>>> IN GRACE, duration 60
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
>>>> :Callback creds directory (/var/run/ganesha) already exists
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
>>>> :gssd_refresh_krb5_machine_credential failed (2:2)
>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
>>>> delayed executor.
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
>>>> dispatcher thread was started successfully
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
>>>> dispatcher started
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
>>>> :gsh_dbusthread was started successfully
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
>>>> was started successfully
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
>>>> was started successfully
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
>>>> GRACE
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
>>>> fridge was started successfully
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>>>> :-------------------------------------------------
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS
>>>> SERVER INITIALIZED
>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>>>> :-------------------------------------------------
>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 :
>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now
>>>> NOT IN GRACE
>>>>
>>>>
>>> Please check the status of nfs-ganesha:
>>> $service nfs-ganesha status
>>>
>>> Could you also try taking a packet trace (during showmount or mount)
>>> and check the server responses?
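>>>
>>> A capture along these lines on the server should be enough (the mountd
>>> ports are the ones rpcinfo reported above, so adjust them if they
>>> differ on your system):
>>>
>>> # tcpdump -i any -s 0 -w /tmp/ganesha-mount.pcap \
>>>       'port 111 or port 2049 or port 56301 or port 58127'
>>>
>>> and then run the showmount/mount from the client while the capture is
>>> running.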
>>>
>>> Thanks,
>>> Soumya
>>>
>>>> Cheers,
>>>>
>>>> Alessandro
>>>>
>>>>
>>>>> On 09 Jun 2015, at 10:36, Alessandro De Salvo
>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>
>>>>> Hi Soumya,
>>>>>
>>>>>> On 09 Jun 2015, at 08:06, Soumya Koduri
>>>>>> <skoduri at redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>> OK, I found at least one of the bugs.
>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>
>>>>>>>     if [ -e /etc/os-release ]; then
>>>>>>>         RHEL6_PCS_CNAME_OPTION=""
>>>>>>>     fi
>>>>>>>
>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>>>> it to the following to make it work:
>>>>>>>
>>>>>>>     if [ -e /etc/os-release ]; then
>>>>>>>         eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>         [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] &&
>>>>>>> RHEL6_PCS_CNAME_OPTION=""
>>>>>>>     fi
>>>>>>>
>>>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and
>>>>>> probably submit your fix as well). We shall have it corrected.
>>>>>
>>>>> Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>
>>>>>>
>>>>>>> Apart from that, the VIP_<node> entries I was using were wrong: I
>>>>>>> should have converted all the "-" to underscores. Maybe this could
>>>>>>> be mentioned in the documentation when you have it ready.
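>>>>>>> In other words, the form that works in ganesha-ha.conf for my
>>>>>>> hostnames is:
>>>>>>>
>>>>>>> VIP_atlas_node1="x.x.x.1"
>>>>>>> VIP_atlas_node2="x.x.x.2"
>>>>>>>
>>>>>>> i.e. underscores in the variable names even though the hostnames
>>>>>>> themselves contain dashes.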
>>>>>>> Now, the cluster starts, but the VIPs apparently not:
>>>>>>>
>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>
>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>
>>>>>>> Full list of resources:
>>>>>>>
>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>> atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>> atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>> atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>
>>>>>>> PCSD Status:
>>>>>>>   atlas-node1: Online
>>>>>>>   atlas-node2: Online
>>>>>>>
>>>>>>> Daemon Status:
>>>>>>>   corosync: active/disabled
>>>>>>>   pacemaker: active/disabled
>>>>>>>   pcsd: active/enabled
>>>>>>>
>>>>>>>
>>>>>> Here corosync and pacemaker show a 'disabled' state. Can you check
>>>>>> the status of their services? They should be running prior to
>>>>>> cluster creation. We need to include that step in the document as
>>>>>> well.
>>>>>
>>>>> Ah, OK, you’re right, I have added it to my puppet modules (we install
>>>>> and configure ganesha via puppet, I’ll put the module on puppetforge
>>>>> soon, in case anyone is interested).
>>>>>
>>>>>>
>>>>>>> But the issue that is puzzling me more is the following:
>>>>>>>
>>>>>>> # showmount -e localhost
>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>
>>>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>>>> error:
>>>>>>>
>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>
>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>> Still, showmount hangs and times out.
>>>>>>> Any help?
>>>>>>> Thanks,
>>>>>>>
>>>>>> Hmm, that's strange. Sometimes we have seen such issues when no
>>>>>> proper cleanup was done while trying to re-create the cluster.
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>
>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>
>>>>>> Can you please unexport all the volumes, teardown the cluster using
>>>>>> 'gluster vol set <volname> ganesha.enable off’
>>>>>
>>>>> OK:
>>>>>
>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>
>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>
>>>>>
>>>>>> 'gluster ganesha disable' command.
>>>>>
>>>>> I’m assuming you wanted to write nfs-ganesha instead?
>>>>>
>>>>> # gluster nfs-ganesha disable
>>>>> ganesha enable : success
>>>>>
>>>>>
>>>>> A side note (not really important): it’s strange that when I do a
>>>>> disable the message is “ganesha enable” :-)
>>>>>
>>>>>>
>>>>>> Verify if the following files have been deleted on all the nodes-
>>>>>> '/etc/cluster/cluster.conf’
>>>>>
>>>>> this file is not present at all, I think it’s not needed in CentOS 7
>>>>>
>>>>>> '/etc/ganesha/ganesha.conf’,
>>>>>
>>>>> it’s still there, but empty, and I guess it should be OK, right?
>>>>>
>>>>>> '/etc/ganesha/exports/*’
>>>>>
>>>>> no more files there
>>>>>
>>>>>> '/var/lib/pacemaker/cib’
>>>>>
>>>>> it’s empty
>>>>>
>>>>>>
>>>>>> Verify if the ganesha service is stopped on all the nodes.
>>>>>
>>>>> nope, it’s still running, I will stop it.
>>>>>
>>>>>>
>>>>>> start/restart the services - corosync, pcs.
>>>>>
>>>>> On the node where I issued the nfs-ganesha disable there is no longer
>>>>> any /etc/corosync/corosync.conf, so corosync won't start. The other
>>>>> node instead still has the file, which is strange.
>>>>>
>>>>>>
>>>>>> And re-try the HA cluster creation
>>>>>> 'gluster ganesha enable’
>>>>>
>>>>> This time (repeated twice) it did not work at all:
>>>>>
>>>>> # pcs status
>>>>> Cluster name: ATLAS_GANESHA_01
>>>>> Last updated: Tue Jun  9 10:13:43 2015
>>>>> Last change: Tue Jun  9 10:13:22 2015
>>>>> Stack: corosync
>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>> Version: 1.1.12-a14efad
>>>>> 2 Nodes configured
>>>>> 6 Resources configured
>>>>>
>>>>>
>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>
>>>>> Full list of resources:
>>>>>
>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>
>>>>> PCSD Status:
>>>>>   atlas-node1: Online
>>>>>   atlas-node2: Online
>>>>>
>>>>> Daemon Status:
>>>>>   corosync: active/enabled
>>>>>   pacemaker: active/enabled
>>>>>   pcsd: active/enabled
>>>>>
>>>>>
>>>>>
>>>>> I then tried "pcs cluster destroy" on both nodes, and then
>>>>> nfs-ganesha enable again, but now I'm back to the old problem:
>>>>>
>>>>> # pcs status
>>>>> Cluster name: ATLAS_GANESHA_01
>>>>> Last updated: Tue Jun  9 10:22:27 2015
>>>>> Last change: Tue Jun  9 10:17:00 2015
>>>>> Stack: corosync
>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>> Version: 1.1.12-a14efad
>>>>> 2 Nodes configured
>>>>> 10 Resources configured
>>>>>
>>>>>
>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>
>>>>> Full list of resources:
>>>>>
>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>> atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>>>> atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1
>>>>> atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>>>> atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2
>>>>> atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>> atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>
>>>>> PCSD Status:
>>>>>   atlas-node1: Online
>>>>>   atlas-node2: Online
>>>>>
>>>>> Daemon Status:
>>>>>   corosync: active/enabled
>>>>>   pacemaker: active/enabled
>>>>>   pcsd: active/enabled
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Alessandro
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Soumya
>>>>>>
>>>>>>> Alessandro
>>>>>>>
>>>>>>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo
>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> indeed, it does not work :-)
>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>
>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>> this was already true since they were in the DNS);
>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>>>> by default /var/run -> ../run)
>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>>>> machines;
>>>>>>>> 6) set the ‘hacluster’ user the same password on all machines;
>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>> infrastructure is not ready for IPv6
>>>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>> contents, one per machine:
>>>>>>>>
>>>>>>>>
>>>>>>>> ===> atlas-node1
>>>>>>>> # Name of the HA cluster created.
>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>> # The server from which you intend to mount
>>>>>>>> # the shared volume.
>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>> # is specified.
>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>
>>>>>>>> ===> atlas-node2
>>>>>>>> # Name of the HA cluster created.
>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>> # The server from which you intend to mount
>>>>>>>> # the shared volume.
>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>> # is specified.
>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>
>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>> message:
>>>>>>>>
>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>> Please check the log file for details
>>>>>>>>
>>>>>>>> Looking at the logs I found nothing really special but this:
>>>>>>>>
>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132]
>>>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>>>>>> already stopped
>>>>>>>> [2015-06-08 17:57:15.675395] I
>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>> found Hostname is atlas-node2
>>>>>>>> [2015-06-08 17:57:15.720692] I
>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>> found Hostname is atlas-node2
>>>>>>>> [2015-06-08 17:57:15.721161] I
>>>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
>>>>>>>> found Hostname is atlas-node2
>>>>>>>> [2015-06-08 17:57:16.633048] E
>>>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
>>>>>>>> Initial NFS-Ganesha set up failed
>>>>>>>> [2015-06-08 17:57:16.641563] E
>>>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
>>>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA
>>>>>>>> config for NFS-Ganesha. Please check the log file for details
>>>>>>>>
>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED :
>>>>>>>> Failed to set up HA config for NFS-Ganesha. Please check the log
>>>>>>>> file for details
>>>>>>>>
>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
>>>>>>>> with: -1
>>>>>>>>
>>>>>>>>
>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>> tells me the cluster is not running.
>>>>>>>>
>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running:
>>>>>>>> /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running:
>>>>>>>> /usr/sbin/pcs cluster token-nodes
>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
>>>>>>>> /remote/check_auth HTTP/1.1" 200 68
>>>>>>>> - -> /remote/check_auth
>>>>>>>>
>>>>>>>>
>>>>>>>> What am I doing wrong?
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alessandro
>>>>>>>>
>>>>>>>>> On 08 Jun 2015, at 19:30, Soumya Koduri
>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>
>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>
>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>
>>>>>>>>>> Which version has full support for it?
>>>>>>>>>
>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>
>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>>>> but they seem not to be available anymore on CentOS 7 and
>>>>>>>>>> similar; I guess they are not really required anymore, as pcs
>>>>>>>>>> should do the full job
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Alessandro
>>>>>>>>>
>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>> Let us know if it doesn't work.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Soumya
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo
>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Alessandro
>>>>>>>>>>>
>>>>>>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri
>>>>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>
>>>>>>>>>>>> We recommend having a distributed replicated volume as the
>>>>>>>>>>>> shared volume, for better data availability.
>>>>>>>>>>>>
>>>>>>>>>>>> The size of the volume depends on your workload. Since it is
>>>>>>>>>>>> used to maintain the states of NLM/NFSv4 clients, you may take
>>>>>>>>>>>> as a minimum size the aggregate, over all NFS servers, of
>>>>>>>>>>>> (typical size of the '/var/lib/nfs' directory +
>>>>>>>>>>>> ~4 KB * number of clients connected to each of the NFS servers
>>>>>>>>>>>> at any point).
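>>>>>>>>>>>>
>>>>>>>>>>>> As a purely illustrative example: with 2 NFS servers, a
>>>>>>>>>>>> /var/lib/nfs of about 10 MB on each, and up to 500 clients per
>>>>>>>>>>>> server at any point, that works out to roughly
>>>>>>>>>>>> 2 x (10 MB + 500 x 4 KB) = ~24 MB, so even a small shared
>>>>>>>>>>>> volume is normally sufficient.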
>>>>>>>>>>>>
>>>>>>>>>>>> We shall document this feature soon in the gluster docs as
>>>>>>>>>>>> well.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Soumya
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>> well as a documentation on all this? I did not manage to find
>>>>>>>>>>>>> them.
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>
>

