[Gluster-users] pacemaker VIP routing latency to gluster node.

Soumya Koduri skoduri at redhat.com
Sun Sep 25 13:44:48 UTC 2016



On 09/23/2016 10:44 PM, Dung Le wrote:
> Hi Soumya,
>
>> Did you check 'pcs status' output that time? Maybe the *-ClusterIP*
>> resources would have gone to Stopped state, making VIPs unavailable.
>
> Yes, I did check ‘pcs status’ and everything looked good at the time.
>
> I just hit the issue again with VIP mounting and df output yesterday.
>
> On client 1, the df output was hung. I also could NOT mount the gluster
> volume via VIP x.x.x.001, but I could mount the gluster volume via VIP
> x.x.x.002 & x.x.x.003.
> On client 2, I could mount the gluster volume via VIP x.x.x.001 &
> x.x.x.002 & x.x.x.003.

So that means only the traffic/connection between client1 and VIP1 is 
affected. One possibility I can think of is that the outstanding 
requests from that client reached the throttle limit (16) and the 
server stopped processing further requests. Could you take a tcpdump 
on both the client and the server and observe the traffic? Also, 
please check the netstat output on the server:

# netstat -ntau | grep VIP1
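
A capture along these lines, taken on both the client and the server, 
should be enough to compare the traffic (the capture file name below is 
just a placeholder, and VIP1 stands for the actual VIP address; 2049 is 
the standard NFS port):

# tcpdump -i any -s 0 -w /tmp/vip1.pcap host VIP1 and port 2049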

Thanks,
Soumya

>
> Since I had configured pacemaker VIP x.x.x.001 for SN1, I went ahead
> and stopped the pcs service on SN1 (‘pcs cluster stop’). The VIP
> x.x.x.001 failed over to SN2 as per my configuration, and afterward I
> could mount the gluster volume via VIP x.x.x.001 on client 1.
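>
> (Roughly, the sequence was along these lines; the volume name and
> mount point below are placeholders:
>
> # pcs cluster stop                                  <- on SN1
> # pcs status | grep SN1-ClusterIP                   <- verify it moved to SN2
> # mount -t nfs -o vers=3 x.x.x.001:/gvol /mnt/gvol  <- on client 1
> )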
>
> Any idea?
>
> Thanks,
> ~ Vic Le
>
>> On Sep 23, 2016, at 1:33 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>>
>>
>> On 09/23/2016 02:34 AM, Dung Le wrote:
>>> Hello,
>>>
>>> I have a pretty straight forward configuration as below:
>>>
>>> 3 storage nodes running version 3.7.11 with a replica count of 3, using
>>> native gluster NFS.
>>> corosync version 1.4.7 and pacemaker version 1.1.12
>>> I have DNS round-robin on 3 VIPs living on the 3 storage nodes.
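>>>
>>> (A volume like that would typically have been created along these
>>> lines; the volume name and brick paths below are placeholders:
>>>
>>> # gluster volume create gvol replica 3 SN1:/data/brick1 SN2:/data/brick1 SN3:/data/brick1
>>> # gluster volume start gvol
>>> )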
>>>
>>> *_Here is how I configured my corosync:_*
>>>
>>> SN1 with x.x.x.001
>>> SN2 with x.x.x.002
>>> SN3 with x.x.x.003
>>>
>>>
>>> ******************************************************************************************************************
>>> *_Below is pcs config output:_*
>>>
>>> Cluster Name: dfs_cluster
>>> Corosync Nodes:
>>> SN1 SN2 SN3
>>> Pacemaker Nodes:
>>> SN1 SN2 SN3
>>>
>>> Resources:
>>> Clone: Gluster-clone
>>>  Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false
>>>  Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
>>>   Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
>>>               stop interval=0s timeout=20 (Gluster-stop-interval-0s)
>>>               monitor interval=10s (Gluster-monitor-interval-10s)
>>> Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>  Attributes: ip=x.x.x.001 cidr_netmask=32
>>>  Operations: start interval=0s timeout=20s
>>> (SN1-ClusterIP-start-interval-0s)
>>>              stop interval=0s timeout=20s
>>> (SN1-ClusterIP-stop-interval-0s)
>>>              monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
>>> Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>  Attributes: ip=x.x.x.002 cidr_netmask=32
>>>  Operations: start interval=0s timeout=20s
>>> (SN2-ClusterIP-start-interval-0s)
>>>              stop interval=0s timeout=20s
>>> (SN2-ClusterIP-stop-interval-0s)
>>>              monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
>>> Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>  Attributes: ip=x.x.x.003 cidr_netmask=32
>>>  Operations: start interval=0s timeout=20s
>>> (SN3-ClusterIP-start-interval-0s)
>>>              stop interval=0s timeout=20s
>>> (SN3-ClusterIP-stop-interval-0s)
>>>              monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)
>>>
>>> Stonith Devices:
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>>  Resource: SN1-ClusterIP
>>>    Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
>>>    Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
>>>    Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
>>>  Resource: SN2-ClusterIP
>>>    Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
>>>    Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
>>>    Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
>>>  Resource: SN3-ClusterIP
>>>    Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
>>>    Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
>>>    Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)
>>> Ordering Constraints:
>>>  start Gluster-clone then start SN1-ClusterIP (kind:Mandatory)
>>> (id:order-Gluster-clone-SN1-ClusterIP-mandatory)
>>>  start Gluster-clone then start SN2-ClusterIP (kind:Mandatory)
>>> (id:order-Gluster-clone-SN2-ClusterIP-mandatory)
>>>  start Gluster-clone then start SN3-ClusterIP (kind:Mandatory)
>>> (id:order-Gluster-clone-SN3-ClusterIP-mandatory)
>>> Colocation Constraints:
>>>
>>> Resources Defaults:
>>> is-managed: true
>>> target-role: Started
>>> requires: nothing
>>> multiple-active: stop_start
>>> Operations Defaults:
>>> No defaults set
>>>
>>> Cluster Properties:
>>> cluster-infrastructure: cman
>>> dc-version: 1.1.11-97629de
>>> no-quorum-policy: ignore
>>> stonith-enabled: false
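>>>
>>> (For reference, resources and constraints like the above would
>>> typically be created with commands along these lines; shown here only
>>> as a sketch for one of the three VIPs:
>>>
>>> # pcs resource create SN1-ClusterIP ocf:heartbeat:IPaddr2 ip=x.x.x.001 cidr_netmask=32 op monitor interval=10s
>>> # pcs constraint location SN1-ClusterIP prefers SN1=3000 SN2=2000 SN3=1000
>>> # pcs constraint order start Gluster-clone then start SN1-ClusterIP
>>> )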
>>>
>>> ******************************************************************************************************************
>>> *_pcs status output:_*
>>>
>>> Cluster name: dfs_cluster
>>> Last updated: Thu Sep 22 16:57:35 2016
>>> Last change: Mon Aug 29 18:02:44 2016
>>> Stack: cman
>>> Current DC: SN1 - partition with quorum
>>> Version: 1.1.11-97629de
>>> 3 Nodes configured
>>> 6 Resources configured
>>>
>>>
>>> Online: [ SN1 SN2 SN3 ]
>>>
>>> Full list of resources:
>>>
>>> Clone Set: Gluster-clone [Gluster]
>>>     Started: [ SN1 SN2 SN3 ]
>>> SN1-ClusterIP  (ocf::heartbeat:IPaddr2):  Started SN1
>>> SN2-ClusterIP  (ocf::heartbeat:IPaddr2):  Started SN2
>>> SN3-ClusterIP  (ocf::heartbeat:IPaddr2):  Started SN3
>>>
>>> ******************************************************************************************************************
>>>
>>>
>>> When I mount the gluster volume, I use the VIP name, and DNS round-robin
>>> picks one of the storage nodes to establish the NFS connection.
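>>>
>>> (For example, resolving the round-robin name should return all three
>>> VIPs; the record name below is a placeholder:
>>>
>>> # host nfsserver
>>> nfsserver has address x.x.x.001
>>> nfsserver has address x.x.x.002
>>> nfsserver has address x.x.x.003
>>> )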
>>>
>>> *_My issue is:_*
>>>
>>> After the gluster volume had been mounted for 1-2 hrs, all the clients
>>> reported getting no df output because df hung. I checked the dmesg log
>>> on the client side and saw the following errors:
>>>
>>> Sep 20 05:46:45 xxxxx kernel: nfs: server nfsserver001 not responding,
>>> still trying
>>> Sep 20 05:49:45 xxxxx kernel: nfs: server nfsserver001 not responding,
>>> still trying
>>>
>>> I tried to mount the gluster volume via the DNS round-robin name to a
>>> different mountpoint, but the mount was not successful.
>>
>> Did you check 'pcs status' output that time? Maybe the *-ClusterIP*
>> resources would have gone to Stopped state, making VIPs unavailable.
>>
>> Thanks,
>> Soumya
>>
>>> Then I
>>> tried to mount the gluster volume using the storage node IP itself (not
>>> the VIP), and I was able to mount the gluster volume. Afterward, I
>>> flipped all the clients to mount the storage node IPs directly, and they
>>> have been up for more than 12 hrs without any issue.
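>>>
>>> (In other words, roughly: a mount of the form
>>> "mount -t nfs -o vers=3 <node-IP>:/gvol /mnt/gvol" keeps working, while
>>> the same mount against the VIP name hangs after a while; the volume
>>> name and mount point here are placeholders.)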
>>>
>>> Any idea what might cause this issue?
>>>
>>> Thanks a lot,
>>>
>>> ~ Vic Le
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>

