<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hello,</div><div class=""><br class=""></div><div class="">I have a fairly straightforward configuration, described below:</div><div class=""><br class=""></div><div class="">3 storage nodes running Gluster version 3.7.11 with a replica count of 3, using native Gluster NFS.</div><div class="">corosync version 1.4.7 and pacemaker version 1.1.12</div><div class="">I have DNS round-robin across 3 VIPs that live on the 3 storage nodes.</div><div class=""><br class=""></div><div class=""><b class=""><u class="">Here is how I configured corosync:</u></b></div><div class=""><br class=""></div><div class="">SN1 with x.x.x.001</div><div class="">SN2 with x.x.x.002</div><div class="">SN3 with x.x.x.003</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">******************************************************************************************************************</div><div class=""><b class=""><u class="">Below is the pcs config output:</u></b></div><div class=""><br class=""></div><div class=""><div class="">Cluster Name: dfs_cluster</div><div class="">Corosync Nodes:</div><div class=""> SN1 SN2 SN3 </div><div class="">Pacemaker Nodes:</div><div class=""> SN1 SN2 SN3 </div><div class=""><br class=""></div><div class="">Resources: </div><div class=""> Clone: Gluster-clone</div><div class=""> Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false </div><div class=""> Resource: Gluster (class=ocf provider=glusterfs type=glusterd)</div><div class=""> Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)</div><div class=""> stop interval=0s timeout=20 (Gluster-stop-interval-0s)</div><div class=""> monitor interval=10s (Gluster-monitor-interval-10s)</div><div class=""> Resource: SN1-ClusterIP (class=ocf provider=heartbeat
type=IPaddr2)</div><div class=""> Attributes: ip=x.x.x.001 cidr_netmask=32 </div><div class=""> Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)</div><div class=""> stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)</div><div class=""> monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)</div><div class=""> Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)</div><div class=""> Attributes: ip=x.x.x.002 cidr_netmask=32 </div><div class=""> Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)</div><div class=""> stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)</div><div class=""> monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)</div><div class=""> Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)</div><div class=""> Attributes: ip=x.x.x.003 cidr_netmask=32 </div><div class=""> Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)</div><div class=""> stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)</div><div class=""> monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)</div><div class=""><br class=""></div><div class="">Stonith Devices: </div><div class="">Fencing Levels: </div><div class=""><br class=""></div><div class="">Location Constraints:</div><div class=""> Resource: SN1-ClusterIP</div><div class=""> Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)</div><div class=""> Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)</div><div class=""> Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)</div><div class=""> Resource: SN2-ClusterIP</div><div class=""> Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)</div><div class=""> Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)</div><div class=""> Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)</div><div class=""> Resource: SN3-ClusterIP</div><div class=""> Enabled 
on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)</div><div class=""> Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)</div><div class=""> Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)</div><div class="">Ordering Constraints:</div><div class=""> start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN1-ClusterIP-mandatory)</div><div class=""> start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN2-ClusterIP-mandatory)</div><div class=""> start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN3-ClusterIP-mandatory)</div><div class="">Colocation Constraints:</div><div class=""><br class=""></div><div class="">Resources Defaults:</div><div class=""> is-managed: true</div><div class=""> target-role: Started</div><div class=""> requires: nothing</div><div class=""> multiple-active: stop_nkart</div><div class="">Operations Defaults:</div><div class=""> No defaults set</div><div class=""><br class=""></div><div class="">Cluster Properties:</div><div class=""> cluster-infrastructure: cman</div><div class=""> dc-version: 1.1.11-97629de</div><div class=""> no-quorum-policy: ignore</div><div class=""> stonith-enabled: false</div></div><div class=""><br class=""></div><div class="">******************************************************************************************************************</div><div class=""><b class=""><u class="">pcs status output:</u></b></div><div class=""><br class=""></div><div class=""><div class="">Cluster name: dfs_cluster</div><div class="">Last updated: Thu Sep 22 16:57:35 2016</div><div class="">Last change: Mon Aug 29 18:02:44 2016</div><div class="">Stack: cman</div><div class="">Current DC: SN1 - partition with quorum</div><div class="">Version: 1.1.11-97629de</div><div class="">3 Nodes configured</div><div class="">6 Resources configured</div><div class=""><br class=""></div><div class=""><br 
class=""></div><div class="">Online: [ SN1 SN2 SN3 ]</div><div class=""><br class=""></div><div class="">Full list of resources:</div><div class=""><br class=""></div><div class=""> Clone Set: Gluster-clone [Gluster]</div><div class=""> Started: [ SN1 SN2 SN3 ]</div><div class=""> SN1-ClusterIP<span class="Apple-tab-span" style="white-space: pre;">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space: pre;">        </span>Started SN1 </div><div class=""> SN2-ClusterIP<span class="Apple-tab-span" style="white-space: pre;">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space: pre;">        </span>Started SN2 </div><div class=""> SN3-ClusterIP<span class="Apple-tab-span" style="white-space: pre;">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space: pre;">        </span>Started SN3 </div></div><div class=""><br class=""></div><div class="">******************************************************************************************************************</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">I mount the Gluster volume using the VIP name, which resolves to one of the storage nodes to establish the NFS connection. </div><div class=""><br class=""></div><div class=""><b class=""><u class="">My issue is:</u></b></div><div class=""><b class=""><u class=""><br class=""></u></b></div><div class="">After the Gluster volume has been mounted for 1-2 hours, all the clients report that df hangs and produces no output.
I checked the kernel log on the client side and found the following error:</div><div class=""><br class=""></div><div class=""><div class="" style="margin: 0px; line-height: normal;"><i class="">Sep 20 05:46:45 xxxxx kernel: nfs: server <span class="" style="font-style: normal;">nfsserver001</span> not responding, still trying</i></div><div class="" style="margin: 0px; line-height: normal;"><i class="">Sep 20 05:49:45 xxxxx kernel: nfs: server <span class="" style="font-style: normal;">nfsserver001</span> not responding, still trying</i></div></div><div class=""><br class=""></div><div class="">I tried mounting the Gluster volume via the DNS round-robin name on a different mountpoint, but the mount did not succeed. I then tried mounting the volume using a storage node's own IP (not the VIP), and that worked. Afterward, I switched all the clients to mount a storage node IP directly, and they have been up for more than 12 hours without any issue. </div><div class=""><br class=""></div><div class="">Any idea what might cause this issue?</div><div class=""><br class=""></div><div class="">Thanks a lot,</div><br class=""><div class=""><div class="" style="font-variant-ligatures: normal; font-variant-position: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; line-height: normal; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"></div></div><div class="">~ Vic Le</div></body></html>