<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi guys</p>
<p>Is anyone here using the GlusterFS OCF resource agents with
Pacemaker on CentOS 7?</p>
<p><tt>yum install centos-release-gluster</tt><tt><br>
</tt><tt>yum install glusterfs-server glusterfs-resource-agents</tt></p>
<p>The reason I ask is that there seem to be a few problems with
them on 3.10, but these problems are so severe that I'm struggling
to believe I'm not just doing something wrong.</p>
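<p>For reference, these are the commands I use to pin down exactly
which versions are in play:</p>
<p><tt>rpm -q glusterfs-server glusterfs-resource-agents pcs pacemaker</tt><br>
<tt>gluster --version | head -n1</tt></p>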
<p>I created my brick (on a logical volume previously used for DRBD,
hence its name):</p>
<p><tt>mkfs.xfs /dev/cl/lv_drbd -f</tt><tt><br>
</tt><tt>mkdir -p /gluster/test_brick</tt><tt><br>
</tt><tt>mount -t xfs /dev/cl/lv_drbd /gluster</tt><br>
</p>
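<p>(A quick sanity check that the brick filesystem looks right -
just my own verification step, nothing Gluster requires:)</p>
<p><tt>df -hT /gluster</tt><br>
<tt>xfs_info /gluster</tt></p>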
<p>And then my volume (enabling clients to mount it via NFS):</p>
<p><tt>systemctl start glusterd</tt><tt><br>
</tt><tt>gluster volume create test_logs replica 2 transport tcp pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick</tt><tt><br>
</tt><tt>gluster volume start test_logs</tt><tt><br>
</tt><tt>gluster volume set test_logs nfs.disable off</tt><br>
</p>
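<p>At this point the volume reports healthy - roughly what I check
it against:</p>
<p><tt>gluster volume info test_logs</tt><br>
<tt>gluster volume status test_logs</tt></p>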
<p>And here's where the fun starts.</p>
<p>Firstly, we need to work around bug 1233344* (which was closed
when 3.7 went end-of-life but still seems valid in 3.10):</p>
<p><tt>sed -i
's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#'
/usr/lib/ocf/resource.d/glusterfs/volume</tt><br>
</p>
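<p>(To confirm the workaround took effect - this should now show
the /var/lib/glusterd path:)</p>
<p><tt>grep -n 'voldir=' /usr/lib/ocf/resource.d/glusterfs/volume</tt></p>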
<p>With that done, I [attempt to] stop GlusterFS so it can be
brought under Pacemaker control:<br>
</p>
<p><tt>systemctl stop glusterfsd</tt><tt><br>
</tt><tt>systemctl stop glusterd</tt><tt><br>
</tt><tt>umount /gluster</tt></p>
<p>(I usually have to kill the remaining glusterfs processes by hand
at this point before the unmount works - why doesn't systemctl stop
take care of them?)</p>
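<p>(For the record, this is roughly what I end up doing by hand -
the pkill patterns are just my own guess at the right scope:)</p>
<p><tt>pgrep -af gluster</tt><br>
<tt>pkill glusterfs</tt><br>
<tt>pkill glusterfsd</tt><br>
<tt>umount /gluster</tt></p>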
<p>With the node in standby (just one is online in this example, but
another is configured), I then set up the resources:</p>
<p><tt>pcs node standby</tt><tt><br>
</tt><tt>pcs resource create gluster_data ocf:heartbeat:Filesystem
device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"</tt><tt><br>
</tt><tt>pcs resource create glusterd ocf:glusterfs:glusterd</tt><tt><br>
</tt><tt>pcs resource create gluster_vol ocf:glusterfs:volume
volname="test_logs"</tt><tt><br>
</tt><tt>pcs resource create test_logs ocf:heartbeat:Filesystem \</tt><tt><br>
</tt><tt>    device="localhost:/test_logs" directory="/var/log/test" fstype="nfs" \</tt><tt><br>
</tt><tt>    options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \</tt><tt><br>
</tt><tt>    op monitor OCF_CHECK_LEVEL="20"</tt><tt><br>
</tt><tt>pcs resource clone glusterd</tt><tt><br>
</tt><tt>pcs resource clone gluster_data</tt><tt><br>
</tt><tt>pcs resource clone gluster_vol ordered=true</tt><tt><br>
</tt><tt>pcs constraint order start gluster_data-clone then start
glusterd-clone</tt><tt><br>
</tt><tt>pcs constraint order start glusterd-clone then start
gluster_vol-clone</tt><tt><br>
</tt><tt>pcs constraint order start gluster_vol-clone then start
test_logs</tt><tt><br>
</tt><tt>pcs constraint colocation add test_logs with FloatingIp
INFINITY</tt><br>
</p>
<p>(note the SELinux wrangling - this is because I have a CGI web
application which will later need to read files from the <tt>/var/log/test</tt>
mount)</p>
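<p>For completeness, this is how I verify the constraint wiring and,
once the mount succeeds, the SELinux context:</p>
<p><tt>pcs constraint show</tt><br>
<tt>ls -Zd /var/log/test</tt></p>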
<p>At this point, even with the node in standby, it's <i>already</i>
failing:</p>
<p><tt>[root@pcmk01 ~]# pcs status</tt><tt><br>
</tt><tt>Cluster name: test_cluster</tt><tt><br>
</tt><tt>Stack: corosync</tt><tt><br>
</tt><tt>Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8)
- partition WITHOUT quorum</tt><tt><br>
</tt><tt>Last updated: Thu Dec 7 13:20:41 2017 Last
change: Thu Dec 7 13:09:33 2017 by root via crm_attribute on
pcmk01-cr</tt><tt><br>
</tt><tt><br>
</tt><tt>2 nodes and 13 resources configured</tt><tt><br>
</tt><tt><br>
</tt><tt>Online: [ pcmk01-cr ]</tt><tt><br>
</tt><tt>OFFLINE: [ pcmk02-cr ]</tt><tt><br>
</tt><tt><br>
</tt><tt>Full list of resources:</tt><tt><br>
</tt><tt><br>
</tt><tt> FloatingIp (ocf::heartbeat:IPaddr2): Started
pcmk01-cr</tt><tt><br>
</tt><tt> test_logs (ocf::heartbeat:Filesystem): Stopped</tt><tt><br>
</tt><tt> Clone Set: glusterd-clone [glusterd]</tt><tt><br>
</tt><tt> Stopped: [ pcmk01-cr pcmk02-cr ]</tt><tt><br>
</tt><tt> Clone Set: gluster_data-clone [gluster_data]</tt><tt><br>
</tt><tt> Stopped: [ pcmk01-cr pcmk02-cr ]</tt><tt><br>
</tt><tt> Clone Set: gluster_vol-clone [gluster_vol]</tt><tt><br>
</tt><tt> gluster_vol (ocf::glusterfs:volume):
FAILED pcmk01-cr (blocked)</tt><tt><br>
</tt><tt> Stopped: [ pcmk02-cr ]</tt><tt><br>
</tt><tt><br>
</tt><tt>Failed Actions:</tt><tt><br>
</tt><tt>* gluster_data_start_0 on pcmk01-cr 'not configured' (6):
call=72, status=complete, exitreason='DANGER! xfs on
/dev/cl/lv_drbd is NOT cluster-aware!',</tt><tt><br>
</tt><tt> last-rc-change='Thu Dec 7 13:09:28 2017',
queued=0ms, exec=250ms</tt><tt><br>
</tt><tt>* gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1):
call=60, status=Timed Out, exitreason='none',</tt><tt><br>
</tt><tt> last-rc-change='Thu Dec 7 12:55:11 2017',
queued=0ms, exec=20004ms</tt><tt><br>
</tt><tt><br>
</tt><tt><br>
</tt><tt>Daemon Status:</tt><tt><br>
</tt><tt> corosync: active/enabled</tt><tt><br>
</tt><tt> pacemaker: active/enabled</tt><tt><br>
</tt><tt> pcsd: active/enabled</tt><br>
<br>
</p>
<p>1. Why can't the data filesystem be mounted? The xfs brick mounts
fine by hand, so why does the agent reject it as "NOT
cluster-aware"?<br>
2. Why is a volume "stop" operation being attempted at all, and why
does it time out?<br>
3. Why is any of this happening while the node is in standby? I
can't have the resources failing before I've even made the node
live! I could understand a gluster_vol *start* operation failing
while glusterd is (correctly) stopped, but why is there a *stop*
operation, and why does its failure leave the resource "blocked"?<br>
</p>
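<p>I haven't got to the bottom of it yet; if it helps anyone
reproduce this, the failing operations can also be run by hand,
outside cluster control, with pcs's debug commands (resource names
as defined above):</p>
<p><tt>pcs resource debug-start gluster_vol --full</tt><br>
<tt>pcs resource debug-stop gluster_vol --full</tt></p>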
<p>Given the above steps, is there something fundamental I'm missing
about how these resource agents should be used? How do *you*
configure GlusterFS on Pacemaker?</p>
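<p>(For anyone who wants to poke at the agent itself outside
Pacemaker, it can be invoked directly - assuming the usual OCF
calling convention, where exit code 7 means "not running":)</p>
<p><tt>OCF_ROOT=/usr/lib/ocf OCF_RESKEY_volname=test_logs \</tt><br>
<tt>  /usr/lib/ocf/resource.d/glusterfs/volume monitor; echo $?</tt></p>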
<p>Any advice appreciated.<br>
</p>
<p>Best regards<br>
</p>
<p><br>
</p>
<p>* <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1233344">https://bugzilla.redhat.com/show_bug.cgi?id=1233344</a></p>
</body>
</html>