[heketi-devel] Unable to use Heketi setup to install Gluster for Kubernetes

Jose A. Rivera jarrpa at redhat.com
Sat Sep 2 20:41:18 UTC 2017


Hey, no problem! I'm eager to learn more about different flavors of
Linux; I just apologize for my relative inexperience with them. :)

To that end, I will also admit I'm not very experienced with Docker
directly myself. I understand the basic workflow and know some of the
run options, but the lack of deep experience keeps me from having a
better understanding of the patterns and consequences.

Thus, I'd like to guide you in a direction where I'm more apt to help
you right now. I know that you can't have multiple GlusterFS servers
running on the same nodes, and I know that we have been successfully
running several configurations using our gluster/gluster-centos image.
If you follow the Kubernetes configuration on gluster-kubernetes, the
pod/container is run privileged and with host networking, and we
require that the node has all listed ports open, not just 2222. The
sshd running in the container is listening on 2222, not 22, but it is
also not really required if you're not doing geo-replication.
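
For reference, the relevant parts of the gluster-kubernetes daemonset look
roughly like this (a paraphrased sketch, not the exact file):

    spec:
      hostNetwork: true
      containers:
      - name: glusterfs
        image: gluster/gluster-centos:latest
        securityContext:
          privileged: true

That is, host networking plus a privileged container rather than a 2222:22
port mapping; the "listed ports" are, roughly, 24007/24008 for glusterd and
its management traffic plus one port per brick starting at 49152, in
addition to 2222.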

Alternatively, you can indeed run GlusterFS outside of Kubernetes but
still have Kubernetes apps access GlusterFS storage. The nodes can be
anything you want; they just need to be running GlusterFS, and you need
a heketi service managing them. Here is an example of how to set this
up using CentOS:

https://github.com/gluster/gluster-kubernetes/tree/master/docs/examples/dynamic_provisioning_external_gluster
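
The core of that example is a StorageClass that points Kubernetes at the
external heketi service, something along these lines (the URL and key below
are placeholders):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gluster-heketi-external
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://heketi.example.com:8080"
      restuser: "admin"
      restuserkey: "<admin key>"

Heketi then creates the volumes on the external nodes, so PVCs on the
Kubernetes side work the same as in the hyper-converged setup.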

Hope this is at least leading you in a useful direction. :)

--Jose

On Sat, Sep 2, 2017 at 3:16 PM, Gaurav Chhabra <varuag.chhabra at gmail.com> wrote:
> Hi Jose,
>
> webcenter/rancher-glusterfs-server is actually a container provided by
> Sebastien, its maintainer. It's a Docker container with a GlusterFS
> server running inside it. On the host, i.e., RancherOS, there is no separate
> GlusterFS server running because we cannot install anything that way.
> Running it in a container is the only way, so I started the
> rancher-glusterfs-server container with the following parameters:
>
> [root at node-1 rancher]# cat gluster-server.sh
> #!/bin/bash
> sudo docker run --name=gluster-server -d \
>         --env 'SERVICE_NAME=gluster' \
>         --restart always \
>         --publish 2222:22 \
>         webcenter/rancher-glusterfs-server
>
> Here's the link to the dockerfile:
> https://hub.docker.com/r/webcenter/rancher-glusterfs-server/~/dockerfile/
>
> It's similar to other GlusterFS containers provided by other maintainers for
> different OSes. For example, for CentOS, we have
> https://hub.docker.com/r/gluster/gluster-centos/~/dockerfile/
>
> From what I understand, Heketi does support a container-based GlusterFS
> server, as mentioned in the prerequisite where it says:
>
> "Each node must have the following ports opened for GlusterFS
> communications:
>  2222 - GlusterFS pod's sshd"
>
> That's the reason I've mapped port 2222 to 22 as shown above. Please
> correct me if I misunderstood it.
>
> As soon as I run the above script (gluster-server.sh), it automatically
> creates the following directories on the host. Ideally, as you mentioned,
> these should not have been empty.
>
> /etc/glusterfs    /var/lib/glusterd    /var/log/glusterfs
>
> I just wanted to know in which circumstances we get this specific error
> (Failed to get D-Bus connection: Operation not permitted) related to the
> readiness probe failing. Searching online took me to discussions around
> running the container in privileged mode and mounting certain directories.
> Based on that, I also modified my container startup script to the following:
>
> #!/bin/bash
> sudo docker run --privileged \
>         --name=gluster-server \
>         -d \
>         -v /sys/fs/cgroup:/sys/fs/cgroup \
>         -v /etc/glusterfs:/etc/glusterfs \
>         -v /var/lib/glusterd:/var/lib/glusterd \
>         -v /var/log/glusterfs:/var/log/glusterfs \
>         --env 'SERVICE_NAME=gluster' \
>         --restart always \
>         --publish 2222:22 \
>         webcenter/rancher-glusterfs-server
> Still, the issue persists.
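>
> From what I've read, that D-Bus/"Failed to connect to bus" error usually
> means systemd isn't running as PID 1 inside the container, which is what
> systemctl needs. For comparison, the gluster/gluster-centos image seems
> meant to be run roughly like this (a sketch based on its docs; I haven't
> tried it on RancherOS):
>
> sudo docker run -d --privileged --net=host \
>         -v /etc/glusterfs:/etc/glusterfs \
>         -v /var/lib/glusterd:/var/lib/glusterd \
>         -v /var/log/glusterfs:/var/log/glusterfs \
>         -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
>         -v /dev:/dev \
>         gluster/gluster-centos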
>
> I also logged into the container and checked whether the systemctl command
> is present. It was there, but manually running the command also doesn't work:
>
> [root at node-c ~]# docker exec -it gluster-server /bin/bash
> root at 42150f203f80:/app# systemctl status glusterd.service
> WARNING: terminal is not fully functional
> Failed to connect to bus: No such file or directory
>
> Under the section 'ADVANCED OPTIONS - Security/Host' in this link, it talks
> about the SYS_ADMIN setting. Any idea how I can try this?
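>
> If it comes to that, I assume the plain-Docker equivalent would be adding the
> capability on the run line, something like this (untried):
>
> sudo docker run --cap-add=SYS_ADMIN \
>         --name=gluster-server -d \
>         --env 'SERVICE_NAME=gluster' \
>         --restart always \
>         --publish 2222:22 \
>         webcenter/rancher-glusterfs-server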
>
> Also, there was this mentioned in the Heketi setup page:
> "If you are not able to deploy a hyper-converged GlusterFS cluster, you must
> have one running somewhere that the Kubernetes nodes can access"
>
>>>> Does it mean running the three-node Gluster cluster outside Kubernetes,
>>>> maybe on some VMs running RHEL/CentOS, etc.? If yes, then how will I be
>>>> able to tell Gluster which volume from the Kubernetes cluster pod to sync?
>>>> Any references?
>
>
> I really appreciate your responses; even though you've not used RancherOS,
> you're still trying to help.
>
>
> Thanks,
> Gaurav
>
>
> On Sat, Sep 2, 2017 at 7:35 PM, Jose A. Rivera <jarrpa at redhat.com> wrote:
>>
>> I'm afraid I have no experience with RancherOS, so I may be missing
>> some things about how it works. My primary experience is with Fedora,
>> CentOS, and Ubuntu.
>>
>> What is webcenter/rancher-glusterfs-server? If it's running another
>> glusterd then you probably don't want to be running it and should
>> remove it from your systems.
>>
>> The glusterfs pods mount hostpath volumes from the host they're
>> running on to persist their configuration. Thus anything they write to
>> those directories should land on the host. If that's not happening
>> then that's an additional problem.
>>
>> --Jose
>>
>> On Fri, Sep 1, 2017 at 11:17 PM, Gaurav Chhabra
>> <varuag.chhabra at gmail.com> wrote:
>> > Hi Jose,
>> >
>> >
>> > I tried your suggestion, but there is one confusion regarding point #3.
>> > Since RancherOS has everything running as a container, I am running the
>> > webcenter/rancher-glusterfs-server container on all three nodes. Now, as
>> > far as removing the directories is concerned, I hope you meant removing
>> > them on the host and _not_ from within the container. After completing
>> > steps 1 and 2, I checked the contents of all the directories that you
>> > specified in point #3. All were empty, as you can see in the attached
>> > other_logs.txt, so I got confused. I ran the deploy again, but the issue
>> > persists. Two pods show a Liveness error and the third one a Readiness
>> > error.
>> >
>> > I then tried removing those directories (step #3) from within the
>> > container but got the following error:
>> >
>> > root at c0f8ab4d92a2:/app# rm -rf /var/lib/heketi /etc/glusterfs
>> > /var/lib/glusterd /var/log/glusterfs
>> > rm: cannot remove '/var/lib/glusterd': Device or resource busy
>> >
>> >
>> >
>> > On Fri, Sep 1, 2017 at 8:21 PM, Jose A. Rivera <jarrpa at redhat.com>
>> > wrote:
>> >>
>> >> 1. Add a line to the ssh-exec portion of heketi.json of the sort (see
>> >> the sketch after these steps):
>> >>
>> >> "sudo": true,
>> >>
>> >> 2. Run
>> >>
>> >> gk-deploy -g --abort
>> >>
>> >> 3. On the nodes that were/will be running GlusterFS pods, run:
>> >>
>> >> rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd
>> >> /var/log/glusterfs
>> >>
>> >> Then try the deploy again.
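>> >>
>> >> For reference, the sshexec section should end up looking something like
>> >> this (keyfile, user, and port are whatever you already have configured):
>> >>
>> >>     "sshexec": {
>> >>       "keyfile": "/var/lib/heketi/.ssh/id_rsa",
>> >>       "user": "rancher",
>> >>       "port": "22",
>> >>       "fstab": "/etc/fstab",
>> >>       "sudo": true
>> >>     },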
>> >>
>> >> On Fri, Sep 1, 2017 at 6:05 AM, Gaurav Chhabra
>> >> <varuag.chhabra at gmail.com>
>> >> wrote:
>> >> > Hi Jose,
>> >> >
>> >> >
>> >> > Thanks for the reply. It seems the three gluster pods might have been a
>> >> > copy-paste from another cluster where I was trying to set up the same
>> >> > thing using CentOS. Sorry for that. By the way, I did check for the
>> >> > kernel modules, and it seems they're already there. Also, I am attaching
>> >> > a fresh set of files because I created a new cluster and thought of
>> >> > giving it a try again. The issue still persists. :(
>> >> >
>> >> > In heketi.json, there is a slight change w.r.t. the user that connects
>> >> > to the glusterfs nodes using SSH. I am not sure how Heketi was using the
>> >> > root user to log in, because I wasn't able to SSH manually as root. With
>> >> > the rancher user, I can log in successfully, so I think this should be
>> >> > fine.
>> >> >
>> >> > /etc/heketi/heketi.json:
>> >> > ------------------------------------------------------------------
>> >> >     "executor": "ssh",
>> >> >
>> >> >     "_sshexec_comment": "SSH username and private key file
>> >> > information",
>> >> >     "sshexec": {
>> >> >       "keyfile": "/var/lib/heketi/.ssh/id_rsa",
>> >> >       "user": "rancher",
>> >> >       "port": "22",
>> >> >       "fstab": "/etc/fstab"
>> >> >     },
>> >> > ------------------------------------------------------------------
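>> >> >
>> >> > (A quick way to sanity-check the key-based login, with the node address
>> >> > as a placeholder, would be something like:)
>> >> >
>> >> > ssh -i /var/lib/heketi/.ssh/id_rsa -p 22 rancher@<node-ip> whoami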
>> >> >
>> >> > Before running gk-deploy:
>> >> > ------------------------------------------------------------------
>> >> > [root at workstation deploy]# kubectl get
>> >> > nodes,pods,daemonset,deployments,services
>> >> > NAME                                     STATUS    AGE       VERSION
>> >> > no/node-a.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> > no/node-b.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> > no/node-c.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> >
>> >> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> >> > svc/kubernetes   10.43.0.1    <none>        443/TCP   3h
>> >> > ------------------------------------------------------------------
>> >> >
>> >> > After running gk-deploy:
>> >> > ------------------------------------------------------------------
>> >> > [root at workstation messagegc]# kubectl get
>> >> > nodes,pods,daemonset,deployments,services
>> >> > NAME                                     STATUS    AGE       VERSION
>> >> > no/node-a.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> > no/node-b.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> > no/node-c.c.kubernetes-174104.internal   Ready     3h
>> >> > v1.7.2-rancher1
>> >> >
>> >> > NAME                 READY     STATUS    RESTARTS   AGE
>> >> > po/glusterfs-0j9l5   0/1       Running   0          2m
>> >> > po/glusterfs-gqz4c   0/1       Running   0          2m
>> >> > po/glusterfs-gxvcb   0/1       Running   0          2m
>> >> >
>> >> > NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE
>> >> > NODE-SELECTOR           AGE
>> >> > ds/glusterfs   3         3         0         3            0
>> >> > storagenode=glusterfs   2m
>> >> >
>> >> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> >> > svc/kubernetes   10.43.0.1    <none>        443/TCP   3h
>> >> > ------------------------------------------------------------------
>> >> >
>> >> > Kernel module check on all three nodes:
>> >> > ------------------------------------------------------------------
>> >> > [root at node-a ~]# find /lib*/modules/$(uname -r) -name *.ko | grep
>> >> > 'thin-pool\|snapshot\|mirror' | xargs ls -ltr
>> >> > -rw-r--r--    1 root     root         92310 Jun 26 04:13
>> >> > /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
>> >> > -rw-r--r--    1 root     root         56982 Jun 26 04:13
>> >> > /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
>> >> > -rw-r--r--    1 root     root         27070 Jun 26 04:13
>> >> > /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
>> >> > -rw-r--r--    1 root     root         92310 Jun 26 04:13
>> >> > /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
>> >> > -rw-r--r--    1 root     root         56982 Jun 26 04:13
>> >> > /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
>> >> > -rw-r--r--    1 root     root         27070 Jun 26 04:13
>> >> > /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
>> >> > ------------------------------------------------------------------
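>> >> >
>> >> > (The modules exist on disk; to confirm they are actually loaded,
>> >> > something like this should work on each node, though I haven't checked
>> >> > the RancherOS specifics:)
>> >> >
>> >> > lsmod | grep -E 'dm_thin_pool|dm_snapshot|dm_mirror'
>> >> > # load any that are missing:
>> >> > sudo modprobe dm_thin_pool
>> >> > sudo modprobe dm_snapshot
>> >> > sudo modprobe dm_mirror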
>> >> >
>> >> > Error snapshot attached.
>> >> >
>> >> > In my first mail, I noted that the failing readiness probe uses this
>> >> > code in the kube-templates/glusterfs-daemonset.yaml file:
>> >> > ------------------------------------------------------------------
>> >> >         readinessProbe:
>> >> >           timeoutSeconds: 3
>> >> >           initialDelaySeconds: 40
>> >> >           exec:
>> >> >             command:
>> >> >             - "/bin/bash"
>> >> >             - "-c"
>> >> >             - systemctl status glusterd.service
>> >> >           periodSeconds: 25
>> >> >           successThreshold: 1
>> >> >           failureThreshold: 15
>> >> > ------------------------------------------------------------------
>> >> >
>> >> > I tried logging into the glusterfs container on one of the nodes and
>> >> > ran the above command:
>> >> >
>> >> > [root at node-a ~]# docker exec -it c0f8ab4d92a23b6df2 /bin/bash
>> >> > root at c0f8ab4d92a2:/app# systemctl status glusterd.service
>> >> > WARNING: terminal is not fully functional
>> >> > Failed to connect to bus: No such file or directory
>> >> >
>> >> >
>> >> > Is there any check I can do manually on the nodes to debug further? Any
>> >> > suggestions?
>> >> >
>> >> >
>> >> > On Thu, Aug 31, 2017 at 6:53 PM, Jose A. Rivera <jarrpa at redhat.com>
>> >> > wrote:
>> >> >>
>> >> >> Hey Gaurav,
>> >> >>
>> >> >> The kernel modules must be loaded on all nodes that will run heketi
>> >> >> pods. Additionally, you must have at least three nodes specified in
>> >> >> your topology file. I'm not sure how you're getting three gluster
>> >> >> pods
>> >> >> when you only have two nodes defined... :)
>> >> >>
>> >> >> --Jose
>> >> >>
>> >> >> On Wed, Aug 30, 2017 at 5:27 AM, Gaurav Chhabra
>> >> >> <varuag.chhabra at gmail.com> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> >
>> >> >> > I have the following setup in place:
>> >> >> >
>> >> >> > 1 node    : RancherOS having Rancher application for Kubernetes
>> >> >> > setup
>> >> >> > 2 nodes  : RancherOS having Rancher agent
>> >> >> > 1 node   : CentOS 7 workstation with kubectl installed and the
>> >> >> > gluster-kubernetes repo cloned/downloaded from
>> >> >> > https://github.com/gluster/gluster-kubernetes, from
>> >> >> > which I run the Heketi setup (gk-deploy -g)
>> >> >> >
>> >> >> > I also have the rancher-glusterfs-server container running with the
>> >> >> > following configuration:
>> >> >> > ------------------------------------------------------------------
>> >> >> > [root at node-1 rancher]# cat gluster-server.sh
>> >> >> > #!/bin/bash
>> >> >> >
>> >> >> > sudo docker run --name=gluster-server -d \
>> >> >> >         --env 'SERVICE_NAME=gluster' \
>> >> >> >         --restart always \
>> >> >> >         --env 'GLUSTER_DATA=/srv/docker/gitlab' \
>> >> >> >         --publish 2222:22 \
>> >> >> >         webcenter/rancher-glusterfs-server
>> >> >> > ------------------------------------------------------------------
>> >> >> >
>> >> >> > In /etc/heketi/heketi.json, the following is the only modified
>> >> >> > portion:
>> >> >> > ------------------------------------------------------------------
>> >> >> >     "executor": "ssh",
>> >> >> >
>> >> >> >     "_sshexec_comment": "SSH username and private key file
>> >> >> > information",
>> >> >> >     "sshexec": {
>> >> >> >       "keyfile": "/var/lib/heketi/.ssh/id_rsa",
>> >> >> >       "user": "root",
>> >> >> >       "port": "22",
>> >> >> >       "fstab": "/etc/fstab"
>> >> >> >     },
>> >> >> > ------------------------------------------------------------------
>> >> >> >
>> >> >> > Status before running gk-deploy:
>> >> >> >
>> >> >> > [root at workstation deploy]# kubectl get
>> >> >> > nodes,pods,services,deployments
>> >> >> > NAME                                     STATUS    AGE
>> >> >> > VERSION
>> >> >> > no/node-1.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> > no/node-2.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> > no/node-3.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> >
>> >> >> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> >> >> > svc/kubernetes   10.43.0.1    <none>        443/TCP   2d
>> >> >> >
>> >> >> >
>> >> >> > Now, when I run 'gk-deploy -g', in the Rancher console I see the
>> >> >> > following error:
>> >> >> > Readiness probe failed: Failed to get D-Bus connection: Operation
>> >> >> > not
>> >> >> > permitted
>> >> >> >
>> >> >> > From the attached gk-deploy_log I see that it failed at:
>> >> >> > Waiting for GlusterFS pods to start ... pods not found.
>> >> >> >
>> >> >> > In the kube-templates/glusterfs-daemonset.yaml file, I see this for
>> >> >> > the readiness probe section:
>> >> >> > ------------------------------------------------------------------
>> >> >> >         readinessProbe:
>> >> >> >           timeoutSeconds: 3
>> >> >> >           initialDelaySeconds: 40
>> >> >> >           exec:
>> >> >> >             command:
>> >> >> >             - "/bin/bash"
>> >> >> >             - "-c"
>> >> >> >             - systemctl status glusterd.service
>> >> >> >           periodSeconds: 25
>> >> >> >           successThreshold: 1
>> >> >> >           failureThreshold: 15
>> >> >> > ------------------------------------------------------------------
>> >> >> >
>> >> >> >
>> >> >> > Status after running gk-deploy:
>> >> >> >
>> >> >> > [root at workstation deploy]# kubectl get
>> >> >> > nodes,pods,deployments,services
>> >> >> > NAME                                     STATUS    AGE
>> >> >> > VERSION
>> >> >> > no/node-1.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> > no/node-2.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> > no/node-3.c.kubernetes-174104.internal   Ready     2d
>> >> >> > v1.7.2-rancher1
>> >> >> >
>> >> >> > NAME                 READY     STATUS    RESTARTS   AGE
>> >> >> > po/glusterfs-0s440   0/1       Running   0          1m
>> >> >> > po/glusterfs-j7dgr   0/1       Running   0          1m
>> >> >> > po/glusterfs-p6jl3   0/1       Running   0          1m
>> >> >> >
>> >> >> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> >> >> > svc/kubernetes   10.43.0.1    <none>        443/TCP   2d
>> >> >> >
>> >> >> >
>> >> >> > Also, from a prerequisites perspective, I saw this mentioned:
>> >> >> >
>> >> >> > The following kernel modules must be loaded:
>> >> >> >  * dm_snapshot
>> >> >> >  * dm_mirror
>> >> >> >  * dm_thin_pool
>> >> >> >
>> >> >> > Where exactly is this to be checked? On all Gluster server nodes?
>> >> >> > How can I check whether they're there?
>> >> >> >
>> >> >> > I have attached topology.json and gk-deploy log for reference.
>> >> >> >
>> >> >> > Does this issue have anything to do with the host OS (RancherOS)
>> >> >> > that I am using for the Gluster nodes? Any idea how I can fix this?
>> >> >> > Any help will really be appreciated.
>> >> >> >
>> >> >> >
>> >> >> > Thanks.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > heketi-devel mailing list
>> >> >> > heketi-devel at gluster.org
>> >> >> > http://lists.gluster.org/mailman/listinfo/heketi-devel
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>

