[heketi-devel] Unable to use Heketi setup to install Gluster for Kubernetes
Gaurav Chhabra
varuag.chhabra at gmail.com
Sat Sep 2 04:17:55 UTC 2017
Hi Jose,
I tried your suggestion, but one thing about point #3 is unclear to me.
Since RancherOS runs everything as a container, I am running the
webcenter/rancher-glusterfs-server container on all three nodes. As far as
removing the directories is concerned, I assume you meant removing them on
the host and _not_ from within the container. After completing steps 1 and 2,
I checked the contents of all the directories you listed in point #3. All of
them were empty on the host, as you can see in the attached *other_logs.txt*,
which is what confused me. I ran the deploy again, but the issue persists:
two pods show a Liveness probe error and the third one a Readiness probe error.
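For reference, the probe failure details can be pulled like this (a minimal
sketch; the pod name is one of those from the kubectl output quoted further
down, and kubectl describe/logs are standard commands):
------------------------------------------------------------------
# Probe events (Liveness/Readiness failures) for one of the gluster pods
kubectl describe pod glusterfs-0j9l5

# Container output of the same pod
kubectl logs glusterfs-0j9l5
------------------------------------------------------------------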
I then tried removing those directories (step #3) from within the container,
but I get the following error:
root at c0f8ab4d92a2:/app# rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
rm: cannot remove '/var/lib/glusterd': Device or resource busy
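That "Device or resource busy" usually means the path is a mount point inside
the container, so only its contents can be removed from there. A quick way to
confirm (a sketch, assuming the standard /proc/mounts layout):
------------------------------------------------------------------
# Inside the container: is /var/lib/glusterd a (bind) mount?
grep -E 'glusterd|glusterfs|heketi' /proc/mounts

# If it is, the mount point itself cannot be removed, only emptied:
rm -rf /var/lib/glusterd/*
------------------------------------------------------------------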
On Fri, Sep 1, 2017 at 8:21 PM, Jose A. Rivera <jarrpa at redhat.com> wrote:
> 1. Add a line to the ssh-exec portion of heketi.json of the sort:
>
> "sudo": true,
>
> 2. Run
>
> gk-deploy -g --abort
>
> 3. On the nodes that were/will be running GlusterFS pods, run:
>
> rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
>
> Then try the deploy again.
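For reference, with step 1 applied the ssh-exec portion of
/etc/heketi/heketi.json would look roughly like this (a sketch based on the
excerpt quoted further down in this thread; only the "sudo" line is new):
------------------------------------------------------------------
"executor": "ssh",

"_sshexec_comment": "SSH username and private key file information",
"sshexec": {
  "keyfile": "/var/lib/heketi/.ssh/id_rsa",
  "user": "rancher",
  "port": "22",
  "fstab": "/etc/fstab",
  "sudo": true
},
------------------------------------------------------------------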
>
> On Fri, Sep 1, 2017 at 6:05 AM, Gaurav Chhabra <varuag.chhabra at gmail.com>
> wrote:
> > Hi Jose,
> >
> >
> > Thanks for the reply. It seems the three gluster pods might have been a
> > copy-paste from another cluster where I was trying to set up the same
> > thing using CentOS. Sorry for that. By the way, I did check for the kernel
> > modules and it seems they are already there. Also, I am attaching a fresh
> > set of files because I created a new cluster and thought of giving it a
> > try again. The issue still persists. :(
> >
> > In heketi.json, there is a slight change w.r.t. the user that connects to
> > the GlusterFS nodes over SSH. I am not sure how Heketi was using the root
> > user to log in, because I wasn't able to SSH in manually as root. With the
> > rancher user I can log in successfully, so I think this should be fine.
> >
> > /etc/heketi/heketi.json:
> > ------------------------------------------------------------------
> > "executor": "ssh",
> >
> > "_sshexec_comment": "SSH username and private key file information",
> > "sshexec": {
> > "keyfile": "/var/lib/heketi/.ssh/id_rsa",
> > "user": "rancher",
> > "port": "22",
> > "fstab": "/etc/fstab"
> > },
> > ------------------------------------------------------------------
> >
> > Before running gk-deploy:
> > ------------------------------------------------------------------
> > [root at workstation deploy]# kubectl get nodes,pods,daemonset,deployments,services
> > NAME                                     STATUS    AGE       VERSION
> > no/node-a.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> > no/node-b.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> > no/node-c.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> >
> > NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
> > svc/kubernetes 10.43.0.1 <none> 443/TCP 3h
> > ------------------------------------------------------------------
> >
> > After running gk-deploy:
> > ------------------------------------------------------------------
> > [root at workstation messagegc]# kubectl get nodes,pods,daemonset,deployments,services
> > NAME                                     STATUS    AGE       VERSION
> > no/node-a.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> > no/node-b.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> > no/node-c.c.kubernetes-174104.internal   Ready     3h        v1.7.2-rancher1
> >
> > NAME READY STATUS RESTARTS AGE
> > po/glusterfs-0j9l5 0/1 Running 0 2m
> > po/glusterfs-gqz4c 0/1 Running 0 2m
> > po/glusterfs-gxvcb 0/1 Running 0 2m
> >
> > NAME           DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE
> > ds/glusterfs   3         3         0         3            0           storagenode=glusterfs   2m
> >
> > NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
> > svc/kubernetes 10.43.0.1 <none> 443/TCP 3h
> > ------------------------------------------------------------------
> >
> > Kernel module check on all three nodes:
> > ------------------------------------------------------------------
> > [root at node-a ~]# find /lib*/modules/$(uname -r) -name *.ko | grep 'thin-pool\|snapshot\|mirror' | xargs ls -ltr
> > -rw-r--r-- 1 root root 92310 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
> > -rw-r--r-- 1 root root 56982 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
> > -rw-r--r-- 1 root root 27070 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
> > -rw-r--r-- 1 root root 92310 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
> > -rw-r--r-- 1 root root 56982 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
> > -rw-r--r-- 1 root root 27070 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
> > ------------------------------------------------------------------
> >
> > Error snapshot attached.
> >
> > In my first mail, I had checked that the Readiness probe uses this code
> > from the kube-templates/glusterfs-daemonset.yaml file:
> > ------------------------------------------------------------------
> > readinessProbe:
> >   timeoutSeconds: 3
> >   initialDelaySeconds: 40
> >   exec:
> >     command:
> >     - "/bin/bash"
> >     - "-c"
> >     - systemctl status glusterd.service
> >   periodSeconds: 25
> >   successThreshold: 1
> >   failureThreshold: 15
> > ------------------------------------------------------------------
> >
> > I tried logging into the glusterfs container on one of the nodes and ran
> > the above command:
> >
> > [root at node-a ~]# docker exec -it c0f8ab4d92a23b6df2 /bin/bash
> > root at c0f8ab4d92a2:/app# systemctl status glusterd.service
> > WARNING: terminal is not fully functional
> > Failed to connect to bus: No such file or directory
> >
> >
> > Is there any check I can do manually on the nodes to debug further? Any
> > suggestions?
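A few checks that work even without systemd inside the container (a sketch;
the container name gluster-server comes from the docker run script quoted
below, and netstat may not be present on every node):
------------------------------------------------------------------
# Is the glusterd process running at all?
docker exec gluster-server ps aux | grep '[g]lusterd'

# Does the gluster CLI reach the management daemon?
docker exec gluster-server gluster --version
docker exec gluster-server gluster peer status

# Is the management port listening on the node (if netstat is available)?
netstat -tln | grep 24007
------------------------------------------------------------------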
> >
> >
> > On Thu, Aug 31, 2017 at 6:53 PM, Jose A. Rivera <jarrpa at redhat.com>
> wrote:
> >>
> >> Hey Gaurav,
> >>
> >> The kernel modules must be loaded on all nodes that will run heketi
> >> pods. Additionally, you must have at least three nodes specified in
> >> your topology file. I'm not sure how you're getting three gluster pods
> >> when you only have two nodes defined... :)
> >>
> >> --Jose
> >>
> >> On Wed, Aug 30, 2017 at 5:27 AM, Gaurav Chhabra
> >> <varuag.chhabra at gmail.com> wrote:
> >> > Hi,
> >> >
> >> >
> >> > I have the following setup in place:
> >> >
> >> > 1 node : RancherOS having Rancher application for Kubernetes setup
> >> > 2 nodes : RancherOS having Rancher agent
> >> > 1 node : CentOS 7 workstation having kubectl installed and the folder
> >> > cloned/downloaded from https://github.com/gluster/gluster-kubernetes,
> >> > using which I run the Heketi setup (gk-deploy -g)
> >> >
> >> > I also have the rancher-glusterfs-server container running with the
> >> > following configuration:
> >> > ------------------------------------------------------------------
> >> > [root at node-1 rancher]# cat gluster-server.sh
> >> > #!/bin/bash
> >> >
> >> > sudo docker run --name=gluster-server -d \
> >> > --env 'SERVICE_NAME=gluster' \
> >> > --restart always \
> >> > --env 'GLUSTER_DATA=/srv/docker/gitlab' \
> >> > --publish 2222:22 \
> >> > webcenter/rancher-glusterfs-server
> >> > ------------------------------------------------------------------
> >> >
> >> > In /etc/heketi/heketi.json, the following is the only modified portion:
> >> > ------------------------------------------------------------------
> >> > "executor": "ssh",
> >> >
> >> > "_sshexec_comment": "SSH username and private key file
> information",
> >> > "sshexec": {
> >> > "keyfile": "/var/lib/heketi/.ssh/id_rsa",
> >> > "user": "root",
> >> > "port": "22",
> >> > "fstab": "/etc/fstab"
> >> > },
> >> > ------------------------------------------------------------------
> >> >
> >> > Status before running gk-deploy:
> >> >
> >> > [root at workstation deploy]# kubectl get nodes,pods,services,deployments
> >> > NAME                                     STATUS    AGE       VERSION
> >> > no/node-1.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> > no/node-2.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> > no/node-3.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> >
> >> > NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
> >> > svc/kubernetes 10.43.0.1 <none> 443/TCP 2d
> >> >
> >> >
> >> > Now when I run 'gk-deploy -g', I see the following error in the Rancher
> >> > console:
> >> > Readiness probe failed: Failed to get D-Bus connection: Operation not
> >> > permitted
> >> >
> >> > From the attached gk-deploy_log I see that it failed at:
> >> > Waiting for GlusterFS pods to start ... pods not found.
> >> >
> >> > In the kube-templates/glusterfs-daemonset.yaml file, I see this for the
> >> > Readiness probe section:
> >> > ------------------------------------------------------------------
> >> > readinessProbe:
> >> >   timeoutSeconds: 3
> >> >   initialDelaySeconds: 40
> >> >   exec:
> >> >     command:
> >> >     - "/bin/bash"
> >> >     - "-c"
> >> >     - systemctl status glusterd.service
> >> >   periodSeconds: 25
> >> >   successThreshold: 1
> >> >   failureThreshold: 15
> >> > ------------------------------------------------------------------
> >> >
> >> >
> >> > Status after running gk-deploy:
> >> >
> >> > [root at workstation deploy]# kubectl get nodes,pods,deployments,services
> >> > NAME                                     STATUS    AGE       VERSION
> >> > no/node-1.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> > no/node-2.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> > no/node-3.c.kubernetes-174104.internal   Ready     2d        v1.7.2-rancher1
> >> >
> >> > NAME READY STATUS RESTARTS AGE
> >> > po/glusterfs-0s440 0/1 Running 0 1m
> >> > po/glusterfs-j7dgr 0/1 Running 0 1m
> >> > po/glusterfs-p6jl3 0/1 Running 0 1m
> >> >
> >> > NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
> >> > svc/kubernetes 10.43.0.1 <none> 443/TCP 2d
> >> >
> >> >
> >> > Also, from a prerequisites perspective, I saw this mentioned:
> >> >
> >> > The following kernel modules must be loaded:
> >> > * dm_snapshot
> >> > * dm_mirror
> >> > * dm_thin_pool
> >> >
> >> > Where exactly is this to be checked? On all the Gluster server nodes?
> >> > How can I check whether they are loaded?
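A minimal way to check and load them (a sketch; lsmod lists the currently
loaded modules and modprobe loads them from /lib/modules):
------------------------------------------------------------------
# On each GlusterFS node:
lsmod | grep -E 'dm_snapshot|dm_mirror|dm_thin_pool'

# Load any that are missing:
modprobe dm_snapshot
modprobe dm_mirror
modprobe dm_thin_pool
------------------------------------------------------------------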
> >> >
> >> > I have attached topology.json and gk-deploy log for reference.
> >> >
> >> > Does this issue have anything to do with the host OS (RancherOS) that I
> >> > am using for the Gluster nodes? Any idea how I can fix this? Any help
> >> > will really be appreciated.
> >> >
> >> >
> >> > Thanks.
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > heketi-devel mailing list
> >> > heketi-devel at gluster.org
> >> > http://lists.gluster.org/mailman/listinfo/heketi-devel
> >> >
> >
> >
>
-------------- next part --------------
[root at workstation deploy]# ./gk-deploy -g --abort
Using Kubernetes CLI.
Using namespace "default".
Do you wish to abort the deployment?
[Y]es, [N]o? [Default: N]: N
[root at workstation deploy]#
[root at workstation deploy]#
[root at workstation deploy]#
[root at workstation deploy]# ./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.
Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.
The client machine that will run this script must have:
* Administrative access to an existing Kubernetes or OpenShift cluster
* Access to a python interpreter 'python'
Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
* 2222 - sshd (if running GlusterFS in a pod)
* 24007 - GlusterFS Management
* 24008 - GlusterFS RDMA
* 49152 to 49251 - Each brick for every volume on the host requires its own
port. For every new brick, one new port will be used starting at 49152. We
recommend a default range of 49152-49251 on each host, though you can adjust
this to fit your needs.
The following kernel modules must be loaded:
* dm_snapshot
* dm_mirror
* dm_thin_pool
For systems with SELinux, the following settings need to be considered:
* virt_sandbox_use_fusefs should be enabled on each node to allow writing to
remote GlusterFS volumes
In addition, for an OpenShift deployment you must:
* Have 'cluster_admin' role on the administrative account doing the deployment
* Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
* Have a router deployed that is configured to allow apps to access services
running in the cluster
Do you wish to proceed with deployment?
[Y]es, [N]o? [Default: Y]: Y
Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding "heketi-sa-view" not labeled
OK
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Failed to label node 'node-a.c.kubernetes-174104.internal'
[root at workstation deploy]#
[root at workstation deploy]#
[root at workstation deploy]# kubectl label nodes node-a.c.kubernetes-174104.internal storagenode-
node "node-a.c.kubernetes-174104.internal" labeled
[root at workstation deploy]# kubectl label nodes node-b.c.kubernetes-174104.internal storagenode-
node "node-b.c.kubernetes-174104.internal" labeled
[root at workstation deploy]# kubectl label nodes node-c.c.kubernetes-174104.internal storagenode-
node "node-c.c.kubernetes-174104.internal" labeled
[root at workstation deploy]#
[root at workstation deploy]#
[root at workstation deploy]# ./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.
Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.
The client machine that will run this script must have:
* Administrative access to an existing Kubernetes or OpenShift cluster
* Access to a python interpreter 'python'
Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
* 2222 - sshd (if running GlusterFS in a pod)
* 24007 - GlusterFS Management
* 24008 - GlusterFS RDMA
* 49152 to 49251 - Each brick for every volume on the host requires its own
port. For every new brick, one new port will be used starting at 49152. We
recommend a default range of 49152-49251 on each host, though you can adjust
this to fit your needs.
The following kernel modules must be loaded:
* dm_snapshot
* dm_mirror
* dm_thin_pool
For systems with SELinux, the following settings need to be considered:
* virt_sandbox_use_fusefs should be enabled on each node to allow writing to
remote GlusterFS volumes
In addition, for an OpenShift deployment you must:
* Have 'cluster_admin' role on the administrative account doing the deployment
* Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
* Have a router deployed that is configured to allow apps to access services
running in the cluster
Do you wish to proceed with deployment?
[Y]es, [N]o? [Default: Y]:
Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding "heketi-sa-view" not labeled
OK
node "node-a.c.kubernetes-174104.internal" labeled
node "node-b.c.kubernetes-174104.internal" labeled
node "node-c.c.kubernetes-174104.internal" labeled
Error from server (AlreadyExists): error when creating "STDIN": daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... pods not found.
[root at workstation deploy]#
-------------- next part --------------
[root at node-a ~]# ls -l /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
/etc/glusterfs:
total 0
/var/lib/glusterd:
total 0
/var/lib/heketi:
total 0
/var/log/glusterfs:
total 0
[root at node-a ~]# docker ps | grep -i gluster
f59ea806ad03 gluster/gluster-centos at sha256:e3e2881af497bbd76e4d3de90a4359d8167aa8410db2c66196f0b99df6067cb2 "/usr/sbin/init" 2 minutes ago Up 2 minutes k8s_glusterfs_glusterfs-gxvcb_default_b5775350-8f00-11e7-8f35-023b4cbc4fdd_22
6573b7eaaa3c gcr.io/google_containers/pause-amd64:3.0 "/pause" 22 minutes ago Up 22 minutes k8s_POD_glusterfs-gxvcb_default_b5775350-8f00-11e7-8f35-023b4cbc4fdd_1
c0f8ab4d92a2 webcenter/rancher-glusterfs-server "/app/run" 17 hours ago Up 26 minutes 0.0.0.0:2222->22/tcp gluster-server
[root at node-a ~]# docker exec -it gluster-server /bin/bash
root at c0f8ab4d92a2:/app# ls -l /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs
ls: cannot access '/var/lib/heketi': No such file or directory
/etc/glusterfs:
total 4
-rw-r--r-- 10 root root 358 Apr 18 2016 glusterd.vol
/var/lib/glusterd:
total 52
drwxr-xr-x 2 root root 4096 Sep 1 10:03 bitd
drwxr-xr-x 2 root root 4096 Sep 2 03:17 geo-replication
drwxr-xr-x 2 root root 4096 Sep 1 10:03 glustershd
drwxr-xr-x 2 root root 4096 Sep 1 10:03 groups
drwxr-xr-x 3 root root 4096 Sep 1 10:03 hooks
drwxr-xr-x 2 root root 4096 Sep 1 10:03 nfs
-rw------- 1 root root 24 Sep 1 10:03 options
drwxr-xr-x 2 root root 4096 Sep 1 10:03 peers
drwxr-xr-x 2 root root 4096 Sep 1 10:03 quotad
drwxr-xr-x 2 root root 4096 Sep 1 10:03 scrub
drwxr-xr-x 2 root root 4096 Sep 1 10:03 snaps
drwxr-xr-x 2 root root 4096 Sep 1 10:03 ss_brick
drwxr-xr-x 2 root root 4096 Sep 1 10:03 vols
/var/log/glusterfs:
total 20
drwxr-xr-x 2 root root 4096 Sep 1 10:03 bricks
-rw------- 1 root root 0 Sep 1 10:03 cmd_history.log
-rw------- 1 root root 5595 Sep 2 03:17 etc-glusterfs-glusterd.vol.log
drwxr-xr-x 2 root root 4096 Sep 1 10:03 geo-replication
drwxr-xr-x 3 root root 4096 Sep 1 10:03 geo-replication-slaves
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etc-glusterfs-glusterd.vol.log
Type: application/octet-stream
Size: 5733 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/heketi-devel/attachments/20170902/2741a59c/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Readiness & Liveness Probe Failed Error.png
Type: image/png
Size: 35108 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/heketi-devel/attachments/20170902/2741a59c/attachment-0001.png>