<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif">Hi Jose,</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default"><font face="verdana, sans-serif">I tried your suggestion but there is one confusion regarding point #3. Since RancherOS has everything running as container, i am running </font><span style="font-family:"trebuchet ms",sans-serif;font-size:12.8px">webcenter/rancher-glusterfs-se</span><wbr style="font-size:12.8px"><span style="font-size:12.8px"><font face="trebuchet ms, sans-serif">rver </font><font face="verdana, sans-serif">container on all three nodes. Now as far as removing the directories are concerned, i hope you meant removing them on the host and _not_ from within the container. After completing step 1 and 2, i checked the contents of all the directories that you specified in point #3. All were empty as you can see in the attached </font></span><font face="verdana, sans-serif"><span style="font-size:12.8px"><i>other_logs.txt</i>. So i got confused. I ran the deploy again but the issue persists. Two pods show Liveness error and the third one, Readiness error.</span></font></div><div class="gmail_default"><font face="verdana, sans-serif"><span style="font-size:12.8px"><br></span></font></div><div class="gmail_default"><font face="verdana, sans-serif"><span style="font-size:12.8px">I then tried removing those directories (Step #3) from within the container but getting following error:</span></font></div><div class="gmail_default"><font face="verdana, sans-serif"><span style="font-size:12.8px"><br></span></font></div><div class="gmail_default">root@c0f8ab4d92a2:/app# rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs</div><div class="gmail_default">rm: cannot remove '/var/lib/glusterd': Device or resource busy<font face="verdana, sans-serif"><span style="font-size:12.8px"> </span></font></div><div class="gmail_default"><span style="font-size:12.8px"><font face="verdana, sans-serif"><br></font></span></div><div class="gmail_default"><span style="font-size:12.8px"><font face="verdana, sans-serif"><br></font></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 1, 2017 at 8:21 PM, Jose A. Rivera <span dir="ltr"><<a href="mailto:jarrpa@redhat.com" target="_blank">jarrpa@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">1. Add a line to the ssh-exec portion of heketi.json of the sort:<br>

"sudo": true,

2. Run

gk-deploy -g --abort

3. On the nodes that were/will be running GlusterFS pods, run:

rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs

Then try the deploy again.
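
(A minimal sketch of step 3 run from the workstation, assuming the rancher user
can SSH to each node and use passwordless sudo; the hostnames are just
placeholders for the three GlusterFS nodes:

for node in node-a node-b node-c; do
    ssh rancher@"$node" 'sudo rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs'
done

This is only a convenience for running the same cleanup on every node; running
the rm command directly on each host works just as well.)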
<div class="HOEnZb"><div class="h5"><br>
On Fri, Sep 1, 2017 at 6:05 AM, Gaurav Chhabra <<a href="mailto:varuag.chhabra@gmail.com">varuag.chhabra@gmail.com</a>> wrote:<br>
> Hi Jose,
>
>
> Thanks for the reply. It seems the three gluster pods might have been a
> copy-paste from another cluster where I was trying to set up the same
> thing using CentOS. Sorry for that. By the way, I did check for the kernel
> modules and they are already there. Also, I am attaching a fresh set of
> files because I created a new cluster and thought of giving it another try.
> The issue still persists. :(
>
> In heketi.json, there is a slight change w.r.t. the user that connects to
> the glusterfs node over SSH. I am not sure how Heketi was using the root
> user to log in, because I wasn't able to SSH manually as root. With the
> rancher user I can log in successfully, so I think this should be fine.
>
> /etc/heketi/heketi.json:
> ------------------------------------------------------------------
>     "executor": "ssh",
>
>     "_sshexec_comment": "SSH username and private key file information",
>     "sshexec": {
>       "keyfile": "/var/lib/heketi/.ssh/id_rsa",
>       "user": "rancher",
>       "port": "22",
>       "fstab": "/etc/fstab"
>     },
> ------------------------------------------------------------------
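>
> (To double-check that this account behaves the way Heketi will use it, a test
> like the following could be run from any machine holding a copy of that
> private key; node-a is just a placeholder for one of the GlusterFS nodes, and
> it assumes passwordless sudo is enabled for the rancher user:
>
> ssh -i /var/lib/heketi/.ssh/id_rsa -p 22 rancher@node-a sudo true
>
> If this returns without prompting for a password, both key-based login and
> sudo are in place.)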
>
> Before running gk-deploy:
> ------------------------------------------------------------------
> [root@workstation deploy]# kubectl get nodes,pods,daemonset,deployments,services
> NAME                                      STATUS    AGE       VERSION
> no/node-a.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
> no/node-b.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
> no/node-c.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
>
> NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
> svc/kubernetes   10.43.0.1    <none>        443/TCP   3h
> ------------------------------------------------------------------
>
> After running gk-deploy:
> ------------------------------------------------------------------
> [root@workstation messagegc]# kubectl get nodes,pods,daemonset,deployments,services
> NAME                                      STATUS    AGE       VERSION
> no/node-a.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
> no/node-b.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
> no/node-c.c.kubernetes-174104.internal    Ready     3h        v1.7.2-rancher1
>
> NAME                 READY     STATUS    RESTARTS   AGE
> po/glusterfs-0j9l5   0/1       Running   0          2m
> po/glusterfs-gqz4c   0/1       Running   0          2m
> po/glusterfs-gxvcb   0/1       Running   0          2m
>
> NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR           AGE
> ds/glusterfs   3         3         0       3            0           storagenode=glusterfs   2m
>
> NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
> svc/kubernetes   10.43.0.1    <none>        443/TCP   3h
> ------------------------------------------------------------------
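>
> (To see why the pods never become Ready, the probe events can be pulled from
> one of the pods listed above, e.g.:
>
> kubectl describe po/glusterfs-0j9l5
> kubectl logs po/glusterfs-0j9l5
>
> The Events section at the end of the describe output shows the exact
> Liveness/Readiness failure messages.)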
>
> Kernel module check on all three nodes:
> ------------------------------------------------------------------
> [root@node-a ~]# find /lib*/modules/$(uname -r) -name *.ko | grep 'thin-pool\|snapshot\|mirror' | xargs ls -ltr
> -rw-r--r-- 1 root root 92310 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
> -rw-r--r-- 1 root root 56982 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
> -rw-r--r-- 1 root root 27070 Jun 26 04:13 /lib64/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
> -rw-r--r-- 1 root root 92310 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-thin-pool.ko
> -rw-r--r-- 1 root root 56982 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-snapshot.ko
> -rw-r--r-- 1 root root 27070 Jun 26 04:13 /lib/modules/4.9.34-rancher/kernel/drivers/md/dm-mirror.ko
> ------------------------------------------------------------------
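>
> (The find above only shows that the modules exist on disk. Whether they are
> actually loaded can be checked with:
>
> lsmod | grep -E 'dm_thin_pool|dm_snapshot|dm_mirror'
>
> and, if nothing shows up, they can be loaded with
> sudo modprobe -a dm_snapshot dm_mirror dm_thin_pool.)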
>
> Error snapshot attached.
>
> In my first mail, I noted that the Readiness probe check has this code in the
> kube-templates/glusterfs-daemonset.yaml file:
> ------------------------------------------------------------------
>         readinessProbe:
>           timeoutSeconds: 3
>           initialDelaySeconds: 40
>           exec:
>             command:
>             - "/bin/bash"
>             - "-c"
>             - systemctl status glusterd.service
>           periodSeconds: 25
>           successThreshold: 1
>           failureThreshold: 15
> ------------------------------------------------------------------
>
> I tried logging into the glusterfs container on one of the nodes and ran the
> above command:
>
> [root@node-a ~]# docker exec -it c0f8ab4d92a23b6df2 /bin/bash
> root@c0f8ab4d92a2:/app# systemctl status glusterd.service
> WARNING: terminal is not fully functional
> Failed to connect to bus: No such file or directory
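>
> (Since systemd does not appear to be available inside this container, a rough
> manual check of whether the daemon is running at all might be:
>
> root@c0f8ab4d92a2:/app# pgrep -l glusterd
>
> or ps -ef | grep glusterd. That is only a diagnostic, though; the readiness
> probe itself still calls systemctl.)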
>
>
> Is there any check I can run manually on the nodes to debug further? Any suggestions?
>
>
> On Thu, Aug 31, 2017 at 6:53 PM, Jose A. Rivera <jarrpa@redhat.com> wrote:
>>
>> Hey Gaurav,
>>
>> The kernel modules must be loaded on all nodes that will run heketi
>> pods. Additionally, you must have at least three nodes specified in
>> your topology file. I'm not sure how you're getting three gluster pods
>> when you only have two nodes defined... :)
>>
>> --Jose
>>
>> On Wed, Aug 30, 2017 at 5:27 AM, Gaurav Chhabra <varuag.chhabra@gmail.com> wrote:
>> > Hi,
>> >
>> >
>> > I have the following setup in place:
>> >
>> > 1 node  : RancherOS running the Rancher application for the Kubernetes setup
>> > 2 nodes : RancherOS running the Rancher agent
>> > 1 node  : CentOS 7 workstation with kubectl installed and the folder
>> > cloned/downloaded from https://github.com/gluster/gluster-kubernetes, using
>> > which I run the Heketi setup (gk-deploy -g)
>> >
>> > I also have the rancher-glusterfs-server container running with the
>> > following configuration:
>> > ------------------------------------------------------------------
>> > [root@node-1 rancher]# cat gluster-server.sh
>> > #!/bin/bash
>> >
>> > sudo docker run --name=gluster-server -d \
>> >     --env 'SERVICE_NAME=gluster' \
>> >     --restart always \
>> >     --env 'GLUSTER_DATA=/srv/docker/gitlab' \
>> >     --publish 2222:22 \
>> >     webcenter/rancher-glusterfs-server
>> > ------------------------------------------------------------------
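>> >
>> > (Whether this container is actually up on each node can be checked with,
>> > for example:
>> >
>> > sudo docker ps --filter name=gluster-server
>> >
>> > and the published SSH port with something like ssh -p 2222 <node-ip>, where
>> > the user to log in as depends on how the image is set up.)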
>> >
>> > In /etc/heketi/heketi.json, the following is the only modified portion:
>> > ------------------------------------------------------------------
>> >     "executor": "ssh",
>> >
>> >     "_sshexec_comment": "SSH username and private key file information",
>> >     "sshexec": {
>> >       "keyfile": "/var/lib/heketi/.ssh/id_rsa",
>> >       "user": "root",
>> >       "port": "22",
>> >       "fstab": "/etc/fstab"
>> >     },
>> > ------------------------------------------------------------------
>> >
>> > Status before running gk-deploy:
>> >
>> > [root@workstation deploy]# kubectl get nodes,pods,services,deployments
>> > NAME                                      STATUS    AGE       VERSION
>> > no/node-1.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> > no/node-2.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> > no/node-3.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> >
>> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> > svc/kubernetes   10.43.0.1    <none>        443/TCP   2d
>> >
>> >
>> >
>> > Now when I run 'gk-deploy -g', I see the following error in the Rancher
>> > console:
>> > Readiness probe failed: Failed to get D-Bus connection: Operation not
>> > permitted
>> >
>> > From the attached gk-deploy_log I see that it failed at:
>> > Waiting for GlusterFS pods to start ... pods not found.
>> >
>> > In the kube-templates/glusterfs-daemonset.yaml file, I see this for the
>> > Readiness probe section:
>> > ------------------------------------------------------------------
>> >         readinessProbe:
>> >           timeoutSeconds: 3
>> >           initialDelaySeconds: 40
>> >           exec:
>> >             command:
>> >             - "/bin/bash"
>> >             - "-c"
>> >             - systemctl status glusterd.service
>> >           periodSeconds: 25
>> >           successThreshold: 1
>> >           failureThreshold: 15
>> > ------------------------------------------------------------------
>> >
>> >
>> > Status after running gk-deploy:
>> >
>> > [root@workstation deploy]# kubectl get nodes,pods,deployments,services
>> > NAME                                      STATUS    AGE       VERSION
>> > no/node-1.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> > no/node-2.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> > no/node-3.c.kubernetes-174104.internal    Ready     2d        v1.7.2-rancher1
>> >
>> > NAME                 READY     STATUS    RESTARTS   AGE
>> > po/glusterfs-0s440   0/1       Running   0          1m
>> > po/glusterfs-j7dgr   0/1       Running   0          1m
>> > po/glusterfs-p6jl3   0/1       Running   0          1m
>> >
>> > NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
>> > svc/kubernetes   10.43.0.1    <none>        443/TCP   2d
>> >
>> >
>> >
>> > Also, from a prerequisites perspective, I also saw this mentioned:
>> >
>> > The following kernel modules must be loaded:
>> >   * dm_snapshot
>> >   * dm_mirror
>> >   * dm_thin_pool
>> >
>> > Where exactly should this be checked? On all Gluster server nodes? How can I
>> > check whether they are loaded?
>> >
>> > I have attached topology.json and the gk-deploy log for reference.
>> >
>> > Does this issue have anything to do with the host OS (RancherOS) that I am
>> > using for the Gluster nodes? Any idea how I can fix this? Any help would
>> > really be appreciated.
>> >
>> >
>> > Thanks.
>> >
>> >
>> >
>> > _______________________________________________
>> > heketi-devel mailing list
>> > heketi-devel@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/heketi-devel
>> >
>
>