[Bugs] [Bug 1694139] New: Error waiting for job 'heketi-storage-copy-job' to complete on one-node k3s deployment.

Fri Mar 29 15:46:29 UTC 2019

https://bugzilla.redhat.com/show_bug.cgi?id=1694139

            Bug ID: 1694139
           Summary: Error waiting for job 'heketi-storage-copy-job' to
                    complete on one-node k3s deployment.
           Product: GlusterFS
           Version: 4.1
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: it.sergm at gmail.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community

Description of problem:
Deploying k3s with gluster in single-node deployment. Solution worked with
kubernetes, but not working with k3s(https://github.com/rancher/k3s)

Version-Release number of selected component (if applicable):
gluster-kubernetes 1.2.0(https://github.com/gluster/gluster-kubernetes.git)
k3s - tested with v0.2.0 and v0.3.0-rc4 (https://github.com/rancher/k3s)
Glusterfs package - tested with 3.8.x, 3.12.x and 3.13.2
OS - tested with Ubuntu 16.04.6(4.4.0-143-generic) and Ubuntu
18.04.2(4.15.0-46-generic) with `apt full-upgrade` applied.

Steps to Reproduce:
1. install and configure k3s.
# make sure hostname included in /etc/hosts with relevant ip
git clone --depth 1 https://github.com/rancher/k3s.git
cd k3s; sh install.sh 

# Label node:
kubectl label node k3s-gluster node-role.kubernetes.io/master=""

2. pre-configuring gluster.
# install packages needed for gluster
apt -y install thin-provisioning-tools glusterfs-client

# required modules
cat << 'EOF' > /etc/modules-load.d/kubernetes-glusterfs.conf
# this module is required for glusterfs deployment on kubernetes
dm_thin_pool
EOF

## load the module
modprobe dm_thin_pool

# get the gk-deploy code
cd $HOME
mkdir src
cd src
git clone https://github.com/gluster/gluster-kubernetes.git
cd gluster-kubernetes/deploy
# creating topology file. Ip 10.0.0.10 was added in separate deployment as
private one using 'ip addr add dev ens3 10.0.0.10/24'
cat <<EOF > topology.json
{
  "clusters": [
        {
          "nodes": [
                {
                  "node": {
                        "hostnames": {
                          "manage": [
                                "k3s-gluster"
                          ],
                          "storage": [
                                "10.0.0.10"
                          ]
                        },
                        "zone": 1
                  },
                  "devices": [
                        "/dev/vdb"
                  ]
                }
          ]
        }
  ]
}
EOF

# patching kube-templates/glusterfs-daemonset.yaml regarding to patch
https://github.com/gluster/gluster-kubernetes/issues/539#issuecomment-454668538

3. Deploying gluster:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# ./gk-deploy -n kube-system
--single-node -gvy topology.json

Using Kubernetes CLI.

Checking status of namespace matching 'kube-system':
kube-system   Active   4m36s
Using namespace "kube-system".
Checking for pre-existing resources...
  GlusterFS pods ... 
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ... 
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ... 
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ... 
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n kube-system create -f
/root/src/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml
2>&1
serviceaccount/heketi-service-account created
/usr/local/bin/kubectl -n kube-system create clusterrolebinding heketi-sa-view
--clusterrole=edit --serviceaccount=kube-system:heketi-service-account 2>&1
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
/usr/local/bin/kubectl -n kube-system label --overwrite clusterrolebinding
heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
Marking 'k3s-gluster' as a GlusterFS node.
/usr/local/bin/kubectl -n kube-system label nodes k3s-gluster
storagenode=glusterfs --overwrite 2>&1
node/k3s-gluster labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g'
/root/src/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml |
/usr/local/bin/kubectl -n kube-system create -f - 2>&1
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... 
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-xvkrp   1/1   Running   0     70s
OK
/usr/local/bin/kubectl -n kube-system create secret generic
heketi-config-secret --from-file=private_key=/dev/null
--from-file=./heketi.json --from-file=topology.json=topology.json
secret/heketi-config-secret created
/usr/local/bin/kubectl -n kube-system label --overwrite secret
heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret/heketi-config-secret labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e
's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}' -e
's/\${HEKETI_USER_KEY}'
/root/src/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml
| /usr/local/bin/kubectl -n kube-system create -f - 2>&1
service/deploy-heketi created
deployment.extensions/deploy-heketi created
Waiting for deploy-heketi pod to start ... 
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-5f6c465bb8-zl959   1/1   Running   0     19s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret '' topology load
--json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 949e5d5063a1c1589940b7ff4705dae8
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node k3s-gluster ... ID: 6f8e3cbc0cbf6d668d718cd9bd6022f5
Adding device /dev/vdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret ''
setup-openshift-heketi-storage --help --durability=none >/dev/null 2>&1
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret ''
setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json
--durability=none 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n kube-system create -f
- 2>&1
secret/heketi-storage-secret created
endpoints/heketi-storage-endpoints created
service/heketi-storage-endpoints created
job.batch/heketi-storage-copy-job created

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-xft9f   0/1   ContainerCreating   0     5m16s
Timed out waiting for pods matching
'--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.

Actual results:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl get pods -n
kube-system
NAME                             READY   STATUS              RESTARTS   AGE
coredns-7748f7f6df-cchx7         1/1     Running             0          177m
deploy-heketi-5f6c465bb8-5f27p   1/1     Running             0          173m
glusterfs-ntmq7                  1/1     Running             0          174m
heketi-storage-copy-job-qzpr7    0/1     ContainerCreating   0          170m
svclb-traefik-957cdf677-c4j76    2/2     Running             1          177m
traefik-7b6bd6cbf6-rnrxj         1/1     Running             0          177m

root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system
describe po/heketi-storage-copy-job-qzpr7
Name:               heketi-storage-copy-job-qzpr7
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               k3s-gluster/104.36.17.63
Start Time:         Fri, 29 Mar 2019 08:54:08 +0000
Labels:             controller-uid=36e114ae-5200-11e9-a826-227e2ba50104
                    job-name=heketi-storage-copy-job
Annotations:        <none>
Status:             Pending
IP:                 
Controlled By:      Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:  
    Image:         heketi/heketi:dev
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /db/heketi.db
      /heketi
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-98jvk
(ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  heketi-storage:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a
pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  heketi-storage-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-storage-secret
    Optional:    false
  default-token-98jvk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-98jvk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                    From                  Message
  ----     ------       ----                   ----                  -------
  Warning  FailedMount  3m56s (x74 over 169m)  kubelet, k3s-gluster  Unable to
mount volumes for pod
"heketi-storage-copy-job-qzpr7_kube-system(36e1b013-5200-11e9-a826-227e2ba50104)":
timeout expired waiting for volumes to attach or mount for pod
"kube-system"/"heketi-storage-copy-job-qzpr7". list of unmounted
volumes=[heketi-storage]. list of unattached volumes=[heketi-storage
heketi-storage-secret default-token-98jvk]

Expected results:
all pods running, gk-deploy works with no errors

Additional info:
Same Gluster procedure works with single-node kubernetes, but won't work with
k3s.
Firewall is default and only modified with k3s iptables rules.
I've been trying different configurations and they don't work:
private IP in the topology(also used main public ip)
deploying with a clean drive
mounting the volume from outside
updating the gluster client to v3.12.x on ubuntu16 and 3.13.2 on ubuntu18

Gluster logs, volumes:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system exec
-it glusterfs-ntmq7 /bin/bash
[root at k3s-gluster /]# cat /var/log/glusterfs/glusterd.log 
[2019-03-29 08:50:44.968074] I [MSGID: 100030] [glusterfsd.c:2741:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.7 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-03-29 08:50:44.977762] I [MSGID: 106478] [glusterd.c:1423:init]
0-management: Maximum allowed open file descriptors set to 65536
[2019-03-29 08:50:44.977790] I [MSGID: 106479] [glusterd.c:1481:init]
0-management: Using /var/lib/glusterd as working directory
[2019-03-29 08:50:44.977797] I [MSGID: 106479] [glusterd.c:1486:init]
0-management: Using /var/run/gluster as pid file working directory
[2019-03-29 08:50:45.002831] W [MSGID: 103071]
[rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel
creation failed [No such device]
[2019-03-29 08:50:45.002862] W [MSGID: 103055] [rdma.c:4938:init]
0-rdma.management: Failed to initialize IB Device
[2019-03-29 08:50:45.002873] W [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2019-03-29 08:50:45.002957] W [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2019-03-29 08:50:45.002968] E [MSGID: 106244] [glusterd.c:1764:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2019-03-29 08:50:46.040712] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040765] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040768] I [MSGID: 106514]
[glusterd-store.c:2262:glusterd_restore_op_version] 0-management: Detected new
install. Setting op-version to maximum : 40100
[2019-03-29 08:50:46.044266] I [MSGID: 106194]
[glusterd-store.c:3850:glusterd_store_retrieve_missed_snaps_list] 0-management:
No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2019-03-29 08:50:46.044640] I [MSGID: 101190]
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2019-03-29 08:54:07.698610] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:54:07.698686] I [MSGID: 106477]
[glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated UUID:
9dc908c2-0e7d-4b40-a951-095b78dbaeeb
[2019-03-29 08:54:07.706214] W [MSGID: 101095]
[xlator.c:181:xlator_volopt_dynload] 0-xlator:
/usr/lib64/glusterfs/4.1.7/xlator/nfs/server.so: cannot open shared object
file: No such file or directory
[2019-03-29 08:54:07.730620] I [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765)
[0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Ran script:
/var/lib/glusterd/hooks/1/create/post/S10selinux-label-brick.sh
--volname=heketidbstorage
[2019-03-29 08:54:07.863432] I [glusterd-utils.c:6090:glusterd_brick_start]
0-management: starting a fresh brick process for brick
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
[2019-03-29 08:54:07.989260] I [MSGID: 106142]
[glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
on port 49152
[2019-03-29 08:54:07.998472] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2019-03-29 08:54:08.007817] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-snapd: setting frame-timeout to 600
[2019-03-29 08:54:08.008060] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-gfproxyd: setting frame-timeout to 600
[2019-03-29 08:54:08.008256] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-nfs: setting frame-timeout to 600
[2019-03-29 08:54:08.008335] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2019-03-29 08:54:08.008360] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is
stopped
[2019-03-29 08:54:08.008376] I [MSGID: 106599]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
xlator is not installed
[2019-03-29 08:54:08.008402] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-glustershd: setting frame-timeout to 600
[2019-03-29 08:54:08.008493] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-quotad: setting frame-timeout to 600
[2019-03-29 08:54:08.008656] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-bitd: setting frame-timeout to 600
[2019-03-29 08:54:08.008772] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2019-03-29 08:54:08.008785] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is
stopped
[2019-03-29 08:54:08.008808] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-scrub: setting frame-timeout to 600
[2019-03-29 08:54:08.008907] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
stopped
[2019-03-29 08:54:08.008917] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is
stopped
[2019-03-29 08:54:08.015319] I [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765)
[0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Ran script:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage
--first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-03-29 08:54:08.025189] E [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3)
[0x7f7f4e7f16c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Failed to execute script:
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
--volname=heketidbstorage --first=yes --version=1 --volume-op=start
--gd-workdir=/var/lib/glusterd

[root at k3s-gluster /]# lsblk 
NAME                                                                           
  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1                                                                          
    7:1    0 87.9M  1 loop 
vdb                                                                            
  252:16   0  100G  0 disk 
├─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tdata
  253:1    0    2G  0 lvm  
│
└─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool
253:2    0    2G  0 lvm  
│  
├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7 
253:4    0    2G  0 lvm 
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
│   └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4  
  253:3    0    2G  0 lvm  
└─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tmeta
  253:0    0   12M  0 lvm  

└─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool
253:2    0    2G  0 lvm  

├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7 
253:4    0    2G  0 lvm 
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
    └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4  
  253:3    0    2G  0 lvm  
loop2                                                                          
    7:2    0   91M  1 loop 
loop0                                                                          
    7:0    0 89.3M  1 loop 
vda                                                                            
  252:0    0   10G  0 disk 
├─vda2                                                                         
  252:2    0   10G  0 part /var/lib/misc/glusterfsd
└─vda1                                                                         
  252:1    0    1M  0 part 

[root at k3s-gluster /]# gluster volume info
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 32608bdb-a4a3-494e-9c6e-68d8f780f12c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1:
10.0.0.10:/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

[root at k3s-gluster /]# mount -t glusterfs 10.0.0.10:/heketidbstorage
/mnt/glustertest
WARNING: getfattr not found, certain checks will be skipped..

[root at k3s-gluster /]# mount | grep 10.0.0.10
10.0.0.10:/heketidbstorage on /mnt/glustertest type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.