[Bugs] [Bug 1694139] New: Error waiting for job 'heketi-storage-copy-job' to complete on one-node k3s deployment.
bugzilla at redhat.com
Fri Mar 29 15:46:29 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1694139
Bug ID: 1694139
Summary: Error waiting for job 'heketi-storage-copy-job' to
complete on one-node k3s deployment.
Product: GlusterFS
Version: 4.1
Hardware: x86_64
OS: Linux
Status: NEW
Component: glusterd
Assignee: bugs at gluster.org
Reporter: it.sergm at gmail.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
Deploying GlusterFS on a single-node k3s deployment. The same solution works with
Kubernetes, but does not work with k3s (https://github.com/rancher/k3s).
Version-Release number of selected component (if applicable):
gluster-kubernetes 1.2.0 (https://github.com/gluster/gluster-kubernetes.git)
k3s - tested with v0.2.0 and v0.3.0-rc4 (https://github.com/rancher/k3s)
GlusterFS package - tested with 3.8.x, 3.12.x and 3.13.2
OS - tested with Ubuntu 16.04.6(4.4.0-143-generic) and Ubuntu
18.04.2(4.15.0-46-generic) with `apt full-upgrade` applied.
Steps to Reproduce:
1. Install and configure k3s.
# make sure the hostname is listed in /etc/hosts with the relevant IP
git clone --depth 1 https://github.com/rancher/k3s.git
cd k3s; sh install.sh
# Label node:
kubectl label node k3s-gluster node-role.kubernetes.io/master=""
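# optional sanity check (not part of the original steps): confirm the single
# node is Ready and carries the label before continuing
kubectl get nodes --show-labels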
2. Pre-configure gluster.
# install packages needed for gluster
apt -y install thin-provisioning-tools glusterfs-client
# required modules
cat << 'EOF' > /etc/modules-load.d/kubernetes-glusterfs.conf
# this module is required for glusterfs deployment on kubernetes
dm_thin_pool
EOF
## load the module
modprobe dm_thin_pool
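# optionally verify the module is actually loaded
lsmod | grep dm_thin_pool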
# get the gk-deploy code
cd $HOME
mkdir src
cd src
git clone https://github.com/gluster/gluster-kubernetes.git
cd gluster-kubernetes/deploy
# create the topology file. The IP 10.0.0.10 was added in a separate deployment as a
# private address using 'ip addr add dev ens3 10.0.0.10/24'
cat <<EOF > topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "k3s-gluster"
              ],
              "storage": [
                "10.0.0.10"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdb"
          ]
        }
      ]
    }
  ]
}
EOF
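# optional pre-checks for the assumptions encoded in topology.json (not part of
# the original steps): the manage hostname must resolve, the storage IP must be
# configured, and the device should carry no existing signatures
getent hosts k3s-gluster
ip addr show dev ens3 | grep 10.0.0.10
wipefs /dev/vdb   # no output expected on a clean disk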
# patch kube-templates/glusterfs-daemonset.yaml as described in
# https://github.com/gluster/gluster-kubernetes/issues/539#issuecomment-454668538
3. Deploy gluster:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# ./gk-deploy -n kube-system
--single-node -gvy topology.json
Using Kubernetes CLI.
Checking status of namespace matching 'kube-system':
kube-system Active 4m36s
Using namespace "kube-system".
Checking for pre-existing resources...
GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':
Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':
Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
heketi pod ...
Checking status of pods matching '--selector=heketi=pod':
Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':
Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n kube-system create -f
/root/src/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml
2>&1
serviceaccount/heketi-service-account created
/usr/local/bin/kubectl -n kube-system create clusterrolebinding heketi-sa-view
--clusterrole=edit --serviceaccount=kube-system:heketi-service-account 2>&1
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
/usr/local/bin/kubectl -n kube-system label --overwrite clusterrolebinding
heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
Marking 'k3s-gluster' as a GlusterFS node.
/usr/local/bin/kubectl -n kube-system label nodes k3s-gluster
storagenode=glusterfs --overwrite 2>&1
node/k3s-gluster labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g'
/root/src/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml |
/usr/local/bin/kubectl -n kube-system create -f - 2>&1
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-xvkrp 1/1 Running 0 70s
OK
/usr/local/bin/kubectl -n kube-system create secret generic
heketi-config-secret --from-file=private_key=/dev/null
--from-file=./heketi.json --from-file=topology.json=topology.json
secret/heketi-config-secret created
/usr/local/bin/kubectl -n kube-system label --overwrite secret
heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret/heketi-config-secret labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e
's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}' -e
's/\${HEKETI_USER_KEY}'
/root/src/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml
| /usr/local/bin/kubectl -n kube-system create -f - 2>&1
service/deploy-heketi created
deployment.extensions/deploy-heketi created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-5f6c465bb8-zl959 1/1 Running 0 19s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret '' topology load
--json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 949e5d5063a1c1589940b7ff4705dae8
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node k3s-gluster ... ID: 6f8e3cbc0cbf6d668d718cd9bd6022f5
Adding device /dev/vdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret ''
setup-openshift-heketi-storage --help --durability=none >/dev/null 2>&1
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
heketi-cli -s http://localhost:8080 --user admin --secret ''
setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json
--durability=none 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 --
cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n kube-system create -f
- 2>&1
secret/heketi-storage-secret created
endpoints/heketi-storage-endpoints created
service/heketi-storage-endpoints created
job.batch/heketi-storage-copy-job created
Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-xft9f 0/1 ContainerCreating 0 5m16s
Timed out waiting for pods matching
'--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.
Actual results:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl get pods -n
kube-system
NAME READY STATUS RESTARTS AGE
coredns-7748f7f6df-cchx7 1/1 Running 0 177m
deploy-heketi-5f6c465bb8-5f27p 1/1 Running 0 173m
glusterfs-ntmq7 1/1 Running 0 174m
heketi-storage-copy-job-qzpr7 0/1 ContainerCreating 0 170m
svclb-traefik-957cdf677-c4j76 2/2 Running 1 177m
traefik-7b6bd6cbf6-rnrxj 1/1 Running 0 177m
root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system
describe po/heketi-storage-copy-job-qzpr7
Name: heketi-storage-copy-job-qzpr7
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: k3s-gluster/104.36.17.63
Start Time: Fri, 29 Mar 2019 08:54:08 +0000
Labels: controller-uid=36e114ae-5200-11e9-a826-227e2ba50104
job-name=heketi-storage-copy-job
Annotations: <none>
Status: Pending
IP:
Controlled By: Job/heketi-storage-copy-job
Containers:
heketi:
Container ID:
Image: heketi/heketi:dev
Image ID:
Port: <none>
Host Port: <none>
Command:
cp
/db/heketi.db
/heketi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/db from heketi-storage-secret (rw)
/heketi from heketi-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-98jvk
(ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
heketi-storage:
Type: Glusterfs (a Glusterfs mount on the host that shares a
pod's lifetime)
EndpointsName: heketi-storage-endpoints
Path: heketidbstorage
ReadOnly: false
heketi-storage-secret:
Type: Secret (a volume populated by a Secret)
SecretName: heketi-storage-secret
Optional: false
default-token-98jvk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-98jvk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 3m56s (x74 over 169m) kubelet, k3s-gluster Unable to
mount volumes for pod
"heketi-storage-copy-job-qzpr7_kube-system(36e1b013-5200-11e9-a826-227e2ba50104)":
timeout expired waiting for volumes to attach or mount for pod
"kube-system"/"heketi-storage-copy-job-qzpr7". list of unmounted
volumes=[heketi-storage]. list of unattached volumes=[heketi-storage
heketi-storage-secret default-token-98jvk]
Expected results:
All pods running; gk-deploy completes without errors.
Additional info:
The same GlusterFS procedure works on single-node Kubernetes, but not on k3s.
The firewall is at its defaults and is only modified by the k3s iptables rules.
I have tried the following configurations and none of them work:
- private IP in the topology (also tried the main public IP)
- deploying with a clean drive
- mounting the volume from outside
- updating the gluster client to v3.12.x on Ubuntu 16.04 and 3.13.2 on Ubuntu 18.04
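For reference, the kubelet embedded in k3s is the component whose GlusterFS mount
times out; assuming k3s was installed as a systemd service by install.sh, its
journal can be filtered for the failed mount attempt and the stuck job re-inspected:
journalctl -u k3s --no-pager | grep -iE 'gluster|heketi-storage'
kubectl -n kube-system describe pod -l job-name=heketi-storage-copy-job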
Gluster logs, volumes:
root at k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system exec
-it glusterfs-ntmq7 /bin/bash
[root at k3s-gluster /]# cat /var/log/glusterfs/glusterd.log
[2019-03-29 08:50:44.968074] I [MSGID: 100030] [glusterfsd.c:2741:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.7 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-03-29 08:50:44.977762] I [MSGID: 106478] [glusterd.c:1423:init]
0-management: Maximum allowed open file descriptors set to 65536
[2019-03-29 08:50:44.977790] I [MSGID: 106479] [glusterd.c:1481:init]
0-management: Using /var/lib/glusterd as working directory
[2019-03-29 08:50:44.977797] I [MSGID: 106479] [glusterd.c:1486:init]
0-management: Using /var/run/gluster as pid file working directory
[2019-03-29 08:50:45.002831] W [MSGID: 103071]
[rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel
creation failed [No such device]
[2019-03-29 08:50:45.002862] W [MSGID: 103055] [rdma.c:4938:init]
0-rdma.management: Failed to initialize IB Device
[2019-03-29 08:50:45.002873] W [rpc-transport.c:351:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2019-03-29 08:50:45.002957] W [rpcsvc.c:1781:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2019-03-29 08:50:45.002968] E [MSGID: 106244] [glusterd.c:1764:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2019-03-29 08:50:46.040712] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040765] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040768] I [MSGID: 106514]
[glusterd-store.c:2262:glusterd_restore_op_version] 0-management: Detected new
install. Setting op-version to maximum : 40100
[2019-03-29 08:50:46.044266] I [MSGID: 106194]
[glusterd-store.c:3850:glusterd_store_retrieve_missed_snaps_list] 0-management:
No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 10
8: option event-threads 1
9: option ping-timeout 0
10: option transport.socket.read-fail-log off
11: option transport.socket.keepalive-interval 2
12: option transport.socket.keepalive-time 10
13: option transport-type rdma
14: option working-directory /var/lib/glusterd
15: end-volume
16:
+------------------------------------------------------------------------------+
[2019-03-29 08:50:46.044640] I [MSGID: 101190]
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2019-03-29 08:54:07.698610] E [MSGID: 101032]
[store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:54:07.698686] I [MSGID: 106477]
[glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated UUID:
9dc908c2-0e7d-4b40-a951-095b78dbaeeb
[2019-03-29 08:54:07.706214] W [MSGID: 101095]
[xlator.c:181:xlator_volopt_dynload] 0-xlator:
/usr/lib64/glusterfs/4.1.7/xlator/nfs/server.so: cannot open shared object
file: No such file or directory
[2019-03-29 08:54:07.730620] I [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765)
[0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Ran script:
/var/lib/glusterd/hooks/1/create/post/S10selinux-label-brick.sh
--volname=heketidbstorage
[2019-03-29 08:54:07.863432] I [glusterd-utils.c:6090:glusterd_brick_start]
0-management: starting a fresh brick process for brick
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
[2019-03-29 08:54:07.989260] I [MSGID: 106142]
[glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick
/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
on port 49152
[2019-03-29 08:54:07.998472] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2019-03-29 08:54:08.007817] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-snapd: setting frame-timeout to 600
[2019-03-29 08:54:08.008060] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-gfproxyd: setting frame-timeout to 600
[2019-03-29 08:54:08.008256] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-nfs: setting frame-timeout to 600
[2019-03-29 08:54:08.008335] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2019-03-29 08:54:08.008360] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is
stopped
[2019-03-29 08:54:08.008376] I [MSGID: 106599]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
xlator is not installed
[2019-03-29 08:54:08.008402] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-glustershd: setting frame-timeout to 600
[2019-03-29 08:54:08.008493] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-quotad: setting frame-timeout to 600
[2019-03-29 08:54:08.008656] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-bitd: setting frame-timeout to 600
[2019-03-29 08:54:08.008772] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2019-03-29 08:54:08.008785] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is
stopped
[2019-03-29 08:54:08.008808] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
0-scrub: setting frame-timeout to 600
[2019-03-29 08:54:08.008907] I [MSGID: 106131]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
stopped
[2019-03-29 08:54:08.008917] I [MSGID: 106568]
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is
stopped
[2019-03-29 08:54:08.015319] I [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765)
[0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Ran script:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage
--first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-03-29 08:54:08.025189] E [run.c:241:runner_log]
(-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a)
[0x7f7f4e7f1c9a]
-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3)
[0x7f7f4e7f16c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5]
) 0-management: Failed to execute script:
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
--volname=heketidbstorage --first=yes --version=1 --volume-op=start
--gd-workdir=/var/lib/glusterd
[root at k3s-gluster /]# lsblk
NAME                                                                               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1                                                                                7:1    0 87.9M  1 loop
vdb                                                                                252:16   0  100G  0 disk
├─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tdata    253:1    0    2G  0 lvm
│ └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool  253:2    0    2G  0 lvm
│   ├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7   253:4    0    2G  0 lvm  /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
│   └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4      253:3    0    2G  0 lvm
└─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tmeta    253:0    0   12M  0 lvm
  └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool  253:2    0    2G  0 lvm
    ├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7   253:4    0    2G  0 lvm  /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
    └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4      253:3    0    2G  0 lvm
loop2                                                                                7:2    0   91M  1 loop
loop0                                                                                7:0    0 89.3M  1 loop
vda                                                                                252:0    0   10G  0 disk
├─vda2                                                                             252:2    0   10G  0 part /var/lib/misc/glusterfsd
└─vda1                                                                             252:1    0    1M  0 part
[root at k3s-gluster /]# gluster volume info
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 32608bdb-a4a3-494e-9c6e-68d8f780f12c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1:
10.0.0.10:/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[root at k3s-gluster /]# mount -t glusterfs 10.0.0.10:/heketidbstorage
/mnt/glustertest
WARNING: getfattr not found, certain checks will be skipped..
[root at k3s-gluster /]# mount | grep 10.0.0.10
10.0.0.10:/heketidbstorage on /mnt/glustertest type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
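Since the manual FUSE mount on the host works while the kubelet-initiated mount in
the copy-job pod times out, comparing the endpoints object the pod mounts against
the address used above may help (a hedged suggestion, not something already tried here):
kubectl -n kube-system get endpoints heketi-storage-endpoints -o yaml
umount /mnt/glustertest   # clean up the manual test mount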