[Bugs] [Bug 1709959] Gluster causing Kubernetes containers to enter crash loop with 'mkdir ... file exists' error message
bugzilla at redhat.com
Tue May 14 16:24:22 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1709959
--- Comment #2 from Jeff Bischoff <jeff.bischoff at turbonomic.com> ---
This is the Kubernetes version from our latest failing environments:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13",
GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2",
GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5",
Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13",
GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2",
GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5",
Compiler:"gc", Platform:"linux/amd64"}
Here's how the heketi pod looks:
$ kubectl describe pod heketi-7495cdc5fd-xqmxr
Error from server (NotFound): pods "heketi-7495cdc5fd-xqmxr" not found
[turbo@node1 ~]$ kubectl describe pod heketi-7495cdc5fd-xqmxr -n default
Name: heketi-7495cdc5fd-xqmxr
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: node1/10.10.168.25
Start Time: Mon, 06 May 2019 02:11:42 +0000
Labels: glusterfs=heketi-pod
heketi=pod
pod-template-hash=7495cdc5fd
Annotations: <none>
Status: Running
IP: 10.233.90.85
Controlled By: ReplicaSet/heketi-7495cdc5fd
Containers:
heketi:
Container ID:
docker://fed61190bf01d149027f187e49a8428e0654fc347de9a9164665f40247c543b3
Image: heketi/heketi:dev
Image ID:
docker-pullable://heketi/heketi@sha256:bcbf709fd084793e4ff0379f08ca44f71154c270d3a74df2bd146472e2d28402
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists
Exit Code: 128
Started: Tue, 14 May 2019 14:34:55 +0000
Finished: Tue, 14 May 2019 14:34:55 +0000
Ready: False
Restart Count: 1735
Liveness: http-get http://:8080/hello delay=30s timeout=3s period=10s
#success=1 #failure=3
Readiness: http-get http://:8080/hello delay=3s timeout=3s period=10s
#success=1 #failure=3
Environment:
HEKETI_USER_KEY:
HEKETI_ADMIN_KEY:
HEKETI_EXECUTOR: kubernetes
HEKETI_FSTAB: /var/lib/heketi/fstab
HEKETI_SNAPSHOT_LIMIT: 14
HEKETI_KUBE_GLUSTER_DAEMONSET: y
HEKETI_IGNORE_STALE_OPERATIONS: true
Mounts:
/etc/heketi from config (rw)
/var/lib/heketi from db (rw)
/var/run/secrets/kubernetes.io/serviceaccount from
heketi-service-account-token-ntfx2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
db:
Type: Glusterfs (a Glusterfs mount on the host that shares a
pod's lifetime)
EndpointsName: heketi-storage-endpoints
Path: heketidbstorage
ReadOnly: false
config:
Type: Secret (a volume populated by a Secret)
SecretName: heketi-config-secret
Optional: false
heketi-service-account-token-ntfx2:
Type: Secret (a volume populated by a Secret)
SecretName: heketi-service-account-token-ntfx2
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 3m36s (x40124 over 6d) kubelet, node1 Back-off restarting
failed container
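To look at the directory named in the "file exists" message, it can help to pull the path out of the describe output first. This is only a minimal sketch: the function name mount_path_from_message is made up for illustration, and the commented commands assume shell access to the node that runs the pod (node1 here).

```shell
#!/usr/bin/env bash
# Print the directory named in a "mkdir <path>: file exists" message.
# Reads text on stdin; line wraps inside the message are tolerated.
mount_path_from_message() {
  tr '\n' ' ' | sed -n 's/.*mkdir  *\([^ ]*\): *file exists.*/\1/p'
}

# Hypothetical usage, run on the node that hosts the pod:
#   path=$(kubectl describe pod heketi-7495cdc5fd-xqmxr -n default | mount_path_from_message)
#   ls -ld "$path"       # does the directory exist, and can it be stat'ed?
#   mountpoint "$path"   # is something still mounted there?
```

The ls and mountpoint calls only report the state of the path; they do not change anything on the node.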
I'm not at all familiar with gluster brick logs, but from looking at them it
appears that some health checks failed and the brick processes were then shut down?
```
[2019-05-08 13:48:33.642896] W [MSGID: 113075]
[posix-helpers.c:1895:posix_fs_health_check]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: aio_write() on
/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_034310c050aa134b254316068472b4cc/brick/.glusterfs/health_check returned [Resource
temporarily unavailable]
[2019-05-08 13:48:33.748515] M [MSGID: 113075]
[posix-helpers.c:1962:posix_health_check_thread_proc]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: health-check failed, going down
[2019-05-08 13:48:33.999892] M [MSGID: 113075]
[posix-helpers.c:1981:posix_health_check_thread_proc]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.598861] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2a27df4dd5]
-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x562568920d65]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x562568920b8b] ) 0-: received signum (15), shutting down
```
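To find every occurrence of the health-check failure shown above, a small filter like the following could be used. It is a sketch only: the function names are hypothetical, and the re-joining step assumes the input is a mail-wrapped excerpt like the one above (a log file read directly from disk normally has one entry per line, so the join is then a no-op).

```shell
#!/usr/bin/env bash
# Re-join log entries that were wrapped across several lines.
# An entry is assumed to start with a "[YYYY-MM-DD " timestamp.
join_log_entries() {
  awk '
    /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      if (entry != "") print entry
      entry = $0
      next
    }
    { entry = (entry == "" ? $0 : entry " " $0) }
    END { if (entry != "") print entry }
  '
}

# Print only the entries that report a failed posix health check.
health_check_failures() {
  join_log_entries | grep -F 'health-check failed'
}

# Hypothetical usage; brick logs typically live under /var/log/glusterfs/bricks/:
#   cat /var/log/glusterfs/bricks/*.log | health_check_failures
```

Each output line starts with the timestamp of the failure, which makes it easier to line the brick shutdowns up against the pod restarts.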
...and...
```
[2019-05-06 03:34:39.698647] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from
node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:39.698956] I [MSGID: 101055] [client_t.c:444:gf_client_unref]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection
node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:54.929155] I [addr.c:55:compare_addr_and_update]
0-/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_1adb1f9dad96381614efc30fe22943a7/brick:
allowed = "*", received addr = "10.10.168.25"
[2019-05-06 03:34:54.929223] I [login.c:111:gf_auth] 0-auth/login: allowed user
names: 57cda2e6-f071-4ec4-b1a5-04f43f91a204
[2019-05-06 03:34:54.929253] I [MSGID: 115029]
[server-handshake.c:495:server_setvolume]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: accepted client from
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
(version: 3.12.2)
[2019-05-07 11:50:30.502074] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:50:30.524408] I [MSGID: 101055] [client_t.c:444:gf_client_unref]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:54:45.456189] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7f684d7ccdd5]
-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556c12dead65]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556c12deab8b] ) 0-: received
signum (15), shutting down
```
Full gluster logs here:
[glusterfs.zip](https://github.com/heketi/heketi/files/3178441/glusterfs.zip)
I tried to get the heketi container logs, but it appears they don't exist:
$ kubectl logs -n default heketi-7495cdc5fd-xqmxr -p
failed to open log file
"/var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log": open
/var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log: no such
file or directory
Gluster seems to indicate that all of my bricks are offline:
[root@node1 /]# gluster
gluster> volume list
heketidbstorage
vol_050f6767658bceaed3e4c58693f3220e
vol_0f8c60645f1014a72b9999036d6244e2
vol_27ef5ea360e90f459d56082bc2b7be9f
vol_59090c2fd20479d553a5baa153d3fcbd
vol_673aef3de9147eaede26b7169ebf5f6e
vol_6848649bb5d29d60985d4d59380caafe
vol_744c23296132470b8639599b837ae671
vol_76c6f946e64d2150f99503953127c647
vol_84337b6825c0eb3d7a0e6008b65dd757
vol_9e6cad52d8a8e2e7f8febe2709ef253a
vol_a720850474f6ce7ae6c57dcc60284b1f
vol_c98a28cd587883dc2882c00695b02d52
vol_ced2ac693a19d4ae53af897eaf13bd86
vol_dcece16823bead8503333ef11c022775
vol_e6cbcf7bcb912d6c9725f3390f96b4b3
vol_eaa27e3100f78bff42ff337f163fee0f
vol_ff040fc48f8bd16727423b59ac7244c6
gluster> volume status
Status of volume: heketidbstorage
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a1
6f9f0374fe5db948a60a017a3f5e60/brick N/A N/A N N/A
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_050f6767658bceaed3e4c58693f3220e
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a6
d6af28e7525bbe3563948f4f9455bd/brick N/A N/A N N/A
Task Status of Volume vol_050f6767658bceaed3e4c58693f3220e
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_0f8c60645f1014a72b9999036d6244e2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_ed
81730bd5d36a151cf5163f379474b4/brick N/A N/A N N/A
Task Status of Volume vol_0f8c60645f1014a72b9999036d6244e2
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_27ef5ea360e90f459d56082bc2b7be9f
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
14e5014a0e402630a0e1fd776acf0a/brick N/A N/A N N/A
Task Status of Volume vol_27ef5ea360e90f459d56082bc2b7be9f
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_59090c2fd20479d553a5baa153d3fcbd
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_93
e6fdd290a8e963d927de4a1115d17e/brick N/A N/A N N/A
Task Status of Volume vol_59090c2fd20479d553a5baa153d3fcbd
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_673aef3de9147eaede26b7169ebf5f6e
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_0e
d4f7f941de388cda678fe273e9ceb4/brick N/A N/A N N/A
Task Status of Volume vol_673aef3de9147eaede26b7169ebf5f6e
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_6848649bb5d29d60985d4d59380caafe
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_5f
8b153d183d154b789425f5f5c8f912/brick N/A N/A N N/A
Task Status of Volume vol_6848649bb5d29d60985d4d59380caafe
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_744c23296132470b8639599b837ae671
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c1
1ac2780871f7d759a3da1c27e01941/brick N/A N/A N N/A
Task Status of Volume vol_744c23296132470b8639599b837ae671
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_76c6f946e64d2150f99503953127c647
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
ef27d7e6834e4c7519a8db19369742/brick N/A N/A N N/A
Task Status of Volume vol_76c6f946e64d2150f99503953127c647
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_84337b6825c0eb3d7a0e6008b65dd757
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a3
cef78a5914a2808da0b5736e3daec7/brick N/A N/A N N/A
Task Status of Volume vol_84337b6825c0eb3d7a0e6008b65dd757
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_9e6cad52d8a8e2e7f8febe2709ef253a
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_29
88103500386566a0ef4dd3fa69e429/brick N/A N/A N N/A
Task Status of Volume vol_9e6cad52d8a8e2e7f8febe2709ef253a
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_a720850474f6ce7ae6c57dcc60284b1f
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
4310c050aa134b254316068472b4cc/brick N/A N/A N N/A
Task Status of Volume vol_a720850474f6ce7ae6c57dcc60284b1f
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_c98a28cd587883dc2882c00695b02d52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
41cba307728c0bd2a66a1429160112/brick N/A N/A N N/A
Task Status of Volume vol_c98a28cd587883dc2882c00695b02d52
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_ced2ac693a19d4ae53af897eaf13bd86
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c6
4cb733906d43c5101044898eac8a35/brick N/A N/A N N/A
Task Status of Volume vol_ced2ac693a19d4ae53af897eaf13bd86
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_dcece16823bead8503333ef11c022775
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
bd80272c57164663bec3b1c9750366/brick N/A N/A N N/A
Task Status of Volume vol_dcece16823bead8503333ef11c022775
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_e6cbcf7bcb912d6c9725f3390f96b4b3
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_63
ec19a814ece3152021772f71ddbd92/brick N/A N/A N N/A
Task Status of Volume vol_e6cbcf7bcb912d6c9725f3390f96b4b3
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_eaa27e3100f78bff42ff337f163fee0f
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
7ecde606556b9d25487167b02e1e6b/brick N/A N/A N N/A
Task Status of Volume vol_eaa27e3100f78bff42ff337f163fee0f
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol_ff040fc48f8bd16727423b59ac7244c6
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_21
2c67914837cef8a927922ee63c7ee7/brick N/A N/A N N/A
Task Status of Volume vol_ff040fc48f8bd16727423b59ac7244c6
------------------------------------------------------------------------------
There are no active volume tasks
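Rather than reading through every status table, the Online column can be summarized with a short filter. This is a rough sketch, not a gluster feature: count_offline_bricks is a made-up name, and it assumes the plain-text layout shown above, where a brick row may wrap over several lines and ends with the TCP port, RDMA port, Online flag (Y or N) and Pid.

```shell
#!/usr/bin/env bash
# Summarize the Online column of "gluster volume status" text read from stdin.
count_offline_bricks() {
  awk '
    /^Brick / { in_brick = 1 }                 # start of a (possibly wrapped) brick row
    in_brick && NF >= 4 && $(NF-1) ~ /^[YN]$/ {
      total++                                  # last line of the row carries the flags
      if ($(NF-1) == "N") offline++
      in_brick = 0
    }
    END { printf "%d/%d bricks offline\n", offline, total }
  '
}

# Hypothetical usage on a gluster node:
#   gluster volume status | count_offline_bricks
```

Only rows that begin with "Brick" are counted, so other processes listed in the status output are ignored.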
My heketi volume info:
gluster> volume info heketidbstorage
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1:
10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick
Options Reconfigured:
user.heketi.id: 1d2400626dac780fce12e45a07494853
transport.address-family: inet
nfs.disable: on
Our gluster settings/volume options:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-heketi
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi
parameters:
  gidMax: "50000"
  gidMin: "2000"
  resturl: http://10.233.35.158:8080
  restuser: "null"
  restuserkey: "null"
  volumetype: "none"
  volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads
    off, performance.open-behind off, performance.readdir-ahead off,
    performance.read-ahead off, performance.stat-prefetch off,
    performance.write-behind off, performance.io-cache off,
    cluster.consistent-metadata on, performance.quick-read off,
    performance.strict-o-direct on
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete
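The volumeoptions value above is a comma-separated list of "option value" pairs, which is hard to review as one wrapped string. A small sketch that prints one pair per line follows; split_volume_options is a hypothetical helper name, and it only reformats the text given on stdin.

```shell
#!/usr/bin/env bash
# Print one "option value" pair per line from a comma-separated list on stdin.
split_volume_options() {
  tr '\n' ' ' | tr ',' '\n' |
    sed -e 's/^ *//' -e 's/ *$//' -e 's/  */ /g' -e '/^$/d'
}

# Hypothetical usage:
#   echo "cluster.post-op-delay-secs 0, performance.open-behind off" | split_volume_options
```

The output can then be compared line by line against the options that gluster reports for a provisioned volume.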