[Bugs] [Bug 1709959] Gluster causing Kubernetes containers to enter crash loop with 'mkdir ... file exists' error message

bugzilla at redhat.com
Tue May 14 16:24:22 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1709959



--- Comment #2 from Jeff Bischoff <jeff.bischoff at turbonomic.com> ---
This is the Kubernetes version from our latest failing environments:

        $ kubectl version
        Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
        Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Here's how the heketi pod looks:


$ kubectl describe pod heketi-7495cdc5fd-xqmxr
Error from server (NotFound): pods "heketi-7495cdc5fd-xqmxr" not found
[turbo@node1 ~]$ kubectl describe pod heketi-7495cdc5fd-xqmxr -n default
Name:               heketi-7495cdc5fd-xqmxr
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               node1/10.10.168.25
Start Time:         Mon, 06 May 2019 02:11:42 +0000
Labels:             glusterfs=heketi-pod
                    heketi=pod
                    pod-template-hash=7495cdc5fd
Annotations:        <none>
Status:             Running
IP:                 10.233.90.85
Controlled By:      ReplicaSet/heketi-7495cdc5fd
Containers:
  heketi:
    Container ID:   docker://fed61190bf01d149027f187e49a8428e0654fc347de9a9164665f40247c543b3
    Image:          heketi/heketi:dev
    Image ID:       docker-pullable://heketi/heketi@sha256:bcbf709fd084793e4ff0379f08ca44f71154c270d3a74df2bd146472e2d28402
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists
      Exit Code:    128
      Started:      Tue, 14 May 2019 14:34:55 +0000
      Finished:     Tue, 14 May 2019 14:34:55 +0000
    Ready:          False
    Restart Count:  1735
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:                 
      HEKETI_ADMIN_KEY:                
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   y
      HEKETI_IGNORE_STALE_OPERATIONS:  true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-service-account-token-ntfx2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  db:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-config-secret
    Optional:    false
  heketi-service-account-token-ntfx2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-service-account-token-ntfx2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                     From            Message
  ----     ------   ----                    ----            -------
  Warning  BackOff  3m36s (x40124 over 6d)  kubelet, node1  Back-off restarting failed container

I'm not at all familiar with gluster brick logs, but they appear to show that
some health checks failed and that the brick processes were then shut down:

```
[2019-05-08 13:48:33.642896] W [MSGID: 113075]
[posix-helpers.c:1895:posix_fs_health_check]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: aio_write() on
/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_034310c050aa134b254316068472b4cc/brick/.glusterfs/health_check
returned [Resource temporarily unavailable]
[2019-05-08 13:48:33.748515] M [MSGID: 113075]
[posix-helpers.c:1962:posix_health_check_thread_proc]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: health-check failed, going down
[2019-05-08 13:48:33.999892] M [MSGID: 113075]
[posix-helpers.c:1981:posix_health_check_thread_proc]
0-vol_a720850474f6ce7ae6c57dcc60284b1f-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.598861] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2a27df4dd5]
-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x562568920d65]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x562568920b8b] ) 0-: received
signum (15), shutting down
```
...and...
```
[2019-05-06 03:34:39.698647] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from
node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:39.698956] I [MSGID: 101055] [client_t.c:444:gf_client_unref]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection
node1-21644-2019/05/06-02:17:50:364351-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-06 03:34:54.929155] I [addr.c:55:compare_addr_and_update]
0-/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_1adb1f9dad96381614efc30fe22943a7/brick:
allowed = "*", received addr = "10.10.168.25"
[2019-05-06 03:34:54.929223] I [login.c:111:gf_auth] 0-auth/login: allowed user
names: 57cda2e6-f071-4ec4-b1a5-04f43f91a204
[2019-05-06 03:34:54.929253] I [MSGID: 115029]
[server-handshake.c:495:server_setvolume]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: accepted client from
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
(version: 3.12.2)
[2019-05-07 11:50:30.502074] I [MSGID: 115036] [server.c:483:server_rpc_notify]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: disconnecting connection from
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:50:30.524408] I [MSGID: 101055] [client_t.c:444:gf_client_unref]
0-vol_95b9fad3e8bce2d1c9aac2da2af46057-server: Shutting down connection
node1-23801-2019/05/06-03:34:54:882971-vol_95b9fad3e8bce2d1c9aac2da2af46057-client-0-0-0
[2019-05-07 11:54:45.456189] W [glusterfsd.c:1514:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dd5) [0x7f684d7ccdd5]
-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556c12dead65]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556c12deab8b] ) 0-: received
signum (15), shutting down
```


Full gluster logs here: 
[glusterfs.zip](https://github.com/heketi/heketi/files/3178441/glusterfs.zip)
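
For anyone going through the full logs, here is a rough Python sketch for pulling the health-check failures out of a brick log. It assumes the entry layout seen in the excerpts above (a bracketed timestamp, a one-letter level, and a body that may be wrapped over several lines). The names `ENTRY_START` and `health_check_failures` are made up for this example and are not part of gluster.

```python
import re

# Start of a log entry, e.g. "[2019-05-08 13:48:33.748515] M [MSGID: 113075]".
# Wrapped continuation lines do not begin with a timestamp, so they do not match.
ENTRY_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) "
)


def health_check_failures(log_text):
    """Return (timestamp, volume) for each 'health-check failed' entry."""
    entries = []
    current = None
    for line in log_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            # New entry: remember its timestamp and start collecting its text.
            current = [match.group(1), line]
            entries.append(current)
        elif current is not None:
            # Continuation of a wrapped entry: rejoin it with a space.
            current[1] += " " + line.strip()

    failures = []
    for timestamp, text in entries:
        if "health-check failed" in text:
            # The translator name looks like "0-vol_<id>-posix" in these logs.
            vol = re.search(r"0-(vol_[0-9a-f]+)-posix", text)
            failures.append((timestamp, vol.group(1) if vol else None))
    return failures
```

Feeding it the text of a brick log file gives one tuple per brick shutdown triggered by the health checker, which makes it easier to line the failures up against the pod restart times.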

I tried to get the heketi container logs, but it appears they don't exist:


$ kubectl logs -n default heketi-7495cdc5fd-xqmxr -p
failed to open log file "/var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log": open /var/log/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/heketi/1741.log: no such file or directory


Gluster seems to indicate that all of my bricks are offline:

[root@node1 /]# gluster
gluster> volume list
heketidbstorage
vol_050f6767658bceaed3e4c58693f3220e
vol_0f8c60645f1014a72b9999036d6244e2
vol_27ef5ea360e90f459d56082bc2b7be9f
vol_59090c2fd20479d553a5baa153d3fcbd
vol_673aef3de9147eaede26b7169ebf5f6e
vol_6848649bb5d29d60985d4d59380caafe
vol_744c23296132470b8639599b837ae671
vol_76c6f946e64d2150f99503953127c647
vol_84337b6825c0eb3d7a0e6008b65dd757
vol_9e6cad52d8a8e2e7f8febe2709ef253a
vol_a720850474f6ce7ae6c57dcc60284b1f
vol_c98a28cd587883dc2882c00695b02d52
vol_ced2ac693a19d4ae53af897eaf13bd86
vol_dcece16823bead8503333ef11c022775
vol_e6cbcf7bcb912d6c9725f3390f96b4b3
vol_eaa27e3100f78bff42ff337f163fee0f
vol_ff040fc48f8bd16727423b59ac7244c6
gluster> volume status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a1
6f9f0374fe5db948a60a017a3f5e60/brick        N/A       N/A        N       N/A  

Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_050f6767658bceaed3e4c58693f3220e
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a6
d6af28e7525bbe3563948f4f9455bd/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_050f6767658bceaed3e4c58693f3220e
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_0f8c60645f1014a72b9999036d6244e2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_ed
81730bd5d36a151cf5163f379474b4/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_0f8c60645f1014a72b9999036d6244e2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_27ef5ea360e90f459d56082bc2b7be9f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
14e5014a0e402630a0e1fd776acf0a/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_27ef5ea360e90f459d56082bc2b7be9f
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_59090c2fd20479d553a5baa153d3fcbd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_93
e6fdd290a8e963d927de4a1115d17e/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_59090c2fd20479d553a5baa153d3fcbd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_673aef3de9147eaede26b7169ebf5f6e
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_0e
d4f7f941de388cda678fe273e9ceb4/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_673aef3de9147eaede26b7169ebf5f6e
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_6848649bb5d29d60985d4d59380caafe
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_5f
8b153d183d154b789425f5f5c8f912/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_6848649bb5d29d60985d4d59380caafe
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_744c23296132470b8639599b837ae671
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c1
1ac2780871f7d759a3da1c27e01941/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_744c23296132470b8639599b837ae671
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_76c6f946e64d2150f99503953127c647
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
ef27d7e6834e4c7519a8db19369742/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_76c6f946e64d2150f99503953127c647
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_84337b6825c0eb3d7a0e6008b65dd757
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_a3
cef78a5914a2808da0b5736e3daec7/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_84337b6825c0eb3d7a0e6008b65dd757
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_9e6cad52d8a8e2e7f8febe2709ef253a
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_29
88103500386566a0ef4dd3fa69e429/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_9e6cad52d8a8e2e7f8febe2709ef253a
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_a720850474f6ce7ae6c57dcc60284b1f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_03
4310c050aa134b254316068472b4cc/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_a720850474f6ce7ae6c57dcc60284b1f
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_c98a28cd587883dc2882c00695b02d52
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
41cba307728c0bd2a66a1429160112/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_c98a28cd587883dc2882c00695b02d52
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_ced2ac693a19d4ae53af897eaf13bd86
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_c6
4cb733906d43c5101044898eac8a35/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_ced2ac693a19d4ae53af897eaf13bd86
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_dcece16823bead8503333ef11c022775
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_76
bd80272c57164663bec3b1c9750366/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_dcece16823bead8503333ef11c022775
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_e6cbcf7bcb912d6c9725f3390f96b4b3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_63
ec19a814ece3152021772f71ddbd92/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_e6cbcf7bcb912d6c9725f3390f96b4b3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_eaa27e3100f78bff42ff337f163fee0f
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_38
7ecde606556b9d25487167b02e1e6b/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_eaa27e3100f78bff42ff337f163fee0f
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol_ff040fc48f8bd16727423b59ac7244c6
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.168.25:/var/lib/heketi/mounts/v
g_c197878af606e71a874ad28e3bd7e4e1/brick_21
2c67914837cef8a927922ee63c7ee7/brick        N/A       N/A        N       N/A  

Task Status of Volume vol_ff040fc48f8bd16727423b59ac7244c6
------------------------------------------------------------------------------
There are no active volume tasks
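
Since the status output is long, a small Python sketch like the following could summarize which bricks are reported offline. It assumes the column layout shown above, where a long brick path is wrapped over several lines and the last line carries the TCP Port, RDMA Port, Online and Pid columns. The function name `offline_bricks` is hypothetical.

```python
def offline_bricks(status_text):
    """Map each volume name to the brick paths whose Online column is 'N'."""
    result = {}
    volume = None
    brick_parts = None  # pieces of a brick path wrapped across lines

    for line in status_text.splitlines():
        if line.startswith("Status of volume: "):
            volume = line[len("Status of volume: "):].strip()
            result[volume] = []
            brick_parts = None
            continue
        if line.startswith("Brick "):
            # Start of a brick entry; drop the "Brick " label.
            brick_parts = []
            line = line[len("Brick "):]
        if brick_parts is None:
            # Headers, separators and task status lines are ignored.
            continue

        fields = line.split()
        if len(fields) == 5:
            # Final line: path tail, TCP port, RDMA port, Online, Pid.
            brick_parts.append(fields[0])
            if fields[3] == "N" and volume is not None:
                result[volume].append("".join(brick_parts))
            brick_parts = None
        elif len(fields) == 1:
            # Middle of a wrapped brick path.
            brick_parts.append(fields[0])
        else:
            # Unexpected shape; give up on this entry.
            brick_parts = None
    return result
```

Run against the output above, every volume should map to exactly one offline brick, which matches the single-brick `Distribute` layout of these volumes.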


My heketi volume info:

gluster> volume info heketidbstorage

Volume Name: heketidbstorage
Type: Distribute
Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1:
10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick
Options Reconfigured:
user.heketi.id: 1d2400626dac780fce12e45a07494853
transport.address-family: inet
nfs.disable: on


Our gluster settings/volume options:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-heketi
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi
parameters:
  gidMax: "50000"
  gidMin: "2000"
  resturl: http://10.233.35.158:8080
  restuser: "null"
  restuserkey: "null"
  volumetype: "none"
  volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads
    off, performance.open-behind off, performance.readdir-ahead off,
    performance.read-ahead off, performance.stat-prefetch off,
    performance.write-behind off, performance.io-cache off,
    cluster.consistent-metadata on, performance.quick-read off,
    performance.strict-o-direct on
provisioner: kubernetes.io/glusterfs
reclaimPolicy: Delete
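
The `volumeoptions` value is a comma-separated list of `name value` pairs. To compare it against what `gluster volume info` reports for a given volume, a minimal Python sketch could split it into a dictionary first. The helper name `parse_volume_options` is invented for this example.

```python
def parse_volume_options(value):
    """Split a 'name value, name value, ...' string into a dict."""
    options = {}
    for item in value.split(","):
        # split() with no argument also absorbs line wraps inside the value.
        parts = item.split()
        if len(parts) == 2:
            options[parts[0]] = parts[1]
    return options
```

Items that do not consist of exactly two tokens are skipped rather than guessed at.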
