[Gluster-users] Mounting of Gluster volumes in Kubernetes

Travis Truman travis.truman at revzilla.com
Wed Oct 18 12:52:57 UTC 2017

Hi all,

Wondered if there are others in the community using GlusterFS on Google
Compute Engine and Kubernetes via Google Container Engine together.

We're running glusterfs 3.7.6 on Ubuntu Xenial across 3 GCE nodes. We have
a single replicated volume of ~800GB that our pods running in Kubernetes
are mounting.
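For context, the layout is the standard 3-way replica pattern. A minimal sketch of how such a setup is typically created and mounted (hostnames, volume name, and brick paths below are illustrative placeholders, not our actual ones):

```shell
# Illustrative only -- hostnames, volume name, and brick paths are placeholders.
# Create a 3-way replicated volume across the GCE nodes:
gluster volume create gv0 replica 3 \
    gce-node1:/data/brick1/gv0 \
    gce-node2:/data/brick1/gv0 \
    gce-node3:/data/brick1/gv0
gluster volume start gv0

# Each Kubernetes node then mounts it through the GlusterFS FUSE client:
mount -t glusterfs gce-node1:/gv0 /mnt/gluster
```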

We've observed a pattern of soft lockups on the Kubernetes nodes that mount
our Gluster volume. The affected nodes tend to be those with the highest
rate of reads/writes to the Gluster volume.
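In case it helps others reproduce or compare: the full panic (rather than just a logged warning) happens because the kernel is configured to panic on soft lockup. A quick diagnostic sketch using the standard Linux sysctl files (the "unknown" fallback is just defensive, in case the files are absent):

```shell
# Read the watchdog settings that escalate a soft lockup into a panic.
# /proc/sys/kernel paths are standard Linux sysctls; "unknown" is a fallback.
panic_on_lockup=$(cat /proc/sys/kernel/softlockup_panic 2>/dev/null || echo "unknown")
thresh=$(cat /proc/sys/kernel/watchdog_thresh 2>/dev/null || echo "unknown")
echo "softlockup_panic=${panic_on_lockup} watchdog_thresh=${thresh}s"

# To keep a node alive for debugging instead of rebooting
# (this masks the symptom, it does not fix the hang):
#   sysctl -w kernel.softlockup_panic=0
```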

An example looks like:

[495498.074071] Kernel panic - not syncing: softlockup: hung tasks
[495498.080108] CPU: 0 PID: 10166 Comm: nginx Tainted: G             L 4.4.64+ #1
[495498.087524] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[495498.096947]  0000000000000000 ffff8803ffc03e20 ffffffffa1317394
[495498.105113]  ffff8803ffc03eb0 ffff8803ffc03ea0 ffffffffa1139bbc
[495498.113187]  ffff8803ffc03eb0 ffff8803ffc03e48 000000000000009c
[495498.121488] Call Trace:
[495498.124131]  <IRQ>  [<ffffffffa1317394>] dump_stack+0x63/0x8f
[495498.130207]  [<ffffffffa1139bbc>] panic+0xc6/0x1ec
[495498.135208]  [<ffffffffa10f65a7>] watchdog_timer_fn+0x1e7/0x1f0
[495498.141327]  [<ffffffffa10f63c0>] ? watchdog+0xa0/0xa0
[495498.146668]  [<ffffffffa10b8f1f>] __hrtimer_run_queues+0xff/0x260
[495498.152959]  [<ffffffffa10b93ec>] hrtimer_interrupt+0xac/0x1b0
[495498.158993]  [<ffffffffa15b2918>] smp_apic_timer_interrupt+0x68/0xa0
[495498.167232]  [<ffffffffa15b1222>] apic_timer_interrupt+0x82/0x90
[495498.173432]  <EOI>  [<ffffffffa109a6d0>] ?
[495498.182557]  [<ffffffffc02e331f>] ? 0xffffffffc02e331f
[495498.187893]  [<ffffffffa109a9e0>] ? prepare_to_wait_event+0xf0/0xf0
[495498.194357]  [<ffffffffc02e3679>] 0xffffffffc02e3679
[495498.199519]  [<ffffffffc02e723a>] fuse_simple_request+0x11a/0x1e0 [fuse]
[495498.206415]  [<ffffffffc02e7f71>] fuse_dev_cleanup+0xa81/0x1ef0 [fuse]
[495498.213151]  [<ffffffffa11b91a9>] lookup_fast+0x249/0x330
[495498.218748]  [<ffffffffa11b95bd>] walk_component+0x3d/0x500

While this particular issue seems more related to the FUSE client talking to
Gluster, we're wondering if others have seen this type of behavior, whether
there are particular troubleshooting/tuning steps we might be advised to
take on the Gluster side of the problem, and whether the community has any
general tips on using Gluster and Kubernetes together.

Thanks in advance,
Travis Truman
