[Gluster-users] NFS over gluster stops responding under write load

Lonni J Friedman netllama at gmail.com
Tue Sep 18 22:06:25 UTC 2012


On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman <netllama at gmail.com> wrote:
> Greetings,
> I'm running version 3.3.0 on Fedora16-x86_64.  I have two bricks setup
> with a volume doing basic replication on an XFS formatted filesystem.
> I've NFS mounted the volume on a 3rd system, and invoked bonnie++ to
> write to the NFS mount point.  After a few minutes, I noticed that
> bonnie++ didn't seem to be generating any more progress output, at
> which point I started checking assorted logs to see if anything was
> wrong.  At that point, I saw that the client system was no longer able
> to write to the NFS mount point, and dmesg (and /var/log/messages) was
> spewing these warnings like crazy (dozens/second):
> nfs: server 10.31.99.166 not responding, still trying
>
> Those warnings started at 14:40:58 on the client system, but oddly
> stopped a few seconds later at 14:41:04.  Here's the full bonnie++
> output (/mnt/gv0 is where the gluster file system is mounted as an NFS
> client):
> ########
> [root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
> Using uid:0, gid:0.
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...
> ########
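>
> For completeness, this is roughly how the volume and the client mount
> were created (same bricks and addresses as in the volume info further
> down; the exact mount options may differ slightly):
> ########
> # gluster volume create gv0 replica 2 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
> # gluster volume start gv0
> ### on the client; gluster's built-in NFS server only speaks NFSv3:
> # mount -t nfs -o vers=3 10.31.99.166:/gv0 /mnt/gv0
> ########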
>
> Here's what's in the glusterfs logs at the moment:
> ##########
> # tail etc-glusterfs-glusterd.vol.log
> [2012-09-18 14:54:39.026557] I
> [glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received
> op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> [2012-09-18 14:54:39.029463] I
> [glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd:
> Responded to stage, ret: 0
> [2012-09-18 14:54:46.993426] I
> [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
> 0-management: Received heal vol req for volume gv0
> [2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock]
> 0-glusterd: Unable to get lock for uuid:
> e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
> 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> [2012-09-18 14:54:46.993520] E
> [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
> acquire local lock, ret: -1
> [2012-09-18 14:55:47.175521] I
> [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
> Received get vol req
> [2012-09-18 14:55:47.181048] I
> [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
> Received get vol req
> [2012-09-18 14:55:49.306776] I
> [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
> 0-management: Received heal vol req for volume gv0
> [2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock]
> 0-glusterd: Unable to get lock for uuid:
> e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
> 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> [2012-09-18 14:55:49.306844] E
> [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
> acquire local lock, ret: -1
> # tail -f cli.log
> [2012-09-18 14:55:47.176824] I
> [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
> get vol: 0
> [2012-09-18 14:55:47.180959] I
> [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
> [2012-09-18 14:55:47.181128] I
> [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
> get vol: 0
> [2012-09-18 14:55:47.181167] I
> [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
> [2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
> [2012-09-18 14:55:49.244795] W
> [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing
> 'option transport-type'. defaulting to "socket"
> [2012-09-18 14:55:49.307054] I
> [cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to
> heal volume
> [2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
> [0x328ca0f965] (-->gluster(gf_cli3_1_heal_volume_cbk+0x1d4)
> [0x4225e4]))) 0-dict: buf is null!
> [2012-09-18 14:55:49.307289] E
> [cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate
> memory
> [2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
> ##########
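>
> If I'm reading the "Unable to get lock" errors correctly, the heal
> request fails because another transaction (uuid
> 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62) is still holding the cluster-wide
> lock.  My understanding is that the uuid can be matched to a peer with
> the commands below (the local node's uuid is in
> /var/lib/glusterd/glusterd.info), and that restarting glusterd on the
> node holding a stale lock should release it, though I haven't verified
> that yet:
> ##########
> # gluster peer status
> ### on the peer whose uuid matches "lock held by":
> # systemctl restart glusterd.service
> ##########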
>
> I'd be happy to provide more if someone requests something specific.
>
> Not sure what other information to provide at this point, but here's
> the basics of the gluster setup:
> ##########
> # gluster volume info all
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.31.99.165:/mnt/sdb1
> Brick2: 10.31.99.166:/mnt/sdb1
> # gluster volume heal gv0 info
> operation failed
> ##########
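>
> Assuming the status command still works even though heal doesn't, I'd
> also like to know whether the NFS server and self-heal daemon are still
> listed as online (in 3.3 "volume status" should show them alongside the
> bricks):
> ##########
> # gluster volume status gv0
> ##########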
>
> I just noticed that glusterfs seems to be rapidly heading towards OOM
> territory.  The glusterfs daemon is currently consuming 90% of MEM
> according to top.
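>
> For what it's worth, this is how I'm watching the memory usage, and (if
> I'm reading the 3.3 docs correctly) a statedump should show which
> translator is holding the allocations; I'm not sure of the dump
> directory, but I believe it defaults to /tmp:
> ##########
> # top -b -n 1 | grep -i gluster
> # gluster volume statedump gv0
> # ls -lt /tmp/*dump*
> ##########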

I just attempted to shut down the glusterd service, and it ran off a
cliff: the OOM killer kicked in and killed it.  From dmesg:
#########
[ 4151.733182] glusterfsd invoked oom-killer: gfp_mask=0x201da,
order=0, oom_adj=0, oom_score_adj=0
[ 4151.733186] glusterfsd cpuset=/ mems_allowed=0
[ 4151.733189] Pid: 2567, comm: glusterfsd Not tainted 3.4.9-2.fc16.x86_64 #1
[ 4151.733191] Call Trace:
[ 4151.733217]  [<ffffffff810c7e61>] ? cpuset_print_task_mems_allowed+0x91/0xa0
[ 4151.733235]  [<ffffffff815f4904>] dump_header+0x80/0x1b0
[ 4151.733247]  [<ffffffff81125bfb>] ? oom_unkillable_task+0x8b/0xc0
[ 4151.733274]  [<ffffffff8126cd45>] ? security_capable_noaudit+0x15/0x20
[ 4151.733288]  [<ffffffff812c9940>] ? ___ratelimit+0xa0/0x120
[ 4151.733297]  [<ffffffff81126360>] oom_kill_process+0x280/0x2b0
[ 4151.733299]  [<ffffffff8112688b>] out_of_memory+0x26b/0x3f0
[ 4151.733303]  [<ffffffff8112bc5a>] __alloc_pages_nodemask+0x8aa/0x950
[ 4151.733316]  [<ffffffff81163a66>] alloc_pages_current+0xb6/0x120
[ 4151.733325]  [<ffffffff81122127>] __page_cache_alloc+0xb7/0xe0
[ 4151.733333]  [<ffffffff81124d4b>] filemap_fault+0x2db/0x4a0
[ 4151.733336]  [<ffffffff8114557f>] __do_fault+0x6f/0x520
[ 4151.733348]  [<ffffffff8103ec29>] ? kvm_clock_read+0x19/0x20
[ 4151.733363]  [<ffffffff8101b9e9>] ? sched_clock+0x9/0x10
[ 4151.733372]  [<ffffffff8108de55>] ? sched_clock_local+0x25/0x90
[ 4151.733375]  [<ffffffff811486c7>] handle_pte_fault+0xf7/0xad0
[ 4151.733377]  [<ffffffff8108bca0>] ? wake_up_state+0x10/0x20
[ 4151.733383]  [<ffffffff810ae7eb>] ? wake_futex+0x3b/0x60
[ 4151.733385]  [<ffffffff810af351>] ? futex_wake+0x101/0x120
[ 4151.733392]  [<ffffffff81179287>] ? mem_cgroup_count_vm_event+0x47/0x60
[ 4151.733394]  [<ffffffff81149458>] handle_mm_fault+0x1f8/0x310
[ 4151.733399]  [<ffffffff8103f9f5>] ? pvclock_clocksource_read+0x55/0xf0
[ 4151.733412]  [<ffffffff81601742>] do_page_fault+0x142/0x4f0
[ 4151.733417]  [<ffffffff8108fe4d>] ? set_next_entity+0x9d/0xb0
[ 4151.733420]  [<ffffffff8103f9f5>] ? pvclock_clocksource_read+0x55/0xf0
[ 4151.733423]  [<ffffffff815fe2e5>] page_fault+0x25/0x30
[ 4151.740605] Out of memory: Kill process 859 (glusterfs) score 859
or sacrifice child
[ 4151.740907] Killed process 859 (glusterfs) total-vm:6140796kB,
anon-rss:2627280kB, file-rss:0kB
########
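
Note that the process the OOM killer picked was PID 859 (glusterfs),
which I believe is the NFS server or self-heal daemon rather than the
brick process (glusterfsd, PID 2567) that invoked the oom-killer.  In
case it's useful, this is how I'm mapping the PIDs back to roles (the
--volfile-id argument should identify what each glusterfs process is
doing):
#########
# ps -o pid,rss,args -C glusterfs,glusterfsd
#########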


