[Gluster-users] glusterfsd Call Trace Messages

Thu Feb 4 15:05:50 UTC 2016

On 2016-02-04 15:51, Raghavendra Bhat wrote:
> It depends on the available memory and the workload. In this case, the
> files being copied are huge, so more I/O is needed to copy each file
> completely.
> 
> Can you please give the output of "gluster volume info <volume name>"?
> 
> Regards,
> Raghavendra
> 
> On Wed, Feb 3, 2016 at 4:54 PM, Taste-Of-IT <kontakt at taste-of-it.de>
> wrote:
> 
>> On 2016-02-03 21:24, Raghavendra Bhat wrote:
>> 
>> I think this is what is happening. Someone please correct me if I am
>> wrong.
>> 
>> I think this happens because the nfs client, the nfs server and the
>> bricks are all on the same machine. When the large write comes in, the
>> nfs client sends the request to the nfs server, and the nfs server
>> sends it to the brick. The brick process tries to perform the write
>> with the write system call, and the call enters the kernel. The kernel
>> might not find enough memory available for the operation, so it wants
>> to free some memory. The nfs client caches heavily and might be holding
>> a lot of data in memory, so it is the one that has to free some. But
>> the nfs client is stuck in the write operation: it is still waiting for
>> a response from the nfs server (which in turn is waiting for a response
>> from the brick), and it cannot free the memory until that response
>> arrives. The brick, in turn, cannot get a response from the kernel
>> until the kernel has obtained memory for the operation and performed
>> the write.
>> 
>> So everything is stuck in this deadlock. That's why you see your setup
>> blocked.
>> 
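The circular wait described above can be pictured as a small wait-for graph. The following is only a toy sketch, not a diagnostic tool: the component labels are illustrative names taken from the explanation, and it assumes (as a simplification) that each component waits for exactly one other.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the first circular wait found in the graph, or None."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node in seen:
                # We came back to a component already on the path: a cycle.
                return path[path.index(node):] + [node]
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# nfs client, nfs server and brick all on the same machine (assumed model).
same_host = {
    "nfs client": "nfs server",  # waits for the reply to its write
    "nfs server": "brick",       # waits for the brick to finish
    "brick": "kernel",           # blocked inside the write system call
    "kernel": "nfs client",      # needs the nfs client to free cached memory
}

# nfs client on a different node: the server's kernel no longer depends on it.
separate_client = {
    "nfs client": "nfs server",
    "nfs server": "brick",
    "brick": "kernel",
}

print(find_cycle(same_host))
# ['nfs client', 'nfs server', 'brick', 'kernel', 'nfs client']
print(find_cycle(separate_client))
# None
```

In this model the only edge that closes the loop is the kernel waiting on the local nfs client, which is why mounting from a different node is the interesting test.
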
>> Can you please mount your volume via nfs on a node other than the
>> gluster servers, and see if the issue happens again?
>> 
>> Regards,
>> Raghavendra
>> 
>> On Wed, Feb 3, 2016 at 2:32 PM, Taste-Of-IT
>> <kontakt at taste-of-it.de>
>> wrote:
>> 
>> On 2016-02-03 20:09, Raghavendra Bhat wrote:
>> 
>> Hi,
>> 
>> Is your nfs client mounted on one of the gluster servers?
>> 
>> Regards,
>> Raghavendra
>> 
>> On Wed, Feb 3, 2016 at 10:08 AM, Taste-Of-IT
>> <kontakt at taste-of-it.de>
>> wrote:
>> 
>> Hello,
>> 
>> I hope some expert can help. I have a distributed GlusterFS setup
>> (2 bricks, 1 volume), version 3.7.6, on Debian. The volume is shared
>> via nfs. If I copy large files (>30GB) with Midnight Commander, I get
>> the following messages. I replaced the SATA cable and checked the
>> memory, but I didn't find an error. The SMART values on all disks seem
>> ok. After 30-40 minutes I can copy again. Any idea?
>> 
>> Feb  3 12:46:31 gluster01 kernel: [11186.588367] [sched_delayed] sched: RT throttling activated
>> Feb  3 12:56:09 gluster01 kernel: [11764.932749] glusterfsd      D ffff88040ca6d788     0  1150      1 0x00000000
>> Feb  3 12:56:09 gluster01 kernel: [11764.932759]  ffff88040ca6d330 0000000000000082 0000000000012f00 ffff88040ad1bfd8
>> Feb  3 12:56:09 gluster01 kernel: [11764.932767]  0000000000012f00 ffff88040ca6d330 ffff88040ca6d330 ffff88040ad1be88
>> Feb  3 12:56:09 gluster01 kernel: [11764.932773]  ffff88040e18d4b8 ffff88040e18d4a0 ffffffff00000000 ffff88040e18d4a8
>> Feb  3 12:56:09 gluster01 kernel: [11764.932780] Call Trace:
>> Feb  3 12:56:09 gluster01 kernel: [11764.932796]  [<ffffffff81512cd5>] ? rwsem_down_write_failed+0x1d5/0x320
>> Feb  3 12:56:09 gluster01 kernel: [11764.932807]  [<ffffffff812b7d13>] ? call_rwsem_down_write_failed+0x13/0x20
>> Feb  3 12:56:09 gluster01 kernel: [11764.932816]  [<ffffffff812325b0>] ? proc_keys_show+0x3f0/0x3f0
>> Feb  3 12:56:09 gluster01 kernel: [11764.932823]  [<ffffffff81512649>] ? down_write+0x29/0x40
>> Feb  3 12:56:09 gluster01 kernel: [11764.932830]  [<ffffffff811592bc>] ? vm_mmap_pgoff+0x6c/0xc0
>> Feb  3 12:56:09 gluster01 kernel: [11764.932838]  [<ffffffff8116ea4e>] ? SyS_mmap_pgoff+0x10e/0x250
>> Feb  3 12:56:09 gluster01 kernel: [11764.932844]  [<ffffffff811a969a>] ? SyS_readv+0x6a/0xd0
>> Feb  3 12:56:09 gluster01 kernel: [11764.932853]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15
>> Feb  3 12:58:09 gluster01 kernel: [11884.979935] glusterfsd      D ffff88040ca6d788     0  1150      1 0x00000000
>> Feb  3 12:58:09 gluster01 kernel: [11884.979945]  ffff88040ca6d330 0000000000000082 0000000000012f00 ffff88040ad1bfd8
>> Feb  3 12:58:09 gluster01 kernel: [11884.979952]  0000000000012f00 ffff88040ca6d330 ffff88040ca6d330 ffff88040ad1be88
>> Feb  3 12:58:09 gluster01 kernel: [11884.979959]  ffff88040e18d4b8 ffff88040e18d4a0 ffffffff00000000 ffff88040e18d4a8
>> Feb  3 12:58:09 gluster01 kernel: [11884.979966] Call Trace:
>> Feb  3 12:58:09 gluster01 kernel: [11884.979982]  [<ffffffff81512cd5>] ? rwsem_down_write_failed+0x1d5/0x320
>> Feb  3 12:58:09 gluster01 kernel: [11884.979993]  [<ffffffff812b7d13>] ? call_rwsem_down_write_failed+0x13/0x20
>> Feb  3 12:58:09 gluster01 kernel: [11884.980001]  [<ffffffff812325b0>] ? proc_keys_show+0x3f0/0x3f0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980008]  [<ffffffff81512649>] ? down_write+0x29/0x40
>> Feb  3 12:58:09 gluster01 kernel: [11884.980015]  [<ffffffff811592bc>] ? vm_mmap_pgoff+0x6c/0xc0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980023]  [<ffffffff8116ea4e>] ? SyS_mmap_pgoff+0x10e/0x250
>> Feb  3 12:58:09 gluster01 kernel: [11884.980030]  [<ffffffff811a969a>] ? SyS_readv+0x6a/0xd0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980038]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15
>> Feb  3 12:58:09 gluster01 kernel: [11884.980351] mc              D ffff88040e6d8fb8     0  5119   1447 0x00000000
>> Feb  3 12:58:09 gluster01 kernel: [11884.980358]  ffff88040e6d8b60 0000000000000082 0000000000012f00 ffff88040d5dbfd8
>> Feb  3 12:58:09 gluster01 kernel: [11884.980365]  0000000000012f00 ffff88040e6d8b60 ffff88041ec937b0 ffff88041efcc9e8
>> Feb  3 12:58:09 gluster01 kernel: [11884.980371]  0000000000000002 ffffffff8113ce00 ffff88040d5dbcb0 ffff88040d5dbd98
>> Feb  3 12:58:09 gluster01 kernel: [11884.980377] Call Trace:
>> Feb  3 12:58:09 gluster01 kernel: [11884.980385]  [<ffffffff8113ce00>] ? wait_on_page_read+0x60/0x60
>> Feb  3 12:58:09 gluster01 kernel: [11884.980392]  [<ffffffff81510759>] ? io_schedule+0x99/0x120
>> Feb  3 12:58:09 gluster01 kernel: [11884.980399]  [<ffffffff8113ce0a>] ? sleep_on_page+0xa/0x10
>> Feb  3 12:58:09 gluster01 kernel: [11884.980405]  [<ffffffff81510adc>] ? __wait_on_bit+0x5c/0x90
>> Feb  3 12:58:09 gluster01 kernel: [11884.980412]  [<ffffffff8113cbff>] ? wait_on_page_bit+0x7f/0x90
>> Feb  3 12:58:09 gluster01 kernel: [11884.980420]  [<ffffffff810a7bd0>] ? autoremove_wake_function+0x30/0x30
>> Feb  3 12:58:09 gluster01 kernel: [11884.980426]  [<ffffffff8114a17d>] ? pagevec_lookup_tag+0x1d/0x30
>> Feb  3 12:58:09 gluster01 kernel: [11884.980433]  [<ffffffff8113cce0>] ? filemap_fdatawait_range+0xd0/0x160
>> Feb  3 12:58:09 gluster01 kernel: [11884.980442]  [<ffffffff8113e7ca>] ? filemap_write_and_wait_range+0x3a/0x60
>> Feb  3 12:58:09 gluster01 kernel: [11884.980461]  [<ffffffffa072363f>] ? nfs_file_fsync+0x7f/0x100 [nfs]
>> Feb  3 12:58:09 gluster01 kernel: [11884.980476]  [<ffffffffa0723a2a>] ? nfs_file_write+0xda/0x1a0 [nfs]
>> Feb  3 12:58:09 gluster01 kernel: [11884.980484]  [<ffffffff811a7e24>] ? new_sync_write+0x74/0xa0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980492]  [<ffffffff811a8562>] ? vfs_write+0xb2/0x1f0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980500]  [<ffffffff811a842d>] ? vfs_read+0xed/0x170
>> Feb  3 12:58:09 gluster01 kernel: [11884.980505]  [<ffffffff811a90a2>] ? SyS_write+0x42/0xa0
>> Feb  3 12:58:09 gluster01 kernel: [11884.980513]  [<ffffffff81513ccd>] ? system_call_fast_compare_end+0x10/0x15
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>> 
>> Hi Raghavendra,
>> yes, in this case I have to mount on one of the gluster servers, but
>> it doesn't matter which one I mount on; it is only a question of time
>> until the trace appears.
>> Taste
>> 
> 
> Hi,
> sounds logical. Is that normal behavior? I tested it from a client and
> it looks fine, without a trace. I tried 4 files of about 30GB each. The
> only thing I noticed is that the first file was copied with nearly full
> bandwidth, over both servers, but the second only reached 20-30 percent
> of the possible bandwidth. Are there any performance or stability
> options which I can use for the nfs or glusterfs mount?
> 
Hi,

Is there a rule of thumb like file size * 1.5 = maximum memory needed?

Here is the output of "gluster volume info vol4":

Volume Name: vol4
Type: Distribute
Volume ID: a2b7c6e9-0298-4222-83f6-b9a4f7f80da7
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: clusternode01:/media/node01/vol4
Brick2: clusternode02:/media/node02/vol4
Options Reconfigured:
auth.allow: 192.168.0.*