[Gluster-devel] Gluster 3.6.2 On Xeon Phi

Mohammed Rafi K C rkavunga at redhat.com
Mon Feb 9 06:10:58 UTC 2015


On 02/08/2015 07:52 PM, Rudra Siva wrote:
> Thanks for trying and sending the changes - finally got it all working
> ... it turned out to be a problem with my changes (in
> gf_rdma_post_unref - goes back to lack of SRQ on the interface)
>
> You may be able to simulate the crash if you set volume parameters to
> something like the following (it would be purely academic):
>
> gluster volume set data_volume diagnostics.brick-log-level TRACE
> gluster volume set data_volume diagnostics.client-log-level TRACE
>
> Had those set because everything started with communication problems
> (queue size, lack of SRQ), so things have come a long way from there - I
> will test for some more time and then make my small changes available.
>
> The transfer speed of the default VE (Virtual Ethernet) that Intel
> ships with it is ~6 MB/sec - presently with Gluster I see around 80
> MB/sec on the virtual IB (there is no real InfiniBand card) with a
> stable gluster mount. The interface benchmarks show it can give 5000
> MB/sec, so there looks to be room for improvement - a stable gluster
> mount is required first, though, before doing anything else.
>
> Questions:
>
> 1. ctx is shared between posts - parts of code with locks and without
> - intentional/oversight?
I didn't quite follow your question. If you are referring to the ctx
inside the post variable, it is not shared between posts.
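For reference, here is a simplified sketch of that layout as it shows up in
your gdb dump further down the thread (this is not the exact rdma.h
definition; the array sizes are guessed from the 8 mr slots and 16 iovec
slots visible in the dump):

    /* Sketch only - field names follow the gdb dump in this thread. */
    #include <infiniband/verbs.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <sys/uio.h>

    #define GF_RDMA_MAX_SEGMENTS 8    /* guessed from the 8 mr slots  */
    #define GF_RDMA_MAX_IOVEC    16   /* guessed from the 16 iovecs   */

    typedef struct {
            struct ibv_mr *mr[GF_RDMA_MAX_SEGMENTS];  /* registered MRs        */
            int            mr_count;                  /* valid entries in mr[] */
            struct iovec   vector[GF_RDMA_MAX_IOVEC]; /* scatter/gather list   */
            int            count;
    } gf_rdma_post_context_t;

    typedef struct gf_rdma_post {
            struct gf_rdma_post    *next, *prev;
            struct ibv_mr          *mr;
            char                   *buf;
            int32_t                 buf_size;
            gf_rdma_post_context_t  ctx;       /* embedded - one ctx per post */
            int                     refcount;
            pthread_mutex_t         lock;      /* guards refcount updates     */
    } gf_rdma_post_t;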

> 2.  iobuf_pool->default_page_size  = 128 * GF_UNIT_KB - why is 128 KB
> chosen and not higher?
For glusterfs the default page size is 128KB, possibly because FUSE is
limited to 128KB writes. I'm not sure about the exact reason.
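
Just to make the arithmetic explicit (GF_UNIT_KB = 1024 is assumed here),
128 * GF_UNIT_KB is the 131072 bytes you can also see as iov_len in the gdb
dump below, which lines up with the 128KB FUSE limit mentioned above:

    /* illustration only, not glusterfs source */
    #include <stdio.h>

    #define GF_UNIT_KB 1024   /* assumed value */

    int
    main (void)
    {
            size_t default_page_size = 128 * GF_UNIT_KB;

            printf ("default_page_size = %zu\n", default_page_size); /* 131072 */
            return 0;
    }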

>
> -Siva
>
>
> On Fri, Feb 6, 2015 at 6:12 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>> On 02/06/2015 05:31 AM, Rudra Siva wrote:
>>> Rafi,
>>>
>>> Sorry it took me some time - I had to merge these with some of my
>>> changes - the scif0 (iWARP) does not support SRQ (max_srq : 0) so have
>>> changed some of the code to use QP instead - can provide those if
>>> there is interest after this is stable.
>>>
>>> Here's the good -
>>>
>>> The performance with the patches is better than without (esp.
>>> http://review.gluster.org/#/c/9327/).
>> Good to hear. My thought was that http://review.gluster.org/#/c/9506/ will
>> give much better performance than the others :-) . A rebase is needed
>> if it is applied on top of the other patches.
>>
>>> The bad - glusterfsd crashes for large files so it's difficult to get
>>> some decent benchmark numbers
>> Thanks for raising the bug. I tried to reproduce the problem on the 3.6.2
>> version plus the four patches with a simple distributed volume, but I
>> couldn't reproduce it and am still trying. (We are using Mellanox IB
>> cards.)
>>
>> If possible, can you please share the volume info and the workload used
>> for the large files?
>>
>>
>>> - small ones look good - trying to
>>> understand the patch at this time. Looks like this code comes from
>>> 9327 as well.
>>>
>>> Can you please review the reset of mr_count?
>> Yes, the problem could be a wrong value in mr_count. I guess we failed to
>> reset the value to zero, so for some I/O mr_count will be incremented a
>> couple of extra times and the variable may have overflowed. Can you apply
>> the patch attached to this mail and try with it?
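>> A minimal sketch of the kind of reset I mean (the helper name and
>> arguments here are illustrative, not the real rdma.c signature; mr,
>> mr_count and GF_RDMA_MAX_SEGMENTS follow your gdb dump):
>>
>>     /* Sketch only: zero mr_count and bounds-check it before registering
>>      * the local MRs, so a reused post can never index past mr[].
>>      * gf_rdma_post_context_t is as in the dump (mr[], mr_count, ...). */
>>     #include <infiniband/verbs.h>
>>     #include <sys/uio.h>
>>
>>     static int
>>     register_local_mrs (struct ibv_pd *pd, struct iovec *vector, int count,
>>                         gf_rdma_post_context_t *ctx)
>>     {
>>             int i = 0;
>>
>>             ctx->mr_count = 0;               /* reset before reuse */
>>
>>             for (i = 0; i < count; i++) {
>>                     if (ctx->mr_count >= GF_RDMA_MAX_SEGMENTS)
>>                             return -1;       /* would overflow mr[] */
>>
>>                     ctx->mr[ctx->mr_count] = ibv_reg_mr (pd,
>>                                                  vector[i].iov_base,
>>                                                  vector[i].iov_len,
>>                                                  IBV_ACCESS_LOCAL_WRITE);
>>                     if (!ctx->mr[ctx->mr_count])
>>                             return -1;
>>
>>                     ctx->mr_count++;
>>             }
>>
>>             return 0;
>>     }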
>>
>>> Info from gdb is as follows - if you need more or something jumps out
>>> please feel free to let me know.
>>>
>>> (gdb) p *post
>>> $16 = {next = 0x7fffe003b280, prev = 0x7fffe0037cc0, mr =
>>> 0x7fffe0037fb0, buf = 0x7fffe0096000 "\005\004", buf_size = 4096, aux
>>> = 0 '\000',
>>>   reused = 1, device = 0x7fffe00019c0, type = GF_RDMA_RECV_POST, ctx =
>>> {mr = {0x7fffe0003020, 0x7fffc8005f20, 0x7fffc8000aa0, 0x7fffc80030c0,
>>>       0x7fffc8002d70, 0x7fffc8008bb0, 0x7fffc8008bf0, 0x7fffc8002cd0},
>>> mr_count = -939493456, vector = {{iov_base = 0x7ffff7fd6000,
>>>         iov_len = 112}, {iov_base = 0x7fffbf140000, iov_len = 131072},
>>> {iov_base = 0x0, iov_len = 0} <repeats 14 times>}, count = 2,
>>>     iobref = 0x7fffc8001670, hdr_iobuf = 0x61d710, is_request = 0
>>> '\000', gf_rdma_reads = 1, reply_info = 0x0}, refcount = 1, lock = {
>>>     __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>>> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>>>     __size = '\000' <repeats 39 times>, __align = 0}}
>>>
>>> (gdb) bt
>>> #0  0x00007fffe7142681 in __gf_rdma_register_local_mr_for_rdma
>>> (peer=0x7fffe0001800, vector=0x7fffe003b108, count=1,
>>> ctx=0x7fffe003b0b0)
>>>     at rdma.c:2255
>>> #1  0x00007fffe7145acd in gf_rdma_do_reads (peer=0x7fffe0001800,
>>> post=0x7fffe003b070, readch=0x7fffe0096010) at rdma.c:3609
>>> #2  0x00007fffe714656e in gf_rdma_recv_request (peer=0x7fffe0001800,
>>> post=0x7fffe003b070, readch=0x7fffe0096010) at rdma.c:3859
>>> #3  0x00007fffe714691d in gf_rdma_process_recv (peer=0x7fffe0001800,
>>> wc=0x7fffceffcd20) at rdma.c:3967
>>> #4  0x00007fffe7146e7d in gf_rdma_recv_completion_proc
>>> (data=0x7fffe0002b30) at rdma.c:4114
>>> #5  0x00007ffff72cfdf3 in start_thread () from /lib64/libpthread.so.0
>>> #6  0x00007ffff6c403dd in clone () from /lib64/libc.so.6
>>>
>>> On Fri, Jan 30, 2015 at 7:11 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>>> On 01/29/2015 06:13 PM, Rudra Siva wrote:
>>>>> Hi,
>>>>>
>>>>> Have been able to get Gluster running on Intel's MIC platform. The
>>>>> only code change to the Gluster source was an unresolved yylex (I am
>>>>> not really sure why that was coming up - maybe someone more familiar
>>>>> with its use in Gluster can answer).
>>>>>
>>>>> At the step where the binaries (glusterd, glusterfsd, glusterfs,
>>>>> glfsheal) are compiled, the build breaks with an unresolved yylex error.
>>>>>
>>>>> For now I have a routine yylex that simply calls graphyylex - I don't
>>>>> know if this is even correct, but the mount functions.
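>>>>> The stub is literally just a forwarder, something along these lines
>>>>> (graphyylex's real prototype may differ, so treat this as a sketch):
>>>>>
>>>>>     /* workaround stub: forward the unresolved yylex to the renamed
>>>>>      * graph lexer entry point */
>>>>>     extern int graphyylex (void);
>>>>>
>>>>>     int
>>>>>     yylex (void)
>>>>>     {
>>>>>             return graphyylex ();
>>>>>     }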
>>>>>
>>>>> GCC - 4.7 (it's an oddity, latest GCC is missing the Phi patches)
>>>>>
>>>>> flex --version
>>>>> flex 2.5.39
>>>>>
>>>>> bison --version
>>>>> bison (GNU Bison) 3.0
>>>>>
>>>>> I'm still working on testing the RDMA and Infiniband support and can
>>>>> make notes, numbers available when that is complete.
>>>> There are a couple of RDMA performance-related patches under review. If
>>>> you could make use of those patches, I hope they will give a performance
>>>> enhancement.
>>>>
>>>> [1] : http://review.gluster.org/#/c/9329/
>>>> [2] : http://review.gluster.org/#/c/9321/
>>>> [3] : http://review.gluster.org/#/c/9327/
>>>> [4] : http://review.gluster.org/#/c/9506/
>>>>
>>>> Let me know if you need any clarification.
>>>>
>>>> Regards!
>>>> Rafi KC
>>>
>
>


