[Gluster-devel] Gluster 3.6.2 On Xeon Phi

Mon Feb 9 12:23:38 UTC 2015

In rdma.c, gf_rdma_do_reads calls pthread_mutex_lock
(&priv->write_mutex); - what does that lock guard against?

On Mon, Feb 9, 2015 at 1:10 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>
> On 02/08/2015 07:52 PM, Rudra Siva wrote:
>> Thanks for trying and sending the changes - finally got it all working
>> ... it turned out to be a problem with my changes (in
>> gf_rdma_post_unref - goes back to lack of SRQ on the interface)
>>
>> You may be able to simulate the crash if you set volume parameters to
>> something like the following (it would be purely academic):
>>
>> gluster volume set data_volume diagnostics.brick-log-level TRACE
>> gluster volume set data_volume diagnostics.client-log-level TRACE
>>
>> I had those set because all this began with communication problems
>> (queue size, lack of SRQ), so things have come a long way since then.
>> I will test for some more time and make my small changes available.
>>
>> The transfer speed of the default VE (Virtual Ethernet) that Intel
>> ships with it is ~6 MB/sec - presently with Gluster I see around 80
>> MB/sec on the virtual IB (there is no real InfiniBand card) and with a
>> stable gluster mount. The interface benchmarks show it can give 5000
>> MB/sec, so there looks to be more room for improvement - a stable
>> gluster mount is required first, though, before doing anything else.
>>
>> Questions:
>>
>> 1. ctx is shared between posts - parts of code with locks and without
>> - intentional/oversight?
> I didn't get your question properly. If you are talking about the ctx
> inside the post variable, it is not shared.
>
>> 2.  iobuf_pool->default_page_size  = 128 * GF_UNIT_KB - why is 128 KB
>> chosen and not higher?
> For glusterfs the default page size is 128KB. Maybe that is because
> FUSE is limited to 128KB. I'm not sure about the exact reason.
>
>>
>> -Siva
>>
>>
>> On Fri, Feb 6, 2015 at 6:12 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>> On 02/06/2015 05:31 AM, Rudra Siva wrote:
>>>> Rafi,
>>>>
>>>> Sorry it took me some time - I had to merge these with some of my
>>>> changes - the scif0 (iWARP) does not support SRQ (max_srq : 0) so have
>>>> changed some of the code to use QP instead - can provide those if
>>>> there is interest after this is stable.
>>>>
>>>> Here's the good -
>>>>
>>>> The performance with the patches is better than without (esp.
>>>> http://review.gluster.org/#/c/9327/).
>>> Good to hear. My thought was that http://review.gluster.org/#/c/9506/
>>> will give much better performance than the others :-) . A rebase is
>>> needed if it is applied on top of the other patches.
>>>
>>>> The bad - glusterfsd crashes for large files so it's difficult to get
>>>> some decent benchmark numbers
>>> Thanks for raising the bug. I tried to reproduce the problem on the
>>> 3.6.2 version plus the four patches with a simple distributed volume,
>>> but I couldn't reproduce it and am still trying. (We are using
>>> Mellanox IB cards.)
>>>
>>> If possible, can you please share the volume info and the workload
>>> used for large files?
>>>
>>>
>>>> - small ones look good - trying to
>>>> understand the patch at this time. Looks like this code comes from
>>>> 9327 as well.
>>>>
>>>> Can you please review the reset of mr_count?
>>> Yes, the problem could be a wrong value in mr_count. I guess we
>>> failed to reset the value to zero, so that for some I/O mr_count will
>>> be incremented a couple of times. The variable might therefore have
>>> overflowed. Can you apply the patch attached to this mail and try
>>> with it?
>>>
>>>> Info from gdb is as follows - if you need more or something jumps out
>>>> please feel free to let me know.
>>>>
>>>> (gdb) p *post
>>>> $16 = {next = 0x7fffe003b280, prev = 0x7fffe0037cc0, mr =
>>>> 0x7fffe0037fb0, buf = 0x7fffe0096000 "\005\004", buf_size = 4096, aux
>>>> = 0 '\000',
>>>>   reused = 1, device = 0x7fffe00019c0, type = GF_RDMA_RECV_POST, ctx =
>>>> {mr = {0x7fffe0003020, 0x7fffc8005f20, 0x7fffc8000aa0, 0x7fffc80030c0,
>>>>       0x7fffc8002d70, 0x7fffc8008bb0, 0x7fffc8008bf0, 0x7fffc8002cd0},
>>>> mr_count = -939493456, vector = {{iov_base = 0x7ffff7fd6000,
>>>>         iov_len = 112}, {iov_base = 0x7fffbf140000, iov_len = 131072},
>>>> {iov_base = 0x0, iov_len = 0} <repeats 14 times>}, count = 2,
>>>>     iobref = 0x7fffc8001670, hdr_iobuf = 0x61d710, is_request = 0
>>>> '\000', gf_rdma_reads = 1, reply_info = 0x0}, refcount = 1, lock = {
>>>>     __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>>>> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>>>>     __size = '\000' <repeats 39 times>, __align = 0}}
>>>>
>>>> (gdb) bt
>>>> #0  0x00007fffe7142681 in __gf_rdma_register_local_mr_for_rdma
>>>> (peer=0x7fffe0001800, vector=0x7fffe003b108, count=1,
>>>> ctx=0x7fffe003b0b0)
>>>>     at rdma.c:2255
>>>> #1  0x00007fffe7145acd in gf_rdma_do_reads (peer=0x7fffe0001800,
>>>> post=0x7fffe003b070, readch=0x7fffe0096010) at rdma.c:3609
>>>> #2  0x00007fffe714656e in gf_rdma_recv_request (peer=0x7fffe0001800,
>>>> post=0x7fffe003b070, readch=0x7fffe0096010) at rdma.c:3859
>>>> #3  0x00007fffe714691d in gf_rdma_process_recv (peer=0x7fffe0001800,
>>>> wc=0x7fffceffcd20) at rdma.c:3967
>>>> #4  0x00007fffe7146e7d in gf_rdma_recv_completion_proc
>>>> (data=0x7fffe0002b30) at rdma.c:4114
>>>> #5  0x00007ffff72cfdf3 in start_thread () from /lib64/libpthread.so.0
>>>> #6  0x00007ffff6c403dd in clone () from /lib64/libc.so.6
>>>>
>>>> On Fri, Jan 30, 2015 at 7:11 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>>>> On 01/29/2015 06:13 PM, Rudra Siva wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Have been able to get Gluster running on Intel's MIC platform. The
>>>>>> only code change to Gluster source was for an unresolved yylex (I
>>>>>> am not really sure why that was coming up - maybe someone more
>>>>>> familiar with its use in Gluster can answer).
>>>>>>
>>>>>> At the step for compiling the binaries (glusterd, glusterfsd,
>>>>>> glusterfs, glfsheal) the build breaks with an unresolved yylex
>>>>>> error.
>>>>>>
>>>>>> For now I have a routine yylex that simply calls graphyylex - I
>>>>>> don't know if this is even correct; however, the mount works.
>>>>>>
>>>>>> GCC - 4.7 (it's an oddity, latest GCC is missing the Phi patches)
>>>>>>
>>>>>> flex --version
>>>>>> flex 2.5.39
>>>>>>
>>>>>> bison --version
>>>>>> bison (GNU Bison) 3.0
>>>>>>
>>>>>> I'm still working on testing the RDMA and Infiniband support and can
>>>>>> make notes, numbers available when that is complete.
>>>>> There are a couple of RDMA performance-related patches under
>>>>> review. If you can make use of those patches, I hope they will
>>>>> improve performance.
>>>>>
>>>>> [1] : http://review.gluster.org/#/c/9329/
>>>>> [2] : http://review.gluster.org/#/c/9321/
>>>>> [3] : http://review.gluster.org/#/c/9327/
>>>>> [4] : http://review.gluster.org/#/c/9506/
>>>>>
>>>>> Let me know if you need any clarification.
>>>>>
>>>>> Regards!
>>>>> Rafi KC
>>>>
>>
>>
>
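
On the mr_count reset discussed in the quoted thread: the gdb dump
shows a ctx with eight mr slots and mr_count = -939493456, which fits
a counter that was never cleared before the post was reused. Below is
a minimal sketch of the idea, not the real rdma.c code. The names
sketch_ctx, SKETCH_MAX_MR, sketch_ctx_reset and sketch_ctx_add_mr are
hypothetical, and the slot count of eight is only read off the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real rdma.c definitions. */
#define SKETCH_MAX_MR 8   /* the gdb dump above shows eight mr slots */

struct sketch_ctx {
    void *mr[SKETCH_MAX_MR];  /* registered memory regions */
    int   mr_count;           /* number of slots currently in use */
};

/* Clear the registration bookkeeping before a post is reused. */
void sketch_ctx_reset(struct sketch_ctx *ctx)
{
    assert(ctx != NULL);
    for (int i = 0; i < SKETCH_MAX_MR; i++)
        ctx->mr[i] = NULL;
    ctx->mr_count = 0;
}

/* Record one region. Returns 0 on success, or -1 if the array is
 * full or the counter is outside the valid range. */
int sketch_ctx_add_mr(struct sketch_ctx *ctx, void *mr)
{
    assert(ctx != NULL);
    if (ctx->mr_count < 0 || ctx->mr_count >= SKETCH_MAX_MR)
        return -1;
    ctx->mr[ctx->mr_count++] = mr;
    return 0;
}
```

With a range check like this, a stale counter turns into an error
return instead of an out-of-bounds write into the mr array.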

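For the yylex workaround mentioned in the first mail of the thread, a
rough sketch of the forwarding shim follows. It assumes the generated
scanner entry point has the signature int graphyylex(void); that is
an assumption and has not been checked against the Gluster sources.
The stub body of graphyylex exists only so the sketch compiles on its
own. In a real build it would be dropped and the flex-generated
symbol linked instead.

```c
/* Stub standing in for the flex-generated scanner entry point.
 * Assumption: the real symbol has this signature. Remove this stub
 * when linking against the generated scanner. */
int graphyylex(void)
{
    return 0;  /* 0 is the conventional end-of-input token */
}

/* Forwarding shim: satisfies the unresolved yylex reference by
 * delegating to the prefixed scanner. */
int yylex(void)
{
    return graphyylex();
}
```

Whether this is the right fix, rather than a change to how the parser
and scanner prefixes are set up in the build, is still open.
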
-- 
-Siva