[Gluster-users] Brick crashes

Anand Avati anand.avati at gmail.com
Sat Jun 9 00:18:33 UTC 2012


Those are 4.x GB. Can you post dmesg output as well? Also, what's 'ulimit
-l' on your system?
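
For reference, a quick way to check (a sketch; the 64KB figure is just the
common default, and a daemon started from an init script may need the limit
raised in the script itself rather than in /etc/security/limits.conf):

# ulimit -l
64
# grep memlock /etc/security/limits.conf
*    soft    memlock    unlimited
*    hard    memlock    unlimited

The rdma transport pins its buffers with ibv_reg_mr(), which starts failing
once RLIMIT_MEMLOCK is exhausted, so a small default can produce the "memory
registration failed" error under load.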

On Fri, Jun 8, 2012 at 4:41 PM, Ling Ho <ling at slac.stanford.edu> wrote:

>
> This is the core file from the crash just now:
>
> [root at psanaoss213 /]# ls -al core*
> -rw------- 1 root root 4073594880 Jun  8 15:05 core.22682
>
> From yesterday:
> [root at psanaoss214 /]# ls -al core*
> -rw------- 1 root root 4362727424 Jun  8 00:58 core.13483
> -rw------- 1 root root 4624773120 Jun  8 03:21 core.8792
>
>
>
> On 06/08/2012 04:34 PM, Anand Avati wrote:
>
> Is it possible the system was running low on memory? I see you have 48GB,
> but a memory registration failure is typically a sign that the system limit
> on the number of pinnable pages in RAM was hit. Can you tell us the size of
> your core dump files after the crash?
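>
> For instance (a sketch; the PID below is hypothetical), you can read the
> effective limit of a running brick process straight from /proc:
>
> # pgrep -f glusterfsd | head -1
> 12345
> # grep 'locked memory' /proc/12345/limits
> Max locked memory         65536                65536                bytes
>
> If that shows a small value, registration failures are expected well before
> the 48GB is exhausted.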
>
>  Avati
>
> On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>
>> Hello,
>>
>> I have a brick that crashed twice today, and a different brick that
>> crashed just a while ago.
>>
>> This is what I see in one of the brick logs:
>>
>> patchset: git://git.gluster.com/glusterfs.git
>> patchset: git://git.gluster.com/glusterfs.git
>> signal received: 6
>> signal received: 6
>> time of crash: 2012-06-08 15:05:11
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> fdatasync 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.2.6
>> /lib64/libc.so.6[0x34bc032900]
>> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
>> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
>> /lib64/libc.so.6[0x34bc06f977]
>> /lib64/libc.so.6[0x34bc075296]
>>
>> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
>>
>> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
>>
>> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
>>
>> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
>> /lib64/libpthread.so.0[0x34bc8077f1]
>> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
>> ---------
>>
>> And somewhere before these lines, there is also:
>> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
>> 0-rpc-transport/rdma: memory registration failed
>>
>> I have 48GB of memory on the system:
>>
>> # free
>>             total       used       free     shared    buffers     cached
>> Mem:      49416716   34496648   14920068          0      31692   28209612
>> -/+ buffers/cache:    6255344   43161372
>> Swap:      4194296       1740    4192556
>>
>> # uname -a
>> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22
>> EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>>
>> The server gluster version is 3.2.6-1. I have both rdma clients and tcp
>> clients over a 10Gb/s network.
>>
>> Any suggestions on what I should look for?
>>
>> Is there a way to restart just the brick, and not glusterd? I have 8
>> bricks on the server.
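>>
>> (What I had in mind, assuming each brick runs as its own glusterfsd
>> process, was something along these lines; brick3 and the PID are
>> hypothetical:
>>
>> # ps aux | grep glusterfsd | grep brick3
>> # kill 12345
>> # /etc/init.d/glusterd restart
>>
>> My understanding is that restarting glusterd respawns any missing brick
>> processes without touching the ones still running, but I'd like to confirm
>> that before trying it on a live server.)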
>>
>> Thanks,
>> ...
>> ling
>>
>>
>> Here's the volume info:
>>
>> # gluster volume info
>>
>> Volume Name: ana12
>> Type: Distribute
>> Status: Started
>> Number of Bricks: 40
>> Transport-type: tcp,rdma
>> Bricks:
>> Brick1: psanaoss214:/brick1
>> Brick2: psanaoss214:/brick2
>> Brick3: psanaoss214:/brick3
>> Brick4: psanaoss214:/brick4
>> Brick5: psanaoss214:/brick5
>> Brick6: psanaoss214:/brick6
>> Brick7: psanaoss214:/brick7
>> Brick8: psanaoss214:/brick8
>> Brick9: psanaoss211:/brick1
>> Brick10: psanaoss211:/brick2
>> Brick11: psanaoss211:/brick3
>> Brick12: psanaoss211:/brick4
>> Brick13: psanaoss211:/brick5
>> Brick14: psanaoss211:/brick6
>> Brick15: psanaoss211:/brick7
>> Brick16: psanaoss211:/brick8
>> Brick17: psanaoss212:/brick1
>> Brick18: psanaoss212:/brick2
>> Brick19: psanaoss212:/brick3
>> Brick20: psanaoss212:/brick4
>> Brick21: psanaoss212:/brick5
>> Brick22: psanaoss212:/brick6
>> Brick23: psanaoss212:/brick7
>> Brick24: psanaoss212:/brick8
>> Brick25: psanaoss213:/brick1
>> Brick26: psanaoss213:/brick2
>> Brick27: psanaoss213:/brick3
>> Brick28: psanaoss213:/brick4
>> Brick29: psanaoss213:/brick5
>> Brick30: psanaoss213:/brick6
>> Brick31: psanaoss213:/brick7
>> Brick32: psanaoss213:/brick8
>> Brick33: psanaoss215:/brick1
>> Brick34: psanaoss215:/brick2
>> Brick35: psanaoss215:/brick4
>> Brick36: psanaoss215:/brick5
>> Brick37: psanaoss215:/brick7
>> Brick38: psanaoss215:/brick8
>> Brick39: psanaoss215:/brick3
>> Brick40: psanaoss215:/brick6
>> Options Reconfigured:
>> performance.io-thread-count: 16
>> performance.write-behind-window-size: 16MB
>> performance.cache-size: 1GB
>> nfs.disable: on
>> performance.cache-refresh-timeout: 1
>> network.ping-timeout: 42
>> performance.cache-max-file-size: 1PB
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>
>
>