[Gluster-users] Brick crashes
Anand Avati
anand.avati at gmail.com
Fri Jun 8 23:34:12 UTC 2012
Is it possible the system was running low on memory? I see you have 48GB,
but a memory registration failure typically means the system limit on the
number of pinnable pages in RAM was hit. Can you tell us the size of your
core dump files after the crash?
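One quick check for the pinnable-page theory, assuming a Linux host (the
procfs path is standard; /proc/self is used only for illustration, so
substitute a brick process PID):

```shell
# "max locked memory" limit for the current shell; RDMA memory
# registration pins pages and starts failing once this is exhausted.
ulimit -l

# The limit a running process actually inherited can be read from procfs.
# Replace "self" with the PID of a glusterfsd brick process.
grep 'locked memory' /proc/self/limits
```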
Avati
On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
> Hello,
>
> I have a brick that crashed twice today, and another, different brick that
> crashed just a while ago.
>
> This is what I see in one of the brick logs:
>
> patchset: git://git.gluster.com/glusterfs.git
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> signal received: 6
> time of crash: 2012-06-08 15:05:11
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.2.6
> /lib64/libc.so.6[0x34bc032900]
> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
> /lib64/libc.so.6[0x34bc06f977]
> /lib64/libc.so.6[0x34bc075296]
> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
> /lib64/libpthread.so.0[0x34bc8077f1]
> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
> ---------
>
> And somewhere before these, there is also
> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
> 0-rpc-transport/rdma: memory registration failed
>
> I have 48GB of memory on the system:
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:      49416716   34496648   14920068          0      31692   28209612
> -/+ buffers/cache:    6255344   43161372
> Swap:      4194296       1740    4192556
>
> # uname -a
> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The server gluster version is 3.2.6-1. I have both rdma clients and tcp
> clients over a 10Gb/s network.
>
> Any suggestions on what I should look for?
>
> Is there a way to just restart the brick, and not glusterd on the server?
> I have 8 bricks on the server.
>
> Thanks,
> ...
> ling
>
>
> Here's the volume info:
>
> # gluster volume info
>
> Volume Name: ana12
> Type: Distribute
> Status: Started
> Number of Bricks: 40
> Transport-type: tcp,rdma
> Bricks:
> Brick1: psanaoss214:/brick1
> Brick2: psanaoss214:/brick2
> Brick3: psanaoss214:/brick3
> Brick4: psanaoss214:/brick4
> Brick5: psanaoss214:/brick5
> Brick6: psanaoss214:/brick6
> Brick7: psanaoss214:/brick7
> Brick8: psanaoss214:/brick8
> Brick9: psanaoss211:/brick1
> Brick10: psanaoss211:/brick2
> Brick11: psanaoss211:/brick3
> Brick12: psanaoss211:/brick4
> Brick13: psanaoss211:/brick5
> Brick14: psanaoss211:/brick6
> Brick15: psanaoss211:/brick7
> Brick16: psanaoss211:/brick8
> Brick17: psanaoss212:/brick1
> Brick18: psanaoss212:/brick2
> Brick19: psanaoss212:/brick3
> Brick20: psanaoss212:/brick4
> Brick21: psanaoss212:/brick5
> Brick22: psanaoss212:/brick6
> Brick23: psanaoss212:/brick7
> Brick24: psanaoss212:/brick8
> Brick25: psanaoss213:/brick1
> Brick26: psanaoss213:/brick2
> Brick27: psanaoss213:/brick3
> Brick28: psanaoss213:/brick4
> Brick29: psanaoss213:/brick5
> Brick30: psanaoss213:/brick6
> Brick31: psanaoss213:/brick7
> Brick32: psanaoss213:/brick8
> Brick33: psanaoss215:/brick1
> Brick34: psanaoss215:/brick2
> Brick35: psanaoss215:/brick4
> Brick36: psanaoss215:/brick5
> Brick37: psanaoss215:/brick7
> Brick38: psanaoss215:/brick8
> Brick39: psanaoss215:/brick3
> Brick40: psanaoss215:/brick6
> Options Reconfigured:
> performance.io-thread-count: 16
> performance.write-behind-window-size: 16MB
> performance.cache-size: 1GB
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> network.ping-timeout: 42
> performance.cache-max-file-size: 1PB
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>