[Gluster-users] Brick crashes
Anand Avati
anand.avati at gmail.com
Fri Jun 8 23:34:12 UTC 2012
Is it possible the system was running low on memory? I see you have 48GB,
but a memory registration failure typically means the system limit on the
number of pinnable pages in RAM was hit. Can you tell us the size of your
core dump files after the crash?
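One quick check for the pinnable-page theory, assuming a Linux host (the
procfs path is standard; /proc/self is used only for illustration, so
substitute a brick process PID):

```shell
# "max locked memory" limit for the current shell; RDMA memory
# registration pins pages and starts failing once this is exhausted.
ulimit -l

# The limit a running process actually inherited can be read from procfs.
# Replace "self" with the PID of a glusterfsd brick process.
grep 'locked memory' /proc/self/limits
```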
Avati
On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
> Hello,
>
> I have a brick that crashed twice today, and another, different brick that
> crashed just a while ago.
>
> This is what I see in one of the brick logs:
>
> patchset: git://git.gluster.com/glusterfs.git
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> signal received: 6
> time of crash: 2012-06-08 15:05:11
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.2.6
> /lib64/libc.so.6[0x34bc032900]
> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
> /lib64/libc.so.6[0x34bc06f977]
> /lib64/libc.so.6[0x34bc075296]
> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
> /lib64/libpthread.so.0[0x34bc8077f1]
> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
> ---------
>
> And somewhere before these, there is also
> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
> 0-rpc-transport/rdma: memory registration failed
>
> I have 48GB of memory on the system:
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:      49416716   34496648   14920068          0      31692   28209612
> -/+ buffers/cache:    6255344   43161372
> Swap:      4194296       1740    4192556
>
> # uname -a
> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The server gluster version is 3.2.6-1. I have both rdma clients and tcp
> clients over a 10Gb/s network.
>
> Any suggestions on what I should look for?
>
> Is there a way to just restart the brick, and not glusterd on the server?
> I have 8 bricks on the server.
>
> Thanks,
> ...
> ling
>
>
> Here's the volume info:
>
> # gluster volume info
>
> Volume Name: ana12
> Type: Distribute
> Status: Started
> Number of Bricks: 40
> Transport-type: tcp,rdma
> Bricks:
> Brick1: psanaoss214:/brick1
> Brick2: psanaoss214:/brick2
> Brick3: psanaoss214:/brick3
> Brick4: psanaoss214:/brick4
> Brick5: psanaoss214:/brick5
> Brick6: psanaoss214:/brick6
> Brick7: psanaoss214:/brick7
> Brick8: psanaoss214:/brick8
> Brick9: psanaoss211:/brick1
> Brick10: psanaoss211:/brick2
> Brick11: psanaoss211:/brick3
> Brick12: psanaoss211:/brick4
> Brick13: psanaoss211:/brick5
> Brick14: psanaoss211:/brick6
> Brick15: psanaoss211:/brick7
> Brick16: psanaoss211:/brick8
> Brick17: psanaoss212:/brick1
> Brick18: psanaoss212:/brick2
> Brick19: psanaoss212:/brick3
> Brick20: psanaoss212:/brick4
> Brick21: psanaoss212:/brick5
> Brick22: psanaoss212:/brick6
> Brick23: psanaoss212:/brick7
> Brick24: psanaoss212:/brick8
> Brick25: psanaoss213:/brick1
> Brick26: psanaoss213:/brick2
> Brick27: psanaoss213:/brick3
> Brick28: psanaoss213:/brick4
> Brick29: psanaoss213:/brick5
> Brick30: psanaoss213:/brick6
> Brick31: psanaoss213:/brick7
> Brick32: psanaoss213:/brick8
> Brick33: psanaoss215:/brick1
> Brick34: psanaoss215:/brick2
> Brick35: psanaoss215:/brick4
> Brick36: psanaoss215:/brick5
> Brick37: psanaoss215:/brick7
> Brick38: psanaoss215:/brick8
> Brick39: psanaoss215:/brick3
> Brick40: psanaoss215:/brick6
> Options Reconfigured:
> performance.io-thread-count: 16
> performance.write-behind-window-size: 16MB
> performance.cache-size: 1GB
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> network.ping-timeout: 42
> performance.cache-max-file-size: 1PB
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>