[Gluster-users] Brick crashes
Ling Ho
ling at slac.stanford.edu
Fri Jun 8 23:41:12 UTC 2012
This is the core file from the crash just now:
[root at psanaoss213 /]# ls -al core*
-rw------- 1 root root 4073594880 Jun 8 15:05 core.22682
From yesterday:
[root at psanaoss214 /]# ls -al core*
-rw------- 1 root root 4362727424 Jun 8 00:58 core.13483
-rw------- 1 root root 4624773120 Jun 8 03:21 core.8792
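As a quick check on the pinned-pages theory discussed below, here is a hedged sketch of how one might inspect the locked-memory (memlock) limit; the PID 22682 is just taken from the core file name above and is only illustrative:

```shell
# RDMA memory registration pins pages in RAM; registration can fail once a
# process hits its memlock limit even when plenty of memory is still free.
ulimit -l                  # per-process locked-memory limit for new shells

# For an already-running brick process, read its effective limit directly.
# 22682 is a hypothetical PID taken from the core file name above.
grep -i 'locked memory' /proc/22682/limits 2>/dev/null || true
```

If `ulimit -l` reports a small finite value (the common default is 64 kB) rather than "unlimited", that would be consistent with registration failing long before physical memory runs out.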
On 06/08/2012 04:34 PM, Anand Avati wrote:
> Is it possible the system was running low on memory? I see you have
> 48GB, but memory registration failure typically would be because the
> system limit on the number of pinnable pages in RAM was hit. Can you
> tell us the size of your core dump files after the crash?
>
> Avati
>
> On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>
> Hello,
>
> I have a brick that crashed twice today, and another brick that
> crashed just a while ago.
>
> This is what I see in one of the brick logs:
>
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash: 2012-06-08 15:05:11
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.2.6
> /lib64/libc.so.6[0x34bc032900]
> /lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
> /lib64/libc.so.6(abort+0x175)[0x34bc034065]
> /lib64/libc.so.6[0x34bc06f977]
> /lib64/libc.so.6[0x34bc075296]
> /opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
> /opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
> /opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
> /lib64/libpthread.so.0[0x34bc8077f1]
> /lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
> ---------
>
> And somewhere before these, there is also
> [2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post]
> 0-rpc-transport/rdma: memory registration failed
>
> I have 48GB of memory on the system:
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:      49416716   34496648   14920068          0      31692   28209612
> -/+ buffers/cache:    6255344   43161372
> Swap:      4194296       1740    4192556
>
> # uname -a
> Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10
> 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The server gluster version is 3.2.6-1. I have both rdma clients
> and tcp clients over a 10Gb/s network.
>
> Any suggestions on what I should look for?
>
> Is there a way to just restart the brick, and not glusterd on the
> server? I have 8 bricks on the server.
>
> Thanks,
> ...
> ling
>
>
> Here's the volume info:
>
> # gluster volume info
>
> Volume Name: ana12
> Type: Distribute
> Status: Started
> Number of Bricks: 40
> Transport-type: tcp,rdma
> Bricks:
> Brick1: psanaoss214:/brick1
> Brick2: psanaoss214:/brick2
> Brick3: psanaoss214:/brick3
> Brick4: psanaoss214:/brick4
> Brick5: psanaoss214:/brick5
> Brick6: psanaoss214:/brick6
> Brick7: psanaoss214:/brick7
> Brick8: psanaoss214:/brick8
> Brick9: psanaoss211:/brick1
> Brick10: psanaoss211:/brick2
> Brick11: psanaoss211:/brick3
> Brick12: psanaoss211:/brick4
> Brick13: psanaoss211:/brick5
> Brick14: psanaoss211:/brick6
> Brick15: psanaoss211:/brick7
> Brick16: psanaoss211:/brick8
> Brick17: psanaoss212:/brick1
> Brick18: psanaoss212:/brick2
> Brick19: psanaoss212:/brick3
> Brick20: psanaoss212:/brick4
> Brick21: psanaoss212:/brick5
> Brick22: psanaoss212:/brick6
> Brick23: psanaoss212:/brick7
> Brick24: psanaoss212:/brick8
> Brick25: psanaoss213:/brick1
> Brick26: psanaoss213:/brick2
> Brick27: psanaoss213:/brick3
> Brick28: psanaoss213:/brick4
> Brick29: psanaoss213:/brick5
> Brick30: psanaoss213:/brick6
> Brick31: psanaoss213:/brick7
> Brick32: psanaoss213:/brick8
> Brick33: psanaoss215:/brick1
> Brick34: psanaoss215:/brick2
> Brick35: psanaoss215:/brick4
> Brick36: psanaoss215:/brick5
> Brick37: psanaoss215:/brick7
> Brick38: psanaoss215:/brick8
> Brick39: psanaoss215:/brick3
> Brick40: psanaoss215:/brick6
> Options Reconfigured:
> performance.io-thread-count: 16
> performance.write-behind-window-size: 16MB
> performance.cache-size: 1GB
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> network.ping-timeout: 42
> performance.cache-max-file-size: 1PB
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
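On the question of restarting a single brick without restarting glusterd: a hedged sketch follows. The volume name comes from the `gluster volume info` output above, the brick path is only an example, and whether `start ... force` behaves this way on 3.2.6 should be verified against that release:

```shell
# Sketch: restart one dead brick process without touching glusterd.
# Volume name from "gluster volume info"; the brick path is hypothetical.
VOL=ana12

# Locate the glusterfsd process serving a particular brick directory;
# each brick runs as its own glusterfsd process.
ps ax | grep '[g]lusterfsd' | grep 'brick3' || true

# After that one process has died (or been killed), "start ... force"
# asks glusterd to respawn any brick processes that are not running,
# leaving the healthy bricks and glusterd itself alone.
if command -v gluster >/dev/null 2>&1; then
    gluster volume start "$VOL" force
fi
```

The `ps` line only locates the PID; a `kill` on that single glusterfsd takes down just that brick, not the other seven on the server.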