[Gluster-users] Brick crashes
Pranith Kumar Karampuri
pkarampu at redhat.com
Sat Jun 9 05:58:01 UTC 2012
Hi Ling Ho,
It seems like you are using rdma; could you confirm?
I suspect a memory leak. Could you help me confirm whether that is the case?
Please post the output of the following:
1) When you start the brick, perform 'kill -USR1 <pid-of-brick>'. This will save a file /tmp/glusterdump.<pid-of-brick>.
2) mv /tmp/glusterdump.<pid-of-brick> /tmp/glusterdump.<pid-of-brick>.pre
3) Run the brick for a while and observe 'top -p <pid-of-brick>' to see if the 'RES' field is increasing. After it increases by 1G or so,
do one more 'kill -USR1 <pid-of-brick>'. Then attach both /tmp/glusterdump.<pid-of-brick> and /tmp/glusterdump.<pid-of-brick>.pre
to this mail. A sketch of the whole sequence is below.
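As a minimal example, assuming the brick PID is 22682 (taken from your core file name only as an illustration; substitute the PID of the running brick):

# kill -USR1 22682
# mv /tmp/glusterdump.22682 /tmp/glusterdump.22682.pre
... run the brick for a while ...
# top -p 22682            (watch the RES column for growth)
# kill -USR1 22682        (once RES has grown by ~1G, take the second dump)
# ls -l /tmp/glusterdump.22682 /tmp/glusterdump.22682.pre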
Do let us know what operations are performed on the system so that we can re-create this case in our test labs.
Pranith.
----- Original Message -----
From: "Ling Ho" <ling at slac.stanford.edu>
To: "Anand Avati" <anand.avati at gmail.com>
Cc: Gluster-users at gluster.org
Sent: Saturday, June 9, 2012 5:11:12 AM
Subject: Re: [Gluster-users] Brick crashes
This is the core file from the crash just now:
[root@psanaoss213 /]# ls -al core*
-rw------- 1 root root 4073594880 Jun 8 15:05 core.22682
From yesterday:
[root@psanaoss214 /]# ls -al core*
-rw------- 1 root root 4362727424 Jun 8 00:58 core.13483
-rw------- 1 root root 4624773120 Jun 8 03:21 core.8792
On 06/08/2012 04:34 PM, Anand Avati wrote:
Is it possible the system was running low on memory? I see you have 48GB, but a memory registration failure is typically because the system limit on the number of pinnable pages in RAM was hit. Can you tell us the size of your core dump files after the crash?
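A rough way to check that on the brick server is to look at the locked-memory limits (use the PID of whichever glusterfsd process serves the brick; exact field names may vary slightly by kernel):

# ulimit -l                                              (locked-memory limit for the current shell)
# grep 'Max locked memory' /proc/<pid-of-brick>/limits   (limit in effect for the running brick)
# grep Mlocked /proc/meminfo                             (memory currently locked system-wide)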
Avati
On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling at slac.stanford.edu> wrote:
Hello,
I have a brick that crashed twice today, and a different brick that crashed just a while ago.
This is what I see in one of the brick logs:
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-06-08 15:05:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.6
/lib64/libc.so.6[0x34bc032900]
/lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
/lib64/libc.so.6(abort+0x175)[0x34bc034065]
/lib64/libc.so.6[0x34bc06f977]
/lib64/libc.so.6[0x34bc075296]
/opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
/opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
/lib64/libpthread.so.0[0x34bc8077f1]
/lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
---------
And somewhere before these, there is also:
[2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post] 0-rpc-transport/rdma: memory registration failed
I have 48GB of memory on the system:
# free
             total       used       free     shared    buffers     cached
Mem:      49416716   34496648   14920068          0      31692   28209612
-/+ buffers/cache:    6255344   43161372
Swap:      4194296       1740    4192556
# uname -a
Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
The server gluster version is 3.2.6-1. I have both rdma clients and tcp clients over a 10Gb/s network.
Any suggestions on what I should look for?
Is there a way to just restart the brick, and not glusterd on the server? I have 8 bricks on the server.
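For reference, each brick is served by its own glusterfsd process with the brick path in its command line, so a single brick process can at least be identified and stopped without touching glusterd. A rough sketch (with /brick3 only as an example, and assuming, unverified on 3.2.x, that the 'force' option to 'volume start' respawns a missing brick process):

# ps ax | grep '[g]lusterfsd' | grep '/brick3'   (find the glusterfsd process serving /brick3)
# kill <pid-of-that-glusterfsd>                  (stops only this brick; glusterd and the other bricks keep running)
# gluster volume start ana12 force               (assumption: asks glusterd to respawn bricks that are not running)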
Thanks,
...
ling
Here's the volume info:
# gluster volume info
Volume Name: ana12
Type: Distribute
Status: Started
Number of Bricks: 40
Transport-type: tcp,rdma
Bricks:
Brick1: psanaoss214:/brick1
Brick2: psanaoss214:/brick2
Brick3: psanaoss214:/brick3
Brick4: psanaoss214:/brick4
Brick5: psanaoss214:/brick5
Brick6: psanaoss214:/brick6
Brick7: psanaoss214:/brick7
Brick8: psanaoss214:/brick8
Brick9: psanaoss211:/brick1
Brick10: psanaoss211:/brick2
Brick11: psanaoss211:/brick3
Brick12: psanaoss211:/brick4
Brick13: psanaoss211:/brick5
Brick14: psanaoss211:/brick6
Brick15: psanaoss211:/brick7
Brick16: psanaoss211:/brick8
Brick17: psanaoss212:/brick1
Brick18: psanaoss212:/brick2
Brick19: psanaoss212:/brick3
Brick20: psanaoss212:/brick4
Brick21: psanaoss212:/brick5
Brick22: psanaoss212:/brick6
Brick23: psanaoss212:/brick7
Brick24: psanaoss212:/brick8
Brick25: psanaoss213:/brick1
Brick26: psanaoss213:/brick2
Brick27: psanaoss213:/brick3
Brick28: psanaoss213:/brick4
Brick29: psanaoss213:/brick5
Brick30: psanaoss213:/brick6
Brick31: psanaoss213:/brick7
Brick32: psanaoss213:/brick8
Brick33: psanaoss215:/brick1
Brick34: psanaoss215:/brick2
Brick35: psanaoss215:/brick4
Brick36: psanaoss215:/brick5
Brick37: psanaoss215:/brick7
Brick38: psanaoss215:/brick8
Brick39: psanaoss215:/brick3
Brick40: psanaoss215:/brick6
Options Reconfigured:
performance.io-thread-count: 16
performance.write-behind-window-size: 16MB
performance.cache-size: 1GB
nfs.disable: on
performance.cache-refresh-timeout: 1
network.ping-timeout: 42
performance.cache-max-file-size: 1PB
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users