Song gluster at 163.com
Thu Jan 31 09:47:03 UTC 2013



I tested it again, dumped the related glusterfs information, and created a bug report on Bugzilla.



I used "kill -USR1 <PID of the hung glusterfs client process>" to dump its state and found that "gfs28-replicate-5" may be hung. I then dumped the glusterfsd state for "Brick16:" and found, using "ls -asl /proc/<pid>/fd", that "/xmail/disk2/gfs28/songcl/b83/003.txt" is open twice.
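
For anyone who wants to repeat the open-file check without reading the "ls" output by eye, here is a minimal sketch. It assumes a Linux /proc filesystem; the function name count_open_fds is made up for illustration and is not part of glusterfs.

```shell
# count_open_fds PID PATH
# Print how many file descriptors of process PID point at PATH.
# Assumes Linux, where /proc/<pid>/fd/<n> is a symlink to the open file.
count_open_fds() {
    pid=$1
    target=$2
    count=0
    for fd in /proc/"$pid"/fd/*; do
        # readlink resolves the descriptor symlink to the file it refers to
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Called with the glusterfsd PID of the brick and the full path of the file on the brick, it prints the number of descriptors that refer to that file. Reading another user's process usually needs root.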


Judging by the corresponding glusterfsd log, this file may be deadlocked:

[2013-01-31 13:42:20.927077] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 3.2.7 - INODELK

[2013-01-31 13:42:20.927090] T [server-resolve.c:127:resolve_loc_touchup] 0-gfs28-server: return value inode_path 11

[2013-01-31 13:42:20.927104] T [common.c:103:get_domain] 0-posix-locks: Domain gfs28-replicate-5 found

[2013-01-31 13:42:20.927113] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked

[2013-01-31 13:42:20.927123] T [inodelk.c:486:pl_inode_setlk] 0-gfs28-locks: Lock (pid=1059928640) (lk-owner=140197382404672) 9223372036854775806 - 0 => NOK

[2013-01-31 13:42:20.927132] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked

[2013-01-31 13:42:20.933429] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 987
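
To see how often each lock owner is reported as blocked, the log can be summarized with standard tools. This is only a sketch that assumes the trace-level line format shown above ("__lock_inodelk ... lk-owner:<number> ... => Blocked"); the function name blocked_locks is made up for illustration.

```shell
# blocked_locks LOGFILE
# Count "=> Blocked" inodelk entries per lk-owner in a glusterfsd trace log.
# Output: one line per owner, "<count> <lk-owner>" (as printed by uniq -c).
blocked_locks() {
    grep '__lock_inodelk' "$1" |
        grep '=> Blocked' |
        sed -n 's/.*lk-owner:\([0-9]*\).*/\1/p' |
        sort |
        uniq -c
}
```

An owner whose count keeps growing while no matching grant appears in the log is a candidate for the lock that never gets released.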


For more information, please refer to the attachments.


1. PID:6988 is the dump file of the hung glusterfs client.

2. PID:31100 is the glusterfsd dump file of "Brick16".

3. 188-xmail-disk2-gfs28.log.splitab is the glusterfsd log of "Brick16".


If you need any other debug information, please tell me. 

Thanks very much!


From: Joe Julian [mailto:joe at julianfamily.org] 
Sent: Friday, January 25, 2013 12:15 AM
To: Song; gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] glusterfs(3.2.7) hang when making the same dir at the same time


This looks like a support question to me. If you are asking a development question, you might want to use strace or gdb to figure out where the hang is, file a bug report on bugzilla, and submit your patch(es) to gerrit. 

Song <gluster at 163.com> wrote:



Recently, glusterfs has been hanging during our stress tests. To find the cause, we wrote a test shell script.


We ran the test script on 5 servers at the same time. After a short while, every instance of the test hung.

Running the command “cd /xmail/gfs1/scl_test/001” also hangs.


The test shell script:




# Remove the shared test directory (fails if another server refilled it)
rmdir /xmail/gfs1/scl_test/001
if [ "$?" = "0" ]; then
    echo "delete dir success"
fi

# Recreate it; all 5 servers race on this mkdir
mkdir /xmail/gfs1/scl_test/001
if [ "$?" = "0" ]; then
    echo "create dir success"
fi

# Append to three files, then delete them again
echo "1111" >>/xmail/gfs1/scl_test/001/001.txt
echo "2222" >>/xmail/gfs1/scl_test/001/002.txt
echo "3333" >>/xmail/gfs1/scl_test/001/003.txt

rm -rf /xmail/gfs1/scl_test/001/001.txt
rm -rf /xmail/gfs1/scl_test/001/002.txt
rm -rf /xmail/gfs1/scl_test/001/003.txt



“/xmail/gfs1” is the native mount point of the gluster volume gfs1.
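
When repeating the test, it can help to run each pass under a deadline so that a hang is reported instead of blocking the terminal. The sketch below assumes the "timeout" utility from GNU coreutils is installed; the function name run_with_deadline is made up for illustration.

```shell
# run_with_deadline SECONDS CMD [ARGS...]
# Run CMD; print "hang" if it is still running after SECONDS,
# otherwise print its exit status.
run_with_deadline() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the time limit was reached
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "done status=$status"
    fi
}
```

For example, "run_with_deadline 60 sh test.sh" (test.sh standing in for wherever the script is saved) prints "hang" if one pass takes longer than a minute. A process that is stuck inside the kernel on the hung mount may ignore the signal that timeout sends, in which case this wrapper will wait as well.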


The gluster volume info is as follows:

[root@d181 glusterfs]# gluster volume info


Volume Name: gfs1

Type: Distributed-Replicate

Status: Started

Number of Bricks: 30 x 3 = 90

Transport-type: tcp



Please help me. Thanks!



