[Gluster-devel] glusterfs(3.2.7) hang when making the same dir at the same time
    Song 
    gluster at 163.com
       
    Fri Feb  1 06:37:55 UTC 2013
    
    
  
Most bricks are normal; the output is as follows:
 
[root at h228 ~]# getfattr -d -e hex -m . /xmail/disk1/gfs28/songcl/b83
getfattr: Removing leading '/' from absolute path names
# file: xmail/disk1/gfs28/songcl/b83
trusted.afr.gfs28-client-0=0x000000000000000000000000
trusted.afr.gfs28-client-1=0x000000000000000000000000
trusted.afr.gfs28-client-2=0x000000000000000000000000
trusted.gfid=0xb101a244cfaf4addaea8b7b031423e89
trusted.glusterfs.dht=0x0000000100000000f7777768ffffffff
 
But the directory "/xmail/disk1/gfs28/songcl/b83" doesn't exist on two of the replicas.
 
By the way, what is the reason for warnings such as "[2013-01-10 05:46:16.162509] W [dht-common.c:178:dht_lookup_dir_cbk] 0-gfs1-dht: /xmail_dedup/gfs1_000/001/05D: gfid different on gfs1-replicate-87"?
Is it because the dir "/xmail_dedup/gfs1_000/001/05D" was created on several glusterfs clients at the same time?
I ask because the gfid is only generated in the "fuse_mkdir" function in fuse-bridge.c, so two clients racing on the same mkdir would each generate a different gfid.
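If that is the cause, a minimal sketch of the suspected race (mount point and directory name are made up) would be to run the same loop at the same time on two clients that both mount the volume:

# Run simultaneously on two different glusterfs clients.
# If both mkdirs race past the lookup, each client's fuse_mkdir
# generates its own gfid and the replicas can end up disagreeing.
for ((i = 1; i <= 1000; i++)); do
    mkdir /mnt/gfs1/race_dir 2>/dev/null
    rmdir /mnt/gfs1/race_dir 2>/dev/null
done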
 
Thanks.
 
From: Anand Avati [mailto:anand.avati at gmail.com] 
Sent: Friday, February 01, 2013 4:19 AM
To: Song
Cc: Joe Julian; gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] glusterfs(3.2.7) hang when making the same dir at the same time
 
Can you also give the outputs of "getfattr -d -m . -e hex /backend/dir" from each of the bricks? It will be interesting to know in case there was a gfid mismatch somehow.
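Something along these lines on each replica server should do; the brick paths below are only placeholders, substitute the local bricks:

# run on every server that holds a replica of the directory
for brick in /xmail/disk1/gfs28 /xmail/disk2/gfs28; do
    getfattr -d -m . -e hex "$brick/songcl/b83"
done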
 
Avati
On Thu, Jan 31, 2013 at 1:47 AM, Song <gluster at 163.com> wrote:
Joe,
 
I tested it again, dumped the related glusterfs info, and created a bug report on bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=906238
 
I use "kill -USR1 <hanged glusterfs client process ID>" to dump info and find that "gfs28-replicate-5" maybe be hanged. Then, I dump glusterfsd info of "Brick16: 10.1.10.188:/xmail/disk2/gfs28" and find the "/xmail/disk2/gfs28/songcl/b83/003.txt" is opened two times by "ls -asl /proce/pid/fd" command. 
 
Judging from the corresponding glusterfsd log, this file may be deadlocked:
[2013-01-31 13:42:20.927077] T [rpcsvc.c:187:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 3.2.7 - INODELK
[2013-01-31 13:42:20.927090] T [server-resolve.c:127:resolve_loc_touchup] 0-gfs28-server: return value inode_path 11
[2013-01-31 13:42:20.927104] T [common.c:103:get_domain] 0-posix-locks: Domain gfs28-replicate-5 found
[2013-01-31 13:42:20.927113] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked
[2013-01-31 13:42:20.927123] T [inodelk.c:486:pl_inode_setlk] 0-gfs28-locks: Lock (pid=1059928640) (lk-owner=140197382404672) 9223372036854775806 - 0 => NOK
[2013-01-31 13:42:20.927132] T [inodelk.c:218:__lock_inodelk] 0-gfs28-locks: Lock (pid=1059928640) lk-owner:140197382404672 9223372036854775806 - 0 => Blocked
[2013-01-31 13:42:20.933429] T [rpcsvc.c:443:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 987
 
For more information, please refer to the attachments:
 
1. PID 6988 is the dump file of the hung glusterfs client.
2. PID 31100 is the glusterfsd dump file of "Brick16".
3. 188-xmail-disk2-gfs28.log.splitab is the glusterfsd log of "Brick16".
 
If you need any other debug information, please tell me. 
Thanks very much!
 
From: Joe Julian [mailto:joe at julianfamily.org] 
Sent: Friday, January 25, 2013 12:15 AM
To: Song; gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] glusterfs(3.2.7) hang when making the same dir at the same time
 
This looks like a support question to me. If you are asking a development question, you might want to use strace or gdb to figure out where the hang is, file a bug report on bugzilla, and submit your patch(es) to gerrit. 
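For instance, assuming <pid> is the hung glusterfs client process:

strace -f -p <pid>    # shows which syscall, if any, each thread is stuck in
gdb -p <pid>          # attach; then "thread apply all bt" dumps every thread's stack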
Song <gluster at 163.com> wrote:
Hi,
 
Recently, glusterfs has been hanging during our stress testing. To find the reason, we wrote a test shell script.
 
We run the test script on 5 servers at the same time. After a while, all of the test runs hang.
Executing "cd /xmail/gfs1/scl_test/001" by hand also hangs.
 
The test shell script:
 
for((i=1;i<=100;i++));
do 
rmdir /xmail/gfs1/scl_test/001
if [ "$?" == "0" ];
then 
echo "delete dir success"
fi 
 
mkdir /xmail/gfs1/scl_test/001
if [ "$?" == "0" ];
then 
echo "create dir success"
fi
 
echo "1111" >>/xmail/gfs1/scl_test/001/001.txt
echo "2222" >>/xmail/gfs1/scl_test/001/002.txt
echo "3333" >>/xmail/gfs1/scl_test/001/003.txt
 
rm -rf /xmail/gfs1/scl_test/001/001.txt
rm -ff /xmail/gfs1/scl_test/001/002.txt
rm -rf /xmail/gfs1/scl_test/001/003.txt
done
 
"/xmail/gfs1" is the native (FUSE) mount point of the gluster volume gfs1.
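(Schematically, the script is launched on the 5 servers at the same time; the hostnames and script path here are made up:)

# start the test script on all five servers at (nearly) the same time
for h in srv1 srv2 srv3 srv4 srv5; do
    ssh "$h" 'sh /tmp/scl_test.sh' &
done
wait    # all five runs hang before the loop finishes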
 
The gluster volume info is as follows:
[root at d181 glusterfs]# gluster volume info
 
Volume Name: gfs1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 30 x 3 = 90
Transport-type: tcp
 
 
Please help me, Thanks!
 
_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
 