[Bugs] [Bug 1209340] Random regression test hang : bug-1113960.t

bugzilla at redhat.com
Tue Apr 7 06:31:36 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1209340



--- Comment #1 from Nithya Balachandran <nbalacha at redhat.com> ---
The test renames multiple directories in a deep directory structure from one
FUSE mount while simultaneous lookup requests are sent from another FUSE mount.

Initial analysis on a hung system using gdb and logs reveals the following:


1. The hung process is constantly calling __foreach_ancestor_dentry().
2. The directory rename has failed for some directories on one subvolume.
3. This seems to lead to a situation where a single directory has 2 dentries -
one each for the old and new names.
4. The inode_link() function does a cycle check by calling
__foreach_ancestor_dentry().
5. __foreach_ancestor_dentry() walks up the directory tree for each dentry it
finds in the parent dentry list. This means that there will be 2^(level at
which duplicate dentry exists) calls for each cycle check.

This causes inode_link to hang as it takes a very long time to finish the cycle
check. 
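
The blow-up in the cycle check can be illustrated with a toy model. This is a
minimal sketch in Python, not the GlusterFS code: Inode, Dentry, build_chain()
and count_ancestor_visits() are hypothetical names, and the model only assumes
that the walk recurses into every dentry of each parent, as the analysis above
describes. In this model a chain without duplicate dentries costs one visit per
level, a duplicate at a single level adds one extra walk to the root, and
duplicates at every level roughly double the visit count per level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inode:
    """Toy inode: only tracks the dentries (names) that point at it."""
    name: str
    dentries: List["Dentry"] = field(default_factory=list)


@dataclass
class Dentry:
    """Toy dentry: a name for an inode, linked to the parent directory inode."""
    name: str
    parent: Optional[Inode]


def build_chain(depth, duplicated_levels=frozenset()):
    """Build /dir1/dir2/.../dir<depth> and return the deepest inode.

    Levels listed in duplicated_levels get a second dentry under the same
    parent, standing in for a stale old name left by a failed rename.
    """
    parent = Inode("/")  # the root has no dentries of its own
    for level in range(1, depth + 1):
        child = Inode(f"dir{level}")
        child.dentries.append(Dentry(f"dir{level}", parent))
        if level in duplicated_levels:
            child.dentries.append(Dentry(f"olddir{level}", parent))
        parent = child
    return parent


def count_ancestor_visits(inode):
    """Count dentries visited when every dentry of every ancestor is walked."""
    visits = 0
    for dentry in inode.dentries:
        visits += 1
        if dentry.parent is not None:
            visits += count_ancestor_visits(dentry.parent)
    return visits


if __name__ == "__main__":
    for depth in (5, 10, 15):
        clean = count_ancestor_visits(build_chain(depth))
        dup = count_ancestor_visits(build_chain(depth, set(range(1, depth + 1))))
        print(depth, clean, dup)
```

With 45 nested directories, as in the test, even a modest number of duplicated
levels would make a single walk in this model very expensive.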

In the process, the rename of olddir12 failed on patchy-client-3.
From the logs:

24363 ?        S      0:00 mv
/mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/file0
/mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/newfile0



[root at bulk5 ~]# /build/install/sbin/gluster v info

Volume Name: patchy
Type: Distribute
Volume ID: 213b4118-a3e6-4732-9d59-487b47cfda94
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: bulk5.rack.gluster.org:/d/backends/patchy1
Brick2: bulk5.rack.gluster.org:/d/backends/patchy2
Brick3: bulk5.rack.gluster.org:/d/backends/patchy3
Brick4: bulk5.rack.gluster.org:/d/backends/patchy4
[root at bulk5 ~]# 





[2015-03-30 20:20:54.753847] I [MSGID: 109036]
[dht-common.c:6407:dht_log_new_layout_for_dir_selfheal] 0-patchy-dht: Setting
layout of /olddir11 with [Subvol_name: patchy-client-0, Err: -1 , Start:
1073733380 , Stop: 2147466759 ], [Subvol_name: patchy-client-1, Err: -1 ,
Start: 2147466760 , Stop: 3221200139 ], [Subvol_name: patchy-client-2, Err: -1
, Start: 3221200140 , Stop: 4294967295 ], [Subvol_name: patchy-client-3, Err:
-1 , Start: 0 , Stop: 1073733379 ],




[2015-03-30 20:21:01.442079] I [MSGID: 109018]
[dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12,
gfid = 00000000-0000-0000-0000-000000000000
[2015-03-30 20:21:01.442117] I [dht-layout.c:800:dht_layout_dir_mismatch]
0-patchy-dht: subvol: patchy-client-1; inode layout - 3221200140 - 4294967295;
disk layout - 2863302660 - 4294967295
[2015-03-30 20:21:01.442288] I [dht-layout.c:800:dht_layout_dir_mismatch]
0-patchy-dht: subvol: patchy-client-2; inode layout - 0 - 1073733379; disk
layout - 0 - 1431651329
[2015-03-30 20:21:01.456272] I [dht-rename.c:1344:dht_rename] 0-patchy-dht:
renaming
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12
(hash=patchy-client-0/cache=patchy-client-3) =>
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12
(hash=patchy-client-2/cache=<nul>)
[2015-03-30 20:21:01.458541] W [client-rpc-fops.c:2599:client3_3_rename_cbk]
0-patchy-client-3: remote operation failed: No such file or directory
The message "I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk]
0-patchy-dht: Mismatching layouts for
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12,
gfid = 00000000-0000-0000-0000-000000000000" repeated 2 times between
[2015-03-30 20:21:01.442079] and [2015-03-30 20:21:01.442305]
[2015-03-30 20:21:01.458567] I [MSGID: 109030]
[dht-rename.c:49:dht_rename_dir_cbk] 0-patchy-dht: Rename
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12
->
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12
on patchy-client-3 failed, (gfid = e2b8a2b5-e60a-4bb2-9593-6417982e3ca9) [No
such file or directory]
[2015-03-30 20:21:01.458637] W [fuse-bridge.c:1756:fuse_rename_cbk]
0-glusterfs-fuse: 16792:
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12
->
/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12
=> -1 (No such file or directory)





I need to try to reproduce this on local systems.
