[Bugs] [Bug 1178619] New: Statfs is hung because of frame loss in quota

bugzilla at redhat.com bugzilla at redhat.com
Mon Jan 5 06:50:32 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1178619

            Bug ID: 1178619
           Summary: Statfs is hung because of frame loss in quota
           Product: GlusterFS
           Version: mainline
         Component: quota
          Assignee: bugs at gluster.org
          Reporter: rgowdapp at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
The rebalance process hangs in the statfs call of quota and fails after the timeout.
###################################################################
1. Created a 6x2 distributed-replicate volume.
2. Ran the ACA script, which does deep directory creation and renames
directories and files.
3. While the script was running, did add-brick and rebalance.

Result:
Rebalance hangs for 1800 seconds, which is the call-bail timeout, and then
runs to completion.


statedump:
--------------
[global.callpool.stack.1.frame.1]
ref_count=1
translator=test-server
complete=0

[global.callpool.stack.1.frame.2]
ref_count=0
translator=test-quota
complete=0
parent=/brick2/test7
wind_from=io_stats_statfs
wind_to=FIRST_CHILD(this)->fops->statfs
unwind_to=io_stats_statfs_cbk

[global.callpool.stack.1.frame.3]
ref_count=1
translator=/brick2/test7
complete=0
parent=test-server
wind_from=server_statfs_resume
wind_to=bound_xl->fops->statfs
unwind_to=server_statfs_cbk
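
To make the stuck frames easier to spot, the frame sections above can be fed to a small script. This is a minimal Python sketch, not a GlusterFS tool: it assumes the dump consists of `[section]` headers followed by `key=value` lines, as in the excerpt, and that `complete=0` marks a frame that has not been unwound yet. The function names `parse_frames` and `incomplete_frames` are made up for this illustration.

```python
import re

# Frame sections copied from the statedump in this report.
STATEDUMP = """\
[global.callpool.stack.1.frame.1]
ref_count=1
translator=test-server
complete=0

[global.callpool.stack.1.frame.2]
ref_count=0
translator=test-quota
complete=0
parent=/brick2/test7
wind_from=io_stats_statfs
wind_to=FIRST_CHILD(this)->fops->statfs
unwind_to=io_stats_statfs_cbk

[global.callpool.stack.1.frame.3]
ref_count=1
translator=/brick2/test7
complete=0
parent=test-server
wind_from=server_statfs_resume
wind_to=bound_xl->fops->statfs
unwind_to=server_statfs_cbk
"""

SECTION = re.compile(r"^\[(.+)\]$")


def parse_frames(text):
    """Return {section_name: {key: value}} for each [section] in the dump."""
    frames = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = SECTION.match(line)
        if match:
            current = frames.setdefault(match.group(1), {})
        elif current is not None and "=" in line:
            # Split on the first '=' only; values may contain '->' and '()'.
            key, _, value = line.partition("=")
            current[key] = value
    return frames


def incomplete_frames(frames):
    """Names of frames whose 'complete' field is 0 (assumed: not unwound)."""
    return [name for name, f in frames.items() if f.get("complete") == "0"]


frames = parse_frames(STATEDUMP)
stuck = incomplete_frames(frames)
for name in stuck:
    print(name, frames[name]["translator"], frames[name].get("wind_to", "-"))
```

On this excerpt all three frames are reported as incomplete, and the innermost one (frame.2) belongs to the test-quota translator, which matches the suspicion that the statfs call never returns from quota.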


From rebalance logs
===========
[2015-01-03 14:49:59.065353] E [rpc-clnt.c:201:call_bail]
0-test-client-1: bailing out frame type(GlusterFS 3.3) op(STATFS(14)) xid =
0x794 sent = 2015-01-03 14:19:58.397959. timeout = 1800 for
10.70.44.70:49152
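
The two timestamps in this log entry can be checked against the reported timeout. The short Python sketch below only does the arithmetic on values copied from the log line; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the call_bail log entry above.
sent = datetime.strptime("2015-01-03 14:19:58.397959", FMT)
bailed = datetime.strptime("2015-01-03 14:49:59.065353", FMT)
timeout = 1800  # seconds, as printed in the log

# Time the STATFS request stayed outstanding before it was bailed out.
waited = (bailed - sent).total_seconds()
print(f"waited {waited:.6f}s, timeout {timeout}s")
```

The request was outstanding for just over 1800 seconds, so the frame was bailed out as soon as the timeout expired rather than being answered by the brick.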

Version-Release number of selected component (if applicable):


How reproducible:
When building ancestry fails, the frame is lost because the error is not
handled properly. We saw an error log in the brick process saying that open
failed on the same gfid on which statfs was issued. This open was most likely
issued by the ancestry-building code in quota.
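
The failure mode described above can be illustrated with a toy model. The Python sketch below is not GlusterFS code and is not the actual fix: `Frame`, `build_ancestry`, `statfs_lossy` and `statfs_safe` are hypothetical names, and the gfid is a placeholder. It only shows why an error path that returns without unwinding leaves the caller waiting until the call-bail timeout.

```python
class Frame:
    """Toy stand-in for a call frame: it is done only once unwound."""

    def __init__(self, op):
        self.op = op
        self.complete = False
        self.result = None

    def unwind(self, ret, err=None):
        # Deliver the reply to the caller and mark the frame finished.
        self.complete = True
        self.result = (ret, err)


def build_ancestry(gfid, fail):
    """Hypothetical ancestry lookup; raises OSError when 'fail' is set."""
    if fail:
        raise OSError(f"open failed on gfid {gfid}")
    return ["/", "dir"]


def statfs_lossy(frame, gfid, fail):
    """Error path returns without unwinding: the frame is lost."""
    try:
        build_ancestry(gfid, fail)
    except OSError:
        return  # bug: nobody answers the caller
    frame.unwind(0)


def statfs_safe(frame, gfid, fail):
    """Error path unwinds with a failure so the caller gets a reply."""
    try:
        build_ancestry(gfid, fail)
    except OSError as exc:
        frame.unwind(-1, str(exc))
        return
    frame.unwind(0)


lossy, safe = Frame("STATFS"), Frame("STATFS")
statfs_lossy(lossy, "hypothetical-gfid", fail=True)
statfs_safe(safe, "hypothetical-gfid", fail=True)
print("lossy complete:", lossy.complete)
print("safe complete:", safe.complete, safe.result)
```

In the lossy variant the frame stays incomplete, which is the toy equivalent of the `complete=0` frames in the statedump. In the safe variant the caller receives an error immediately instead of hanging.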

