[Bugs] [Bug 1269501] New: Self-heal daemon crashes when bricks go down at the time of data heal

Wed Oct 7 12:39:39 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1269501

            Bug ID: 1269501
           Summary: Self-heal daemon crashes when bricks go down at the
                    time of data heal
           Product: GlusterFS
           Version: 3.7.5
         Component: replicate
          Assignee: pkarampu at redhat.com
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1269470

+++ This bug was initially created as a clone of Bug #1269470 +++

Description of problem:
When all the bricks go down during data self-heal, the self-heal daemon
process crashes with the following backtrace:
(gdb) bt
#0  0x00007fae978ccb0f in afr_local_replies_wipe (local=0x0,
priv=0x7fae900125b0) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241
#1  0x00007fae978b7aaf in afr_selfheal_inodelk (frame=0x7fae8c000c0c,
this=0x7fae9000a6d0, inode=0x7fae8c00609c, dom=0x7fae900099f0
"patchy-replicate-0", off=8126464, size=131072, locked_on=0x7fae96b4f110 "")
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:879
#2  0x00007fae978bbeb5 in afr_selfheal_data_block (frame=0x7fae8c000c0c,
this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0
"", offset=8126464, size=131072, type=1, 
    replies=0x7fae96b4f2b0) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:243
#3  0x00007fae978bc91d in afr_selfheal_data_do (frame=0x7fae8c006c9c,
this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0
"", replies=0x7fae96b4f2b0)
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:365
#4  0x00007fae978bdc7b in __afr_selfheal_data (frame=0x7fae8c006c9c,
this=0x7fae9000a6d0, fd=0x7fae8c006e6c, locked_on=0x7fae96b4fa00
"\001\001\240")
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:719
#5  0x00007fae978be0a0 in afr_selfheal_data (frame=0x7fae8c006c9c,
this=0x7fae9000a6d0, inode=0x7fae8c00609c)
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:808
#6  0x00007fae978ba4d7 in afr_selfheal_do (frame=0x7fae8c006c9c,
this=0x7fae9000a6d0, gfid=0x7fae96b4fc30
"s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1335
#7  0x00007fae978ba613 in afr_selfheal (this=0x7fae9000a6d0,
gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1380
#8  0x00007fae978c3e20 in afr_shd_selfheal (healer=0x7fae90013130, child=0,
gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:326
#9  0x00007fae978c4142 in afr_shd_index_heal (subvol=0x7fae90006e50,
entry=0x7fae90002900, parent=0x7fae96b4fdd0, data=0x7fae90013130)
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:416
#10 0x00007faea482aa83 in syncop_dir_scan (subvol=0x7fae90006e50,
loc=0x7fae96b4fdd0, pid=-6, data=0x7fae90013130, fn=0x7fae978c4034
<afr_shd_index_heal>)
    at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop-utils.c:262
#11 0x00007fae978c42bb in afr_shd_index_sweep (healer=0x7fae90013130) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:450
#12 0x00007fae978c4553 in afr_shd_index_healer (data=0x7fae90013130) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:518
#13 0x00007faea3a90a51 in start_thread () from ./lib64/libpthread.so.0
#14 0x00007faea33fa93d in clone () from ./lib64/libc.so.6
(gdb) p local->child_up[0]
No symbol "local" in current context.
(gdb) p priv->child_up[0]
$8 = 0 '\000'
(gdb) p priv->child_up[1]
$9 = 0 '\000'

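AFR_STACK_RESET() can fail to create local when all the bricks are down. The
caller, afr_selfheal_inodelk(), then passes the NULL local (local=0x0 in frame
#0) to afr_local_replies_wipe(), which leads to the crash.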
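
The failure mode described above can be illustrated with a minimal,
self-contained sketch. It is not the GlusterFS code and not necessarily the
fix that was applied: every type and function name below (sketch_priv,
sketch_local, sketch_frame, sketch_stack_reset, sketch_replies_wipe,
sketch_selfheal_inodelk) is a hypothetical stand-in, and returning -ENOTCONN
is an assumed error convention. It only shows the general pattern of checking
the result of a reset that may fail before wiping the replies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the AFR private and local state. */
struct sketch_priv {
    int child_count;
    unsigned char *child_up;      /* non-zero if the brick is reachable */
};

struct sketch_local {
    int *replies;                 /* one reply slot per child */
};

struct sketch_frame {
    struct sketch_local *local;
};

static int sketch_up_children(const struct sketch_priv *priv)
{
    int i, up = 0;

    for (i = 0; i < priv->child_count; i++)
        if (priv->child_up[i])
            up++;
    return up;
}

/* Models a reset that drops the old local and may fail to build a new one,
 * for example when no child is up. Returns NULL on failure. */
static struct sketch_local *
sketch_stack_reset(struct sketch_frame *frame, const struct sketch_priv *priv)
{
    struct sketch_local *local;

    assert(frame != NULL && priv != NULL);

    if (frame->local) {
        free(frame->local->replies);
        free(frame->local);
        frame->local = NULL;
    }

    if (sketch_up_children(priv) == 0)
        return NULL;              /* all bricks down: no local is created */

    local = calloc(1, sizeof(*local));
    if (!local)
        return NULL;
    local->replies = calloc((size_t)priv->child_count, sizeof(int));
    if (!local->replies) {
        free(local);
        return NULL;
    }
    frame->local = local;
    return local;
}

/* Dereferences local unconditionally, so a NULL local would crash here. */
static void
sketch_replies_wipe(struct sketch_local *local, const struct sketch_priv *priv)
{
    int i;

    for (i = 0; i < priv->child_count; i++)
        local->replies[i] = 0;
}

/* Guarded lock step: bail out instead of wiping through a NULL local. */
static int
sketch_selfheal_inodelk(struct sketch_frame *frame,
                        const struct sketch_priv *priv)
{
    struct sketch_local *local = sketch_stack_reset(frame, priv);

    if (!local)
        return -ENOTCONN;         /* assumed error code for this sketch */

    sketch_replies_wipe(local, priv);
    return 0;
}
```

With both child_up entries set to 0, as in the gdb session above,
sketch_selfheal_inodelk() returns an error instead of dereferencing a NULL
local.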

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1269470
[Bug 1269470] Self-heal daemon crashes when bricks go down at the time of
data heal