[Gluster-devel] Regression health for release-5.next and release-6

Xavi Hernandez jahernan at redhat.com
Thu Jan 17 08:21:47 UTC 2019


On Thu, Jan 17, 2019 at 5:29 AM Atin Mukherjee <amukherj at redhat.com> wrote:

>
>
> On Tue, Jan 15, 2019 at 2:13 PM Atin Mukherjee <atin.mukherjee83 at gmail.com>
> wrote:
>
>> Interesting. I’ll do a deep dive at it sometime this week.
>>
>> On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez <jahernan at redhat.com> wrote:
>>
>>> On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey <aspandey at redhat.com>
>>> wrote:
>>>
>>>>
>>>> I downloaded the logs of regression runs 1077 and 1073 and tried to
>>>> investigate them.
>>>> In both regressions ec/bug-1236065.t is hanging on TEST 70, which is
>>>> trying to get the online brick count.
>>>>
>>>> I can see in the mount/brick and glusterd logs that it has not moved
>>>> forward after this test.
>>>> glusterd.log  -
>>>>
>>>> [2019-01-06 16:27:51.346408]:++++++++++
>>>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count
>>>> ++++++++++
>>>> [2019-01-06 16:27:51.645014] I [MSGID: 106499]
>>>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management:
>>>> Received status volume req for volume patchy
>>>> [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3)
>>>> [0x7f4c37fe06c3]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a)
>>>> [0x7f4c37fd9b3a]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170)
>>>> [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string
>>>> type [Invalid argument]
>>>> [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn]
>>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32)
>>>> [0x7f4c38095a32]
>>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac)
>>>> [0x7f4c37fdd4ac]
>>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179)
>>>> [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has
>>>> integer type [Invalid argument]
>>>> [2019-01-06 16:27:51.649335] E [MSGID: 101191]
>>>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>>> handler
>>>> [2019-01-06 16:27:51.932871] I [MSGID: 106499]
>>>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management:
>>>> Received status volume req for volume patchy
>>>>
>>>> It is just taking a lot of time to get the status at this point.
>>>> It looks like there could be some issue with the connection or with the
>>>> handling of volume status when some bricks are down.
>>>>
>>>
>>> The 'online_brick_count' check uses 'gluster volume status' to get some
>>> information, and it does that several times (currently 7). Looking at
>>> cmd_history.log, I see that after the 'online_brick_count' at line 70, only
>>> one 'gluster volume status' has completed. Apparently the second 'gluster
>>> volume status' is hung.
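>>>
>>> For context, a rough sketch of what a helper like this does (this is an
>>> illustrative shell approximation, not the actual definition from
>>> tests/volume.rc; the exact commands and the XML parsing are my
>>> assumptions):
>>>
>>>     online_brick_count_sketch() {
>>>         local count
>>>         # Each call below is a separate 'gluster volume status' request;
>>>         # if any single one hangs, the whole helper (and the test) stalls.
>>>         count=$($CLI --xml volume status patchy 2>/dev/null \
>>>                     | grep -c '<status>1')
>>>         count=$((count + $($CLI --xml volume status patchy shd \
>>>                     2>/dev/null | grep -c '<status>1')))
>>>         echo $count
>>>     }
>>>
>>> The important point is that each check translates into several sequential
>>> 'gluster volume status' calls, so a single hung status request is enough
>>> to block TEST 70 indefinitely.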
>>>
>>> In cli.log I see that the second 'gluster volume status' seems to have
>>> started, but not finished:
>>>
>>> Normal run:
>>>
>>> [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running
>>> gluster with version 6dev
>>> [2019-01-08 16:36:43.808182] I [MSGID: 101190]
>>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 0
>>> [2019-01-08 16:36:43.808287] I [MSGID: 101190]
>>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2019-01-08 16:36:43.808432] E [MSGID: 101191]
>>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>> handler
>>> [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32]
>>> (-->gluster(cli_cmd_process+0x1e4) [0x40db50]
>>> -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec]
>>> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176)
>>> [0x7fefe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer type
>>> [Invalid argument]
>>> [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32]
>>> (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27]
>>> -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94]
>>> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176)
>>> [0x7fefe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer
>>> type [Invalid argument]
>>> [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0
>>>
>>>
>>> Bad run:
>>>
>>> [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running
>>> gluster with version 6dev
>>> [2019-01-08 16:36:44.147364] I [MSGID: 101190]
>>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 0
>>> [2019-01-08 16:36:44.147477] I [MSGID: 101190]
>>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2019-01-08 16:36:44.147583] E [MSGID: 101191]
>>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>> handler
>>>
>>>
>>> In glusterd.log it seems as if it hasn't received any status request. It
>>> looks like the cli has not even connected to glusterd.
>>>
>>
> Downloaded the logs for the recent failure from
> https://build.gluster.org/job/regression-test-with-multiplex/1092/ and
> based on the log scanning this is what I see:
>
> 1. The test executes without any issues till line no 74, i.e. "TEST $CLI
> volume start $V0 force", and cli.log along with cmd_history.log confirm the
> same:
>
> cli.log
> ====
> [2019-01-16 16:28:46.871877]:++++++++++
> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 73 gluster --mode=script
> --wignore volume start patchy force ++++++++++
> [2019-01-16 16:28:46.980780] I [cli.c:834:main] 0-cli: Started running
> gluster with version 6dev
> [2019-01-16 16:28:47.185996] I [MSGID: 101190]
> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 0
> [2019-01-16 16:28:47.186113] I [MSGID: 101190]
> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-16 16:28:47.186234] E [MSGID: 101191]
> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-01-16 16:28:49.223376] I
> [cli-rpc-ops.c:1448:gf_cli_start_volume_cbk] 0-cli: Received resp to start
> volume <=== successfully processed the callback
> [2019-01-16 16:28:49.223668] I [input.c:31:cli_batch] 0-: Exiting with: 0
>
> cmd_history.log
> ============
> [2019-01-16 16:28:49.220491]  : volume start patchy force : SUCCESS
>
> However, in both the cli and cmd_history log files these are the last
> entries I see, which indicates that the test script is completely paused.
> I see no possibility that the cli received the command and dropped it
> completely, as otherwise we should have at least seen the "Started running
> gluster with version 6dev" and "Exiting with" log entries.
>
> I managed to reproduce this once locally on my system, and when I then ran
> commands from another prompt, volume status and all other basic gluster
> commands went through. I also inspected the processes and I don't see any
> sign of a process being hung.
>
> So the mystery continues and we need to see why the test script is not
> moving forward at all.
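>
> (If someone catches it in that state again, something along these lines
> might help confirm whether the test shell itself is stuck and, if so,
> where; the pgrep pattern and PIDs are just placeholders:)
>
>     # Find the shell running the .t script and see what it is waiting on
>     pid=$(pgrep -f 'bug-1236065.t' | head -1)
>     ps -o pid,stat,wchan:32,cmd -p "$pid"   # kernel wait channel, if any
>     ls -l /proc/$pid/fd                     # blocked on a pipe or a mount?
>     strace -f -p "$pid"                     # watch the blocking syscall live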
>

An additional thing that could be interesting: in all cases where I've seen
this test hang, the next test shows an error during cleanup:

Aborting.

/mnt/nfs/1 could not be deleted, here are the left over items
drwxr-xr-x. 2 root root 6 Jan 16 16:41 /d/backends
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/0
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/1
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/2
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/3
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/0
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/1

Please correct the problem and try again.

This is a bit weird, since this only happens after having removed all these
directories with an 'rm -rf', and that command doesn't exit on the first
error, so at least some of these directories should have been removed, even
if the mount process is hung (all nfs mounts and fuse mounts 1, 2 and 3 are
not used by the test). The only explanation I have is that the cleanup
function is being executed twice concurrently (probably from two different
scripts). The first cleanup is blocked (or is taking a lot of time) while
removing one of the directories. Meanwhile the other cleanup completes and
recreates the directories, so when the first one finally finishes, it finds
all the directories still there and writes the messages above. This would
also mean that something is not properly killed between tests. Not sure if
that's possible.
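
A minimal sketch of the interleaving I have in mind (purely hypothetical;
the real cleanup lives in tests/include.rc and does more than this):

    # Simplified cleanup: remove the work directories, then recreate them.
    cleanup_sketch() {
        rm -rf /d/backends /mnt/glusterfs/* /mnt/nfs/* 2>/dev/null
        mkdir -p /d/backends /mnt/glusterfs/{0,1,2,3} /mnt/nfs/{0,1}
    }

    # Script A (previous test, not fully killed): its cleanup_sketch blocks
    # inside 'rm -rf' on a hung mount point.
    # Script B (next test): runs cleanup_sketch to completion and recreates
    # every directory.
    # When script A finally gets past 'rm -rf', its leftover check finds all
    # the directories present again and prints the "could not be deleted,
    # here are the left over items" error shown above.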

This could match with your findings, since some commands executed on the
second script could "unblock" whatever is blocked in the first one, causing
it to progress and show the final error.

Could this explain something?


>
>
>>> Xavi
>>>
>>>
>>>> ---
>>>> Ashish
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From: *"Mohit Agrawal" <moagrawa at redhat.com>
>>>> *To: *"Shyam Ranganathan" <srangana at redhat.com>
>>>> *Cc: *"Gluster Devel" <gluster-devel at gluster.org>
>>>> *Sent: *Saturday, January 12, 2019 6:46:20 PM
>>>> *Subject: *Re: [Gluster-devel] Regression health for release-5.next
>>>> and        release-6
>>>>
>>>> The previous logs were related to the client, not the bricks; below are
>>>> the brick logs:
>>>>
>>>> [2019-01-12 12:25:25.893485]:++++++++++
>>>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o
>>>> 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++
>>>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict:
>>>> key 'trusted.ec.size' would not be sent on wire in the future [Invalid
>>>> argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and
>>>> [2019-01-12 12:25:25.899532]
>>>> [2019-01-12 12:25:25.903375] E [MSGID: 113001]
>>>> [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair]
>>>> 8-patchy-posix: fgetxattr failed on
>>>> gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop:
>>>> Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor]
>>>> [2019-01-12 12:25:25.903468] E [MSGID: 115073]
>>>> [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486:
>>>> FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client:
>>>> CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1,
>>>> error-xlator: patchy-posix [Bad file descriptor]
>>>>
>>>>
>>>> Thanks,
>>>> Mohit Agrawal
>>>>
>>>> On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal <moagrawa at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Specific to "add-brick-and-validate-replicated-volume-options.t", I
>>>>> have posted a patch: https://review.gluster.org/22015.
>>>>> For the test case "ec/bug-1236065.t" I think the issue needs to be
>>>>> checked by the EC team.
>>>>>
>>>>> On the brick side, it is showing below logs
>>>>>
>>>>> >>>>>>>>>>>>>>>>>
>>>>>
>>>>> on wire in the future [Invalid argument]
>>>>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict:
>>>>> key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid
>>>>> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and
>>>>> [2019-01-12 12:25:25.902992]
>>>>> [2019-01-12 12:25:25.903553] W [MSGID: 114031]
>>>>> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1:
>>>>> remote operation failed [Bad file descriptor]
>>>>> [2019-01-12 12:25:25.903998] W [MSGID: 122040]
>>>>> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get
>>>>> size and version :  FOP : 'FXATTROP' failed on gfid
>>>>> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error]
>>>>> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk]
>>>>> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error)
>>>>>
>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>
>>>>> The test case is timing out because the "volume heal $V0 full" command
>>>>> is stuck; it looks like shd is getting stuck at getxattr.
>>>>>
>>>>> >>>>>>>>>>>>>>.
>>>>>
>>>>> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880,
>>>>> child=<optimized out>, loc=0x7f83777fdbb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0,
>>>>> entry=<optimized out>, parent=0x7f83777fdde0, data=0x7f83a8030880) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0,
>>>>> loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0,
>>>>> child=<optimized out>, loc=0x7f8376ffcbb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110,
>>>>> entry=<optimized out>, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110,
>>>>> loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960,
>>>>> child=<optimized out>, loc=0x7f83767fbbb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0,
>>>>> entry=<optimized out>, parent=0x7f83767fbde0, data=0x7f83a8030960) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0,
>>>>> loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0,
>>>>> child=<optimized out>, loc=0x7f8375ffabb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0,
>>>>> entry=<optimized out>, parent=0x7f8375ffade0, data=0x7f83a80309d0) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0,
>>>>> loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40,
>>>>> child=<optimized out>, loc=0x7f83757f9bb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0,
>>>>> entry=<optimized out>, parent=0x7f83757f9de0, data=0x7f83a8030a40) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0,
>>>>> loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0,
>>>>> child=<optimized out>, loc=0x7f8374ff8bb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890,
>>>>> entry=<optimized out>, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890,
>>>>> loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)):
>>>>> #0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>,
>>>>> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28
>>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0,
>>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680
>>>>> #2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20,
>>>>> child=<optimized out>, loc=0x7f8367ffebb0, full=<optimized out>) at
>>>>> ec-heald.c:161
>>>>> #3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270,
>>>>> entry=<optimized out>, parent=0x7f8367ffede0, data=0x7f83a8030b20) at
>>>>> ec-heald.c:294
>>>>> #4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270,
>>>>> loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20,
>>>>> fn=fn at entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
>>>>> #5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20,
>>>>> inode=<optimized out>) at ec-heald.c:311
>>>>> #6  0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at
>>>>> ec-heald.c:372
>>>>> #7  0x00007f83bb709e25 in start_thread () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
>>>>> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)):
>>>>> #0  0x00007f83bb70af57 in pthread_join () from
>>>>> /usr/lib64/libpthread.so.0
>>>>> #1  0x00007f83bc92eff8 in event_dispatch_epoll
>>>>> (event_pool=0x55af0a6dd560) at event-epoll.c:846
>>>>> #2  0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at
>>>>> glusterfsd.c:2848
>>>>>
>>>>>
>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>.
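>>>>>
>>>>> (For reference, a stack like the one above can be captured on a live
>>>>> reproducer by attaching gdb to the self-heal daemon; the pgrep pattern
>>>>> below is just my guess at how shd shows up in the process list:)
>>>>>
>>>>>     gdb -p "$(pgrep -f glustershd | head -1)" -batch \
>>>>>         -ex 'thread apply all bt'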
>>>>>
>>>>> Thanks,
>>>>> Mohit Agrawal
>>>>>
>>>>> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan <srangana at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> We can check health on master post the patch as stated by Mohit below.
>>>>>>
>>>>>> Release-5 is causing some concern, as we needed to tag the release
>>>>>> yesterday, but we have the following 2 tests failing or dumping core
>>>>>> pretty regularly; they need attention.
>>>>>>
>>>>>> ec/bug-1236065.t
>>>>>> glusterd/add-brick-and-validate-replicated-volume-options.t
>>>>>>
>>>>>> Shyam
>>>>>> On 1/10/19 6:20 AM, Mohit Agrawal wrote:
>>>>>> > I think we should consider regression builds after merging the patch
>>>>>> > (https://review.gluster.org/#/c/glusterfs/+/21990/),
>>>>>> > as we know this patch introduced some delay.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Mohit Agrawal
>>>>>> >
>>>>>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <amukherj at redhat.com
>>>>>> > <mailto:amukherj at redhat.com>> wrote:
>>>>>> >
>>>>>> >     Mohit, Sanju - request you to investigate the failures related
>>>>>> >     to glusterd and brick-mux and report back to the list.
>>>>>> >
>>>>>> >     On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan
>>>>>> >     <srangana at redhat.com <mailto:srangana at redhat.com>> wrote:
>>>>>> >
>>>>>> >         Hi,
>>>>>> >
>>>>>> >         As part of branching preparation next week for release-6,
>>>>>> >         please find test failures and respective test links here [1].
>>>>>> >
>>>>>> >         The top tests that are failing/dumping-core are as below and
>>>>>> >         need attention,
>>>>>> >         - ec/bug-1236065.t
>>>>>> >         - glusterd/add-brick-and-validate-replicated-volume-options.t
>>>>>> >         - readdir-ahead/bug-1390050.t
>>>>>> >         - glusterd/brick-mux-validation.t
>>>>>> >         - bug-1432542-mpx-restart-crash.t
>>>>>> >
>>>>>> >         Others of interest,
>>>>>> >         - replicate/bug-1341650.t
>>>>>> >
>>>>>> >         Please file a bug if needed against the test case and report
>>>>>> >         the same here. In case a problem is already addressed, then
>>>>>> >         do send back the patch details that address this issue as a
>>>>>> >         response to this mail.
>>>>>> >
>>>>>> >         Thanks,
>>>>>> >         Shyam
>>>>>> >
>>>>>> >         [1] Regression failures:
>>>>>> >         https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
>>>>>> >         _______________________________________________
>>>>>> >         Gluster-devel mailing list
>>>>>> >         Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
>>>>>> >         https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> --
>> --Atin
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>