[Bugs] [Bug 1639632] New: glustershd coredump generated

bugzilla at redhat.com
Tue Oct 16 09:14:20 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1639632

            Bug ID: 1639632
           Summary: glustershd coredump generated
           Product: GlusterFS
           Version: 3.12
         Component: selfheal
          Assignee: bugs at gluster.org
          Reporter: zz.sh.cynthia at gmail.com
                CC: bugs at gluster.org



Created attachment 1494315
  --> https://bugzilla.redhat.com/attachment.cgi?id=1494315&action=edit
coredump file of glustershd process

Description of problem:

glustershd occasionally crashes with a segmentation fault and generates a
core dump.

Version-Release number of selected component (if applicable):


How reproducible:

Intermittent. Creating a split-brain while glustershd is running sometimes
causes glustershd to crash and dump core.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id
gluster/glustershd -p /var/run/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300,
iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
2802    client-rpc-fops.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
Missing separate debuginfos, use: dnf debuginfo-install
rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
(gdb) bt
#0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300,
iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
#1  0x00007f1b64553d47 in rpc_clnt_handle_reply (clnt=0x7f1b5808bbb0,
pollin=0x7f1b580c6620) at rpc-clnt.c:778
#2  0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0,
mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620)
at rpc-clnt.c:971
#3  0x00007f1b64550319 in rpc_transport_notify (this=0x7f1b5808bde0,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-transport.c:538
#4  0x00007f1b5f49734d in socket_event_poll_in (this=0x7f1b5808bde0,
notify_handled=_gf_true) at socket.c:2315
#5  0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15, gen=7,
data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x00007f1b647fe5ac in event_dispatch_epoll_handler (event_pool=0x230cb00,
event=0x7f1b5f00be84) at event-epoll.c:583
#7  0x00007f1b647fe883 in event_dispatch_epoll_worker (data=0x23543d0) at
event-epoll.c:659
#8  0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1b62e20cbf in clone () from /lib64/libc.so.6


(gdb) print *(call_frame_t*)myframe
$1 = {root = 0x100000000, parent = 0x100000005,
  frames = {next = 0x7f1b4401c8a8, prev = 0x7f1b44010190},
  local = 0x0, this = 0x0, ret = 0x0, ref_count = 0,
  lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0,
        __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
        __list = {__prev = 0x7f1b44010190, __next = 0x0}},
      __size = '\000' <repeats 24 times>, "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000",
      __align = 0}},
  cookie = 0x7f1b4401ccf0, complete = _gf_false, op = GF_FOP_NULL,
  begin = {tv_sec = 139755081730912, tv_usec = 139755081785872},
  end = {tv_sec = 448811404, tv_usec = 21474836481},
  wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
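
A note on reading the frame dump above: several fields hold values that look
implausible for their types (begin.tv_sec, for example, is far too large for a
Unix timestamp). The Python sketch below is not part of the original analysis.
It reprints the begin/end values in hexadecimal and checks whether each one
shares the upper 32 bits of the myframe address from the backtrace. The helper
names same_region and describe are hypothetical, and the check is only a
heuristic: a match suggests the value may be a pointer into the same memory
region, which would be consistent with the frame having been freed and reused,
but it does not prove it.

```python
# Values copied from the gdb output above.
MYFRAME = 0x7F1B4401C850  # myframe argument of client3_3_lookup_cbk

FIELDS = {
    "begin.tv_sec": 139755081730912,
    "begin.tv_usec": 139755081785872,
    "end.tv_sec": 448811404,
    "end.tv_usec": 21474836481,
}


def same_region(value: int, reference: int = MYFRAME) -> bool:
    """Return True if value shares the upper 32 bits of reference.

    Hypothetical helper: this only hints that the value might be an
    address in the same mapping as the reference pointer.
    """
    return (value >> 32) == (reference >> 32)


def describe(fields: dict) -> dict:
    """Map each field name to (hex string, pointer-like flag)."""
    return {name: (hex(v), same_region(v)) for name, v in fields.items()}


if __name__ == "__main__":
    for name, (as_hex, pointer_like) in describe(FIELDS).items():
        print(f"{name:14} {as_hex:>16} pointer-like={pointer_like}")
```

Under these assumptions, both begin values land in the same 0x7f1b... range as
myframe, while the end values do not.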


Time the glustershd core dump was generated: Oct 12 13:33:35.233839

The glustershd log has no entries from the moment of the crash, possibly
because the process died abruptly; logging stops several seconds before the
core dump.


[2018-09-26 13:04:35.788472] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-log-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmp3.log>,
c7c6e434-ea21-4e5d-bf38-aef0cef586d4 on log-client-1 and
4b46e66b-728f-4419-9852-46f233a1327e on log-client-0.
[2018-09-26 13:04:35.788490] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-log-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.798852] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-log-replicate-0: All
the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.798884] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-log-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmpdir2\test>,
f9ce3cd5-3d2c-48fc-bdbe-1e478e7a6169 on log-client-1 and
0756665f-3481-4558-bc92-00e1d21d94a5 on log-client-0.
[2018-09-26 13:04:35.798902] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-log-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.812233] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-ccs-client-2: changing port to 49152 (from 0)
[2018-09-26 13:04:35.816120] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs] 0-ccs-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-09-26 13:04:35.818343] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-ccs-client-2: Connected to
ccs-client-2, attached to remote volume '/mnt/bricks/ccs/brick'.
[2018-09-26 13:04:35.818374] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-ccs-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2018-09-26 13:04:35.818712] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ccs-client-2: Server lk
version = 1
[2018-09-26 13:04:35.823312] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-log-replicate-0: All
the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.823371] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-log-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmp9_soft2.log>,
e0c47659-8b6a-4aee-a91f-489865c5d51d on log-client-1 and
f3f69269-3995-44c9-9922-96cfadf7fed1 on log-client-0.
[2018-09-26 13:04:35.823389] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-log-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.825338] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-export-client-2: changing port to 49153 (from 0)
[2018-09-26 13:04:35.828874] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs] 0-export-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-09-26 13:04:35.829371] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-export-client-2: Connected to
export-client-2, attached to remote volume '/mnt/bricks/export/brick'.
[2018-09-26 13:04:35.829390] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-export-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2018-09-26 13:04:35.829587] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-export-client-2: Server lk
version = 1
[2018-09-26 13:04:35.855548] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-log-client-2: changing port to 49154 (from 0)
[2018-09-26 13:04:35.860969] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs] 0-log-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-09-26 13:04:35.863599] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-log-client-2: Connected to
log-client-2, attached to remote volume '/mnt/bricks/log/brick'.
[2018-09-26 13:04:35.863620] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-log-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2018-09-26 13:04:35.864266] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-log-client-2: Server lk
version = 1
[2018-09-26 13:04:35.871037] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-mstate-client-2: changing port to 49155 (from 0)
[2018-09-26 13:04:35.879356] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-mstate-replicate-0:
All the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.879395] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-mstate-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmpdir4>,
b0aa432d-38a0-426d-98b9-aa4304176d87 on mstate-client-1 and
54a3fb44-34e4-4d9e-b36d-7aaf4fd5f9bf on mstate-client-0.
[2018-09-26 13:04:35.879410] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-mstate-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.881894] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs] 0-mstate-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-09-26 13:04:35.882558] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-services-client-2: changing port to 49156 (from 0)
[2018-09-26 13:04:35.888949] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs] 0-services-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-09-26 13:04:35.891470] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-services-client-2: Connected
to services-client-2, attached to remote volume '/mnt/bricks/services/brick'.
[2018-09-26 13:04:35.891577] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-services-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2018-09-26 13:04:35.892489] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-mstate-replicate-0:
All the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.892520] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-mstate-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmp3.log>,
c45aca32-d5e0-42ca-9a49-413d34df5be3 on mstate-client-1 and
763bf6d3-fcc5-4ede-b214-135c82dbe388 on mstate-client-0.
[2018-09-26 13:04:35.892536] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-mstate-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.892661] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-services-client-2: Server
lk version = 1
[2018-09-26 13:04:35.902781] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-mstate-replicate-0:
All the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.903188] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-mstate-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmpdir2>,
8047eda2-e006-4720-b230-2dd197fa83da on mstate-client-1 and
ba7636e9-01d9-44ba-85ac-708c7b588c27 on mstate-client-0.
[2018-09-26 13:04:35.903213] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-mstate-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.915219] E [MSGID: 108008]
[afr-self-heal-common.c:213:afr_gfid_split_brain_source] 0-mstate-replicate-0:
All the bricks should be up to resolve the gfid split barin
[2018-09-26 13:04:35.915253] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-mstate-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmp9_soft2.log>,
7e5dc038-0ae6-4ee1-b052-9f492d061071 on mstate-client-1 and
98cc1652-93f6-4c1f-9a04-c8b4daba01c9 on mstate-client-0.
[2018-09-26 13:04:35.915269] E [MSGID: 108008]
[afr-self-heal-entry.c:260:afr_selfheal_detect_gfid_and_type_mismatch]
0-mstate-replicate-0: Skipping conservative merge on the file.
[2018-09-26 13:04:35.917248] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-mstate-client-2: Connected to
mstate-client-2, attached to remote volume '/mnt/bricks/mstate/brick'.
[2018-09-26 13:04:35.922713] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-mstate-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2018-09-26 13:04:35.923249] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-mstate-client-2: Server lk
version = 1
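
To summarise the gfid split-brain entries in a log excerpt like the one above,
the mail-wrapped lines can be rejoined and the "Gfid mismatch detected"
records extracted with a regular expression. The following Python sketch
assumes the message layout seen in this excerpt; the function name
parse_gfid_mismatches and the field names are hypothetical, and other
GlusterFS versions may format the message differently.

```python
import re

# Two records copied from the log excerpt above (still mail-wrapped).
SAMPLE = r"""
[2018-09-26 13:04:35.788472] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-log-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmp3.log>,
c7c6e434-ea21-4e5d-bf38-aef0cef586d4 on log-client-1 and
4b46e66b-728f-4419-9852-46f233a1327e on log-client-0.
[2018-09-26 13:04:35.798884] E [MSGID: 108008]
[afr-self-heal-common.c:336:afr_gfid_split_brain_source] 0-log-replicate-0:
Gfid mismatch detected for
<gfid:00000000-0000-0000-0000-000000000001>/tmpdir2\test>,
f9ce3cd5-3d2c-48fc-bdbe-1e478e7a6169 on log-client-1 and
0756665f-3481-4558-bc92-00e1d21d94a5 on log-client-0.
"""

GFID = r"[0-9a-f-]{36}"
MISMATCH = re.compile(
    r"\d+-(?P<subvol>[\w-]+): Gfid mismatch detected for "
    r"<(?P<entry>.+?)>, "
    rf"(?P<gfid_a>{GFID}) on (?P<client_a>[\w-]+) and "
    rf"(?P<gfid_b>{GFID}) on (?P<client_b>[\w-]+)\."
)


def parse_gfid_mismatches(text: str) -> list:
    """Return one dict per 'Gfid mismatch detected' record in text."""
    # Undo the mail client's line wrapping before matching.
    joined = " ".join(line.strip() for line in text.splitlines())
    return [m.groupdict() for m in MISMATCH.finditer(joined)]


if __name__ == "__main__":
    for rec in parse_gfid_mismatches(SAMPLE):
        print(rec["subvol"], rec["entry"], rec["gfid_a"], rec["gfid_b"])
```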
