[Gluster-devel] Crash in glusterd!!!

Wed Dec 6 12:09:12 UTC 2017

I hope these logs are sufficient. Please let me know if you need any more
logs.

On Wed, Dec 6, 2017 at 3:26 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
wrote:

> Hi Atin,
>
> Please find the backtrace and log files attached.
>
> Also, below is the backtrace from the core file.
>
> (gdb) bt
>
> #0  0x00003fff8834b898 in __GI_raise (sig=<optimized out>) at
> ../sysdeps/unix/sysv/linux/raise.c:55
>
> #1  0x00003fff88350fd0 in __GI_abort () at abort.c:89
>
>
>
> [**ALERT: The abort() might not be exactly invoked from the following
> function line.
>
>                 If the trail function contains multiple abort() calls,
> then you should cross check by other means to get correct abort() call
> location.
>
>                 This is due to the optimized compilation which hides the
> debug info for multiple abort() calls in a given function.
>
>                 Refer TR HU16995 for more information]
>
>
>
> #2  0x00003fff8838be04 in __libc_message (do_abort=<optimized out>,
> fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
>
> #3  0x00003fff8839aba8 in malloc_printerr (action=<optimized out>,
> str=0x3fff8847e498 "double free or corruption (!prev)", ptr=<optimized
> out>, ar_ptr=<optimized out>) at malloc.c:5007
>
> #4  0x00003fff8839ba40 in _int_free (av=0x3fff6c000020, p=<optimized out>,
> have_lock=<optimized out>) at malloc.c:3868
>
> #5  0x00003fff885e0814 in __gf_free (free_ptr=0x3fff6c045da0) at
> mem-pool.c:336
>
> #6  0x00003fff849093c4 in glusterd_friend_sm () at glusterd-sm.c:1295
>
> #7  0x00003fff84901a58 in __glusterd_handle_incoming_unfriend_req
> (req=0x3fff8481c06c) at glusterd-handler.c:2606
>
> #8  0x00003fff848fb870 in glusterd_big_locked_handler (req=0x3fff8481c06c,
> actor_fn=@0x3fff84a43e70: 0x3fff84901830 <__glusterd_handle_incoming_unfriend_req>)
> at glusterd-handler.c:83
>
> #9  0x00003fff848fbd08 in glusterd_handle_incoming_unfriend_req
> (req=<optimized out>) at glusterd-handler.c:2615
>
> #10 0x00003fff8854e87c in rpcsvc_handle_rpc_call (svc=0x10062fd0
> <_GLOBAL__sub_I__ZN27UehChSwitchFachToDchC_ActorC2EP12RTControllerP10RTActorRef()+1148>,
> trans=<optimized out>, msg=0x3fff6c000920) at rpcsvc.c:705
>
> #11 0x00003fff8854eb7c in rpcsvc_notify (trans=0x3fff74002210,
> mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at
> rpcsvc.c:799
>
> #12 0x00003fff885514fc in rpc_transport_notify (this=<optimized out>,
> event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
>
> #13 0x00003fff847fcd44 in socket_event_poll_in (this=this@entry=0x3fff74002210)
> at socket.c:2236
>
> #14 0x00003fff847ff89c in socket_event_handler (fd=<optimized out>,
> idx=<optimized out>, data=0x3fff74002210, poll_in=<optimized out>,
> poll_out=<optimized out>, poll_err=<optimized out>) at socket.c:2349
>
> #15 0x00003fff88616874 in event_dispatch_epoll_handler
> (event=0x3fff83d9d6a0, event_pool=0x10045bc0 <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>)
> at event-epoll.c:575
>
> #16 event_dispatch_epoll_worker (data=0x100bb4a0
> <main_thread_func__()+1756>) at event-epoll.c:678
>
> #17 0x00003fff884cfb10 in start_thread (arg=0x3fff83d9e160) at
> pthread_create.c:339
>
> #18 0x00003fff88419c0c in .__clone () at
> ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
>
>
>
> (gdb) bt full
>
> #0  0x00003fff8834b898 in __GI_raise (sig=<optimized out>) at
> ../sysdeps/unix/sysv/linux/raise.c:55
>
>         r4 = 1560
>
>         r7 = 16
>
>         arg2 = 1560
>
>         r5 = 6
>
>         r8 = 0
>
>         arg3 = 6
>
>         r0 = 250
>
>         r3 = 0
>
>         r6 = 8
>
>         arg1 = 0
>
>         sc_err = <optimized out>
>
>         sc_ret = <optimized out>
>
>         pd = 0x3fff83d9e160
>
>         pid = 0
>
>         selftid = 1560
>
> #1  0x00003fff88350fd0 in __GI_abort () at abort.c:89
>
>         save_stage = 2
>
>         act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction =
> 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer
> = 0x0}
>
>         sigs = {__val = {32, 0 <repeats 15 times>}}
>
>
>
> [**ALERT: The abort() might not be exactly invoked from the following
> function line.
>
>                 If the trail function contains multiple abort() calls,
> then you should cross check by other means to get correct abort() call
> location.
>
>                 This is due to the optimized compilation which hides the
> debug info for multiple abort() calls in a given function.
>
>                 Refer TR HU16995 for more information]
>
>
>
> #2  0x00003fff8838be04 in __libc_message (do_abort=<optimized out>,
> fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
>
>         ap = <optimized out>
>
>         fd = <optimized out>
>
>         on_2 = <optimized out>
>
>         list = <optimized out>
>
>         nlist = <optimized out>
>
>         cp = <optimized out>
>
>         written = <optimized out>
>
> #3  0x00003fff8839aba8 in malloc_printerr (action=<optimized out>,
> str=0x3fff8847e498 "double free or corruption (!prev)", ptr=<optimized
> out>, ar_ptr=<optimized out>) at malloc.c:5007
>
>         buf = "00003fff6c045d60"
>
>         cp = <optimized out>
>
>         ar_ptr = <optimized out>
>
>         ptr = <optimized out>
>
>         str = 0x3fff8847e498 "double free or corruption (!prev)"
>
>         action = 3
>
> #4  0x00003fff8839ba40 in _int_free (av=0x3fff6c000020, p=<optimized out>,
> have_lock=<optimized out>) at malloc.c:3868
>
>         size = <optimized out>
>
>         fb = <optimized out>
>
>         nextchunk = <optimized out>
>
>         nextsize = <optimized out>
>
>         nextinuse = <optimized out>
>
>         prevsize = <optimized out>
>
>         bck = <optimized out>
>
>         fwd = <optimized out>
>
>         errstr = <optimized out>
>
>         locked = <optimized out>
>
>         __func__ = "_int_free"
>
> #5  0x00003fff885e0814 in __gf_free (free_ptr=0x3fff6c045da0) at
> mem-pool.c:336
>
>         ptr = 0x3fff6c045d60
>
>         mem_acct = <optimized out>
>
>         header = 0x3fff6c045d60
>
>         free_ptr = 0x3fff6c045da0
>
> #6  0x00003fff849093c4 in glusterd_friend_sm () at glusterd-sm.c:1295
>
>         event = 0x3fff6c045da0
>
>         tmp = 0x3fff6c045da0
>
>         ret = <optimized out>
>
>         handler = @0x3fff84a44038: 0x3fff84906750
> <glusterd_ac_friend_remove>
>
>         state = 0x3fff84a390c0 <glusterd_state_befriended>
>
>         peerinfo = <optimized out>
>
>         event_type = GD_FRIEND_EVENT_REMOVE_FRIEND
>
>         is_await_conn = <optimized out>
>
>         quorum_action = <optimized out>
>
>         old_state = GD_FRIEND_STATE_BEFRIENDED
>
>         this = <optimized out>
>
>         priv = 0x3fff84748050
>
>         __FUNCTION__ = "glusterd_friend_sm"
>
> #7  0x00003fff84901a58 in __glusterd_handle_incoming_unfriend_req
> (req=0x3fff8481c06c) at glusterd-handler.c:2606
>
>         ret = 0
>
>         friend_req = {uuid = "\231\214R¦\177\223I\216\236Õ\214dÎöy¡",
> hostname = 0x3fff6c028ef0 "", port = 0, vols = {vols_len = 0, vols_val =
> 0x0}}
>
>         remote_hostname = "10.32.0.48", '\000' <repeats 98 times>
>
>         __FUNCTION__ = "__glusterd_handle_incoming_unfriend_req"
>
> #8  0x00003fff848fb870 in glusterd_big_locked_handler (req=0x3fff8481c06c,
> actor_fn=@0x3fff84a43e70: 0x3fff84901830 <__glusterd_handle_incoming_unfriend_req>)
> at glusterd-handler.c:83
>
>         priv = 0x3fff84748050
>
>         ret = -1
>
> #9  0x00003fff848fbd08 in glusterd_handle_incoming_unfriend_req
> (req=<optimized out>) at glusterd-handler.c:2615
>
> No locals.
>
> #10 0x00003fff8854e87c in rpcsvc_handle_rpc_call (svc=0x10062fd0
> <_GLOBAL__sub_I__ZN27UehChSwitchFachToDchC_ActorC2EP12RTControllerP10RTActorRef()+1148>,
> trans=<optimized out>, msg=0x3fff6c000920) at rpcsvc.c:705
>
>         actor = 0x3fff84a38860 <gd_svc_peer_actors+192>
>
>         actor_fn = @0x3fff84a43ab0: 0x3fff848fbcf0
> <glusterd_handle_incoming_unfriend_req>
>
>         req = 0x3fff8481c06c
>
>         ret = -1
>
>         port = <optimized out>
>
>         unprivileged = <optimized out>
>
>         reply = <optimized out>
>
>         drc = <optimized out>
>
>         __FUNCTION__ = "rpcsvc_handle_rpc_call"
>
> #11 0x00003fff8854eb7c in rpcsvc_notify (trans=0x3fff74002210,
> mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at
> rpcsvc.c:799
>
>         ret = -1
>
>         msg = <optimized out>
>
>         new_trans = 0x0
>
>         svc = <optimized out>
>
>         listener = 0x0
>
>         __FUNCTION__ = "rpcsvc_notify"
>
> #12 0x00003fff885514fc in rpc_transport_notify (this=<optimized out>,
> event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
>
>         ret = -1
>
>         __FUNCTION__ = "rpc_transport_notify"
>
> #13 0x00003fff847fcd44 in socket_event_poll_in (this=this@entry=0x3fff74002210)
> at socket.c:2236
>
>         ret = <optimized out>
>
>         pollin = 0x3fff6c000920
>
>         priv = 0x3fff74002d50
>
> #14 0x00003fff847ff89c in socket_event_handler (fd=<optimized out>,
> idx=<optimized out>, data=0x3fff74002210, poll_in=<optimized out>,
> poll_out=<optimized out>, poll_err=<optimized out>) at socket.c:2349
>
>         this = 0x3fff74002210
>
>         priv = 0x3fff74002d50
>
>         ret = <optimized out>
>
>         __FUNCTION__ = "socket_event_handler"
>
> #15 0x00003fff88616874 in event_dispatch_epoll_handler
> (event=0x3fff83d9d6a0, event_pool=0x10045bc0 <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>)
> at event-epoll.c:575
>
>         handler = @0x3fff8481a620: 0x3fff847ff6f0 <socket_event_handler>
>
>         gen = 1
>
>         slot = 0x100803f0 <_GLOBAL__sub_I__ZN24RoamIfFroRrcRoExtAttribDC2Ev()+232>
>
>         data = <optimized out>
>
>         ret = -1
>
>         fd = 8
>
>         ev_data = 0x3fff83d9d6a8
>
>         idx = 7
>
> #16 event_dispatch_epoll_worker (data=0x100bb4a0
> <main_thread_func__()+1756>) at event-epoll.c:678
>
>         event = {events = 1, data = {ptr = 0x700000001, fd = 7, u32 = 7,
> u64 = 30064771073}}
>
>         ret = <optimized out>
>
>         ev_data = 0x100bb4a0 <main_thread_func__()+1756>
>
>         event_pool = 0x10045bc0 <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>
>
>         myindex = <optimized out>
>
>         timetodie = 0
>
>         __FUNCTION__ = "event_dispatch_epoll_worker"
>
> #17 0x00003fff884cfb10 in start_thread (arg=0x3fff83d9e160) at
> pthread_create.c:339
>
>         pd = 0x3fff83d9e160
>
>         now = <optimized out>
>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-6868946778599096053,
> 70366736145408, -6868946778421678961, 0, 0, 70366652919808, 70366661304864,
> 8388608, 70366736105504, 269202592, 70367897957648, 70366736131032,
> 70366737568040, 3, 0, 70366736131048, 70367897957296, 70367897957352,
> 4001536, 70366736106520, 70366661302080, -3187654076, 0 <repeats 42
> times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data =
> {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>
>         not_first_call = <optimized out>
>
>         pagesize_m1 = <optimized out>
>
>         sp = <optimized out>
>
>         freesize = <optimized out>
>
>         __PRETTY_FUNCTION__ = "start_thread"
>
> #18 0x00003fff88419c0c in .__clone () at
> ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
>
> No locals.
>
> Regards,
> Abhishek
>
> On Wed, Dec 6, 2017 at 3:21 PM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>> Without the glusterd log file and either the core file or the backtrace,
>> I can't comment on anything.
>>
>> On Wed, Dec 6, 2017 at 3:09 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
>> wrote:
>>
>>> Any suggestions?
>>>
>>> On Dec 6, 2017 11:51, "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We are seeing a crash in glusterd shortly after it starts. When I checked
>>>> the brick logs, I found the errors below:
>>>>
>>>> [2017-12-01 14:10:14.684122] E [MSGID: 100018]
>>>> [glusterfsd.c:1960:glusterfs_pidfile_update] 0-glusterfsd: pidfile
>>>> /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
>>>> lock failed [Resource temporarily unavailable]
>>>> :
>>>> :
>>>> :
>>>> [2017-12-01 14:10:16.862903] E [MSGID: 113001]
>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=18:
>>>> key:trusted.bit-rot.version [No space left on device]
>>>> [2017-12-01 14:10:16.862985] I [MSGID: 115063]
>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>> 92: FTRUNCATE 1 (934f08b7-e3b5-4690-84fc-742a4b1fb78b)==> (No space
>>>> left on device) [No space left on device]
>>>> [2017-12-01 14:10:16.907037] E [MSGID: 113001]
>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>> key:trusted.bit-rot.version [No space left on device]
>>>> [2017-12-01 14:10:16.907108] I [MSGID: 115063]
>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>> 35: FTRUNCATE 0 (109d6537-a1ec-4556-8ce1-04c365c451eb)==> (No space
>>>> left on device) [No space left on device]
>>>> [2017-12-01 14:10:16.947541] E [MSGID: 113001]
>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>> key:trusted.bit-rot.version [No space left on device]
>>>> [2017-12-01 14:10:16.947623] I [MSGID: 115063]
>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>> 70: FTRUNCATE 0 (8f9c8054-b0d7-4b93-a95b-cd3ab249c56d)==> (No space
>>>> left on device) [No space left on device]
>>>> [2017-12-01 14:10:16.968515] E [MSGID: 113001]
>>>> [posix.c:4616:_posix_remove_xattr] 0-c_glusterfs-posix: removexattr
>>>> failed on /opt/lvmdir/c2/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/configuration
>>>> (for trusted.glusterfs.dht) [No space left on device]
>>>> [2017-12-01 14:10:16.968589] I [MSGID: 115058]
>>>> [server-rpc-fops.c:740:server_removexattr_cbk] 0-c_glusterfs-server:
>>>> 90: REMOVEXATTR <gfid:a240d2fd-869c-408d-9b95-62ee1bff074e>
>>>> (a240d2fd-869c-408d-9b95-62ee1bff074e) of key  ==> (No space left on
>>>> device) [No space left on device]
>>>> [2017-12-01 14:10:17.039815] E [MSGID: 113001]
>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>> key:trusted.bit-rot.version [No space left on device]
>>>> [2017-12-01 14:10:17.039900] I [MSGID: 115063]
>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>> 152: FTRUNCATE 0 (d67bcfcd-ff19-4b58-9823-46d6cce9ace3)==> (No space
>>>> left on device) [No space left on device]
>>>> [2017-12-01 14:10:17.048767] E [MSGID: 113001]
>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>> key:trusted.bit-rot.version [No space left on device]
>>>> [2017-12-01 14:10:17.048874] I [MSGID: 115063]
>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>> 163: FTRUNCATE 0 (0e3ee6ad-408b-4fcf-a1a7-4262ec113316)==> (No space
>>>> left on device) [No space left on device]
>>>> [2017-12-01 14:10:17.075007] E [MSGID: 113001]
>>>> [posix.c:4616:_posix_remove_xattr] 0-c_glusterfs-posix: removexattr
>>>> failed on /opt/lvmdir/c2/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/java
>>>> (for trusted.glusterfs.dht) [No space left on device]
>>>>
>>>> Also, we are running low on disk space.
>>>>
>>>> Could anyone please explain what glusterd is doing on the brick that
>>>> could cause it to crash?
>>>>
>>>> Please find the brick logs in attachment.
>>>>
>>>> Thanks in advance!!!
>>>> --
>>>> Regards
>>>> Abhishek Paliwal
>>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>
>
> --
>
>
>
>
> Regards
> Abhishek Paliwal
>

-- 

Regards
Abhishek Paliwal