[Gluster-devel] Crash in glusterd!!!

Wed Dec 6 12:34:35 UTC 2017

[2017-12-01 14:10:13.060193] I [MSGID: 100030] [glusterfsd.c:2348:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd       version
3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

From the above, I can see you're running a very old Gluster version (3.7.6)
that has reached end of life (EOL). Please upgrade your cluster to one of the
latest supported versions.

On Wed, Dec 6, 2017 at 5:39 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
wrote:

> I hope these logs are sufficient. Please let me know if you need more
> logs.
>
> On Wed, Dec 6, 2017 at 3:26 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
> wrote:
>
>> Hi Atin,
>>
>> Please find the backtrace and log files attached.
>>
>> Also, below is the backtrace from the core file.
>>
>> (gdb) bt
>>
>> #0  0x00003fff8834b898 in __GI_raise (sig=<optimized out>) at
>> ../sysdeps/unix/sysv/linux/raise.c:55
>>
>> #1  0x00003fff88350fd0 in __GI_abort () at abort.c:89
>>
>>
>>
>> [**ALERT: The abort() might not be exactly invoked from the following
>> function line.
>>
>>                 If the trail function contains multiple abort() calls,
>> then you should cross check by other means to get correct abort() call
>> location.
>>
>>                 This is due to the optimized compilation which hides the
>> debug info for multiple abort() calls in a given function.
>>
>>                 Refer TR HU16995 for more information]
>>
>>
>>
>> #2  0x00003fff8838be04 in __libc_message (do_abort=<optimized out>,
>> fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
>>
>> #3  0x00003fff8839aba8 in malloc_printerr (action=<optimized out>,
>> str=0x3fff8847e498 "double free or corruption (!prev)", ptr=<optimized
>> out>, ar_ptr=<optimized out>) at malloc.c:5007
>>
>> #4  0x00003fff8839ba40 in _int_free (av=0x3fff6c000020, p=<optimized
>> out>, have_lock=<optimized out>) at malloc.c:3868
>>
>> #5  0x00003fff885e0814 in __gf_free (free_ptr=0x3fff6c045da0) at
>> mem-pool.c:336
>>
>> #6  0x00003fff849093c4 in glusterd_friend_sm () at glusterd-sm.c:1295
>>
>> #7  0x00003fff84901a58 in __glusterd_handle_incoming_unfriend_req
>> (req=0x3fff8481c06c) at glusterd-handler.c:2606
>>
>> #8  0x00003fff848fb870 in glusterd_big_locked_handler
>> (req=0x3fff8481c06c, actor_fn=@0x3fff84a43e70: 0x3fff84901830
>> <__glusterd_handle_incoming_unfriend_req>) at glusterd-handler.c:83
>>
>> #9  0x00003fff848fbd08 in glusterd_handle_incoming_unfriend_req
>> (req=<optimized out>) at glusterd-handler.c:2615
>>
>> #10 0x00003fff8854e87c in rpcsvc_handle_rpc_call (svc=0x10062fd0
>> <_GLOBAL__sub_I__ZN27UehChSwitchFachToDchC_ActorC2EP12RTControllerP10RTActorRef()+1148>,
>> trans=<optimized out>, msg=0x3fff6c000920) at rpcsvc.c:705
>>
>> #11 0x00003fff8854eb7c in rpcsvc_notify (trans=0x3fff74002210,
>> mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at
>> rpcsvc.c:799
>>
>> #12 0x00003fff885514fc in rpc_transport_notify (this=<optimized out>,
>> event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
>>
>> #13 0x00003fff847fcd44 in socket_event_poll_in (this=this@entry=0x3fff74002210)
>> at socket.c:2236
>>
>> #14 0x00003fff847ff89c in socket_event_handler (fd=<optimized out>,
>> idx=<optimized out>, data=0x3fff74002210, poll_in=<optimized out>,
>> poll_out=<optimized out>, poll_err=<optimized out>) at socket.c:2349
>>
>> #15 0x00003fff88616874 in event_dispatch_epoll_handler
>> (event=0x3fff83d9d6a0, event_pool=0x10045bc0
>> <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>) at
>> event-epoll.c:575
>>
>> #16 event_dispatch_epoll_worker (data=0x100bb4a0
>> <main_thread_func__()+1756>) at event-epoll.c:678
>>
>> #17 0x00003fff884cfb10 in start_thread (arg=0x3fff83d9e160) at
>> pthread_create.c:339
>>
>> #18 0x00003fff88419c0c in .__clone () at
>> ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
>>
>>
>>
>> (gdb) bt full
>>
>> #0  0x00003fff8834b898 in __GI_raise (sig=<optimized out>) at
>> ../sysdeps/unix/sysv/linux/raise.c:55
>>
>>         r4 = 1560
>>
>>         r7 = 16
>>
>>         arg2 = 1560
>>
>>         r5 = 6
>>
>>         r8 = 0
>>
>>         arg3 = 6
>>
>>         r0 = 250
>>
>>         r3 = 0
>>
>>         r6 = 8
>>
>>         arg1 = 0
>>
>>         sc_err = <optimized out>
>>
>>         sc_ret = <optimized out>
>>
>>         pd = 0x3fff83d9e160
>>
>>         pid = 0
>>
>>         selftid = 1560
>>
>> #1  0x00003fff88350fd0 in __GI_abort () at abort.c:89
>>
>>         save_stage = 2
>>
>>         act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction =
>> 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer
>> = 0x0}
>>
>>         sigs = {__val = {32, 0 <repeats 15 times>}}
>>
>>
>>
>> [**ALERT: The abort() might not be exactly invoked from the following
>> function line.
>>
>>                 If the trail function contains multiple abort() calls,
>> then you should cross check by other means to get correct abort() call
>> location.
>>
>>                 This is due to the optimized compilation which hides the
>> debug info for multiple abort() calls in a given function.
>>
>>                 Refer TR HU16995 for more information]
>>
>>
>>
>> #2  0x00003fff8838be04 in __libc_message (do_abort=<optimized out>,
>> fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:175
>>
>>         ap = <optimized out>
>>
>>         fd = <optimized out>
>>
>>         on_2 = <optimized out>
>>
>>         list = <optimized out>
>>
>>         nlist = <optimized out>
>>
>>         cp = <optimized out>
>>
>>         written = <optimized out>
>>
>> #3  0x00003fff8839aba8 in malloc_printerr (action=<optimized out>,
>> str=0x3fff8847e498 "double free or corruption (!prev)", ptr=<optimized
>> out>, ar_ptr=<optimized out>) at malloc.c:5007
>>
>>         buf = "00003fff6c045d60"
>>
>>         cp = <optimized out>
>>
>>         ar_ptr = <optimized out>
>>
>>         ptr = <optimized out>
>>
>>         str = 0x3fff8847e498 "double free or corruption (!prev)"
>>
>>         action = 3
>>
>> #4  0x00003fff8839ba40 in _int_free (av=0x3fff6c000020, p=<optimized
>> out>, have_lock=<optimized out>) at malloc.c:3868
>>
>>         size = <optimized out>
>>
>>         fb = <optimized out>
>>
>>         nextchunk = <optimized out>
>>
>>         nextsize = <optimized out>
>>
>>         nextinuse = <optimized out>
>>
>>         prevsize = <optimized out>
>>
>>         bck = <optimized out>
>>
>>         fwd = <optimized out>
>>
>>         errstr = <optimized out>
>>
>>         locked = <optimized out>
>>
>>         __func__ = "_int_free"
>>
>> #5  0x00003fff885e0814 in __gf_free (free_ptr=0x3fff6c045da0) at
>> mem-pool.c:336
>>
>>         ptr = 0x3fff6c045d60
>>
>>         mem_acct = <optimized out>
>>
>>         header = 0x3fff6c045d60
>>
>>         free_ptr = 0x3fff6c045da0
>>
>> #6  0x00003fff849093c4 in glusterd_friend_sm () at glusterd-sm.c:1295
>>
>>         event = 0x3fff6c045da0
>>
>>         tmp = 0x3fff6c045da0
>>
>>         ret = <optimized out>
>>
>>         handler = @0x3fff84a44038: 0x3fff84906750
>> <glusterd_ac_friend_remove>
>>
>>         state = 0x3fff84a390c0 <glusterd_state_befriended>
>>
>>         peerinfo = <optimized out>
>>
>>         event_type = GD_FRIEND_EVENT_REMOVE_FRIEND
>>
>>         is_await_conn = <optimized out>
>>
>>         quorum_action = <optimized out>
>>
>>         old_state = GD_FRIEND_STATE_BEFRIENDED
>>
>>         this = <optimized out>
>>
>>         priv = 0x3fff84748050
>>
>>         __FUNCTION__ = "glusterd_friend_sm"
>>
>> #7  0x00003fff84901a58 in __glusterd_handle_incoming_unfriend_req
>> (req=0x3fff8481c06c) at glusterd-handler.c:2606
>>
>>         ret = 0
>>
>>         friend_req = {uuid = "\231\214R¦\177\223I\216\236Õ\214dÎöy¡",
>> hostname = 0x3fff6c028ef0 "", port = 0, vols = {vols_len = 0, vols_val =
>> 0x0}}
>>
>>         remote_hostname = "10.32.0.48", '\000' <repeats 98 times>
>>
>>         __FUNCTION__ = "__glusterd_handle_incoming_unfriend_req"
>>
>> #8  0x00003fff848fb870 in glusterd_big_locked_handler
>> (req=0x3fff8481c06c, actor_fn=@0x3fff84a43e70: 0x3fff84901830
>> <__glusterd_handle_incoming_unfriend_req>) at glusterd-handler.c:83
>>
>>         priv = 0x3fff84748050
>>
>>         ret = -1
>>
>> #9  0x00003fff848fbd08 in glusterd_handle_incoming_unfriend_req
>> (req=<optimized out>) at glusterd-handler.c:2615
>>
>> No locals.
>>
>> #10 0x00003fff8854e87c in rpcsvc_handle_rpc_call (svc=0x10062fd0
>> <_GLOBAL__sub_I__ZN27UehChSwitchFachToDchC_ActorC2EP12RTControllerP10RTActorRef()+1148>,
>> trans=<optimized out>, msg=0x3fff6c000920) at rpcsvc.c:705
>>
>>         actor = 0x3fff84a38860 <gd_svc_peer_actors+192>
>>
>>         actor_fn = @0x3fff84a43ab0: 0x3fff848fbcf0
>> <glusterd_handle_incoming_unfriend_req>
>>
>>         req = 0x3fff8481c06c
>>
>>         ret = -1
>>
>>         port = <optimized out>
>>
>>         unprivileged = <optimized out>
>>
>>         reply = <optimized out>
>>
>>         drc = <optimized out>
>>
>>         __FUNCTION__ = "rpcsvc_handle_rpc_call"
>>
>> #11 0x00003fff8854eb7c in rpcsvc_notify (trans=0x3fff74002210,
>> mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at
>> rpcsvc.c:799
>>
>>         ret = -1
>>
>>         msg = <optimized out>
>>
>>         new_trans = 0x0
>>
>>         svc = <optimized out>
>>
>>         listener = 0x0
>>
>>         __FUNCTION__ = "rpcsvc_notify"
>>
>> #12 0x00003fff885514fc in rpc_transport_notify (this=<optimized out>,
>> event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
>>
>>         ret = -1
>>
>>         __FUNCTION__ = "rpc_transport_notify"
>>
>> #13 0x00003fff847fcd44 in socket_event_poll_in (this=this@entry=0x3fff74002210)
>> at socket.c:2236
>>
>>         ret = <optimized out>
>>
>>         pollin = 0x3fff6c000920
>>
>>         priv = 0x3fff74002d50
>>
>> #14 0x00003fff847ff89c in socket_event_handler (fd=<optimized out>,
>> idx=<optimized out>, data=0x3fff74002210, poll_in=<optimized out>,
>> poll_out=<optimized out>, poll_err=<optimized out>) at socket.c:2349
>>
>>         this = 0x3fff74002210
>>
>>         priv = 0x3fff74002d50
>>
>>         ret = <optimized out>
>>
>>         __FUNCTION__ = "socket_event_handler"
>>
>> #15 0x00003fff88616874 in event_dispatch_epoll_handler
>> (event=0x3fff83d9d6a0, event_pool=0x10045bc0
>> <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>) at
>> event-epoll.c:575
>>
>>         handler = @0x3fff8481a620: 0x3fff847ff6f0 <socket_event_handler>
>>
>>         gen = 1
>>
>>         slot = 0x100803f0 <_GLOBAL__sub_I__ZN24RoamIfFroRrcRoExtAttribDC2Ev()+232>
>>
>>         data = <optimized out>
>>
>>         ret = -1
>>
>>         fd = 8
>>
>>         ev_data = 0x3fff83d9d6a8
>>
>>         idx = 7
>>
>> #16 event_dispatch_epoll_worker (data=0x100bb4a0
>> <main_thread_func__()+1756>) at event-epoll.c:678
>>
>>         event = {events = 1, data = {ptr = 0x700000001, fd = 7, u32 = 7,
>> u64 = 30064771073}}
>>
>>         ret = <optimized out>
>>
>>         ev_data = 0x100bb4a0 <main_thread_func__()+1756>
>>
>>         event_pool = 0x10045bc0
>> <_GLOBAL__sub_I__ZN29DrhIfRhControlPdrProxyC_ActorC2EP12RTControllerP10RTActorRef()+116>
>>
>>         myindex = <optimized out>
>>
>>         timetodie = 0
>>
>>         __FUNCTION__ = "event_dispatch_epoll_worker"
>>
>> #17 0x00003fff884cfb10 in start_thread (arg=0x3fff83d9e160) at
>> pthread_create.c:339
>>
>>         pd = 0x3fff83d9e160
>>
>>         now = <optimized out>
>>
>>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-6868946778599096053,
>> 70366736145408, -6868946778421678961, 0, 0, 70366652919808, 70366661304864,
>> 8388608, 70366736105504, 269202592, 70367897957648, 70366736131032,
>> 70366737568040, 3, 0, 70366736131048, 70367897957296, 70367897957352,
>> 4001536, 70366736106520, 70366661302080, -3187654076, 0 <repeats 42
>> times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data =
>> {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>>
>>         not_first_call = <optimized out>
>>
>>         pagesize_m1 = <optimized out>
>>
>>         sp = <optimized out>
>>
>>         freesize = <optimized out>
>>
>>         __PRETTY_FUNCTION__ = "start_thread"
>>
>> #18 0x00003fff88419c0c in .__clone () at
>> ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
>>
>> No locals.
>>
>> Regards,
>> Abhishek
>>
>> On Wed, Dec 6, 2017 at 3:21 PM, Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> Without the glusterd log file and either the core file or the backtrace,
>>> I can't comment on anything.
>>>
>>> On Wed, Dec 6, 2017 at 3:09 PM, ABHISHEK PALIWAL <
>>> abhishpaliwal at gmail.com> wrote:
>>>
>>>> Any suggestions?
>>>>
>>>> On Dec 6, 2017 11:51, "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> We are seeing a crash in glusterd after it starts. While debugging, I
>>>>> found the errors below in the brick logs:
>>>>>
>>>>> [2017-12-01 14:10:14.684122] E [MSGID: 100018]
>>>>> [glusterfsd.c:1960:glusterfs_pidfile_update] 0-glusterfsd: pidfile
>>>>> /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
>>>>> lock failed [Resource temporarily unavailable]
>>>>> :
>>>>> :
>>>>> :
>>>>> [2017-12-01 14:10:16.862903] E [MSGID: 113001]
>>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=18:
>>>>> key:trusted.bit-rot.version [No space left on device]
>>>>> [2017-12-01 14:10:16.862985] I [MSGID: 115063]
>>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>>> 92: FTRUNCATE 1 (934f08b7-e3b5-4690-84fc-742a4b1fb78b)==> (No space
>>>>> left on device) [No space left on device]
>>>>> [2017-12-01 14:10:16.907037] E [MSGID: 113001]
>>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>>> key:trusted.bit-rot.version [No space left on device]
>>>>> [2017-12-01 14:10:16.907108] I [MSGID: 115063]
>>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>>> 35: FTRUNCATE 0 (109d6537-a1ec-4556-8ce1-04c365c451eb)==> (No space
>>>>> left on device) [No space left on device]
>>>>> [2017-12-01 14:10:16.947541] E [MSGID: 113001]
>>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>>> key:trusted.bit-rot.version [No space left on device]
>>>>> [2017-12-01 14:10:16.947623] I [MSGID: 115063]
>>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>>> 70: FTRUNCATE 0 (8f9c8054-b0d7-4b93-a95b-cd3ab249c56d)==> (No space
>>>>> left on device) [No space left on device]
>>>>> [2017-12-01 14:10:16.968515] E [MSGID: 113001]
>>>>> [posix.c:4616:_posix_remove_xattr] 0-c_glusterfs-posix: removexattr
>>>>> failed on /opt/lvmdir/c2/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/configuration
>>>>> (for trusted.glusterfs.dht) [No space left on device]
>>>>> [2017-12-01 14:10:16.968589] I [MSGID: 115058]
>>>>> [server-rpc-fops.c:740:server_removexattr_cbk] 0-c_glusterfs-server:
>>>>> 90: REMOVEXATTR <gfid:a240d2fd-869c-408d-9b95-62ee1bff074e>
>>>>> (a240d2fd-869c-408d-9b95-62ee1bff074e) of key  ==> (No space left on
>>>>> device) [No space left on device]
>>>>> [2017-12-01 14:10:17.039815] E [MSGID: 113001]
>>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>>> key:trusted.bit-rot.version [No space left on device]
>>>>> [2017-12-01 14:10:17.039900] I [MSGID: 115063]
>>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>>> 152: FTRUNCATE 0 (d67bcfcd-ff19-4b58-9823-46d6cce9ace3)==> (No space
>>>>> left on device) [No space left on device]
>>>>> [2017-12-01 14:10:17.048767] E [MSGID: 113001]
>>>>> [posix-helpers.c:1228:posix_fhandle_pair] 0-c_glusterfs-posix: fd=17:
>>>>> key:trusted.bit-rot.version [No space left on device]
>>>>> [2017-12-01 14:10:17.048874] I [MSGID: 115063]
>>>>> [server-rpc-fops.c:1317:server_ftruncate_cbk] 0-c_glusterfs-server:
>>>>> 163: FTRUNCATE 0 (0e3ee6ad-408b-4fcf-a1a7-4262ec113316)==> (No space
>>>>> left on device) [No space left on device]
>>>>> [2017-12-01 14:10:17.075007] E [MSGID: 113001]
>>>>> [posix.c:4616:_posix_remove_xattr] 0-c_glusterfs-posix: removexattr
>>>>> failed on /opt/lvmdir/c2/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/java
>>>>> (for trusted.glusterfs.dht) [No space left on device]
>>>>>
>>>>> Also, we are running low on disk space.
>>>>>
>>>>> Could anyone please explain what glusterd is doing with the brick that
>>>>> causes it to crash?
>>>>>
>>>>> Please find the brick logs in attachment.
>>>>>
>>>>> Thanks in advance!!!
>>>>> --
>>>>> Regards
>>>>> Abhishek Paliwal
>>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>>
>>
>>
>> --
>> Regards
>> Abhishek Paliwal
>>
>
>
>
> --
> Regards
> Abhishek Paliwal
>