[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

Fri Feb 8 03:05:12 UTC 2019

Thanks Artem. Can you send us the coredump or the bt with symbols from the
crash?
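Most frames in the traces quoted below look like `object(+offset)[address]`. This appears to be glibc's `backtrace_symbols` format. When no exported symbol name is found, the offset is relative to the object's load base, so `addr2line` can resolve it if debug symbols are available.

The sketch below turns such lines into `addr2line` invocations. It assumes debug symbols for the exact glusterfs 5.3 build are installed. The helper names (`parse_frame`, `addr2line_command`) are hypothetical. A core file opened in gdb remains the more complete option.

```python
import re
from typing import List, NamedTuple, Optional

# One frame as printed in the crash traces, e.g.
#   /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
#   /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
FRAME_RE = re.compile(
    r"^(?P<obj>/\S+?)\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    obj: str               # path of the shared object or binary
    symbol: Optional[str]  # exported symbol, or None when printed as "(+0x...)"
    offset: int            # offset from the symbol, or from the load base if no symbol
    address: int           # absolute runtime address (differs on every run)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one trace line; mail quoting ("> ") is stripped first."""
    m = FRAME_RE.match(line.lstrip("> ").strip())
    if not m:
        return None
    return Frame(
        obj=m.group("obj"),
        symbol=m.group("sym") or None,
        offset=int(m.group("off"), 16),
        address=int(m.group("addr"), 16),
    )


def addr2line_command(frame: Frame) -> Optional[List[str]]:
    """Build an addr2line call for frames whose offset is relative to the object base."""
    if frame.symbol is not None:
        return None  # offset is relative to a named symbol; addr2line cannot use it directly
    return ["addr2line", "-f", "-C", "-e", frame.obj, hex(frame.offset)]
```

Run the resulting commands on the affected host, for example with `subprocess.run`. They should map frames such as `replicate.so(+0x5dc9d)` to a function and source line, but only if the debug info matches the installed binaries.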

Regards,
Nithya

On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> wrote:

> Sorry to disappoint, but the crash just happened again, so lru-limit=0
> didn't help.
>
> Here's the snippet of the crash and the subsequent remount by monit.
>
>
> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7f4402b99329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-<SNIP>_data1-replicate-0: selecting local read_child
> <SNIP>_data1-client-3" repeated 39 times between [2019-02-08
> 01:11:18.043286] and [2019-02-08 01:13:07.915604]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
> [2019-02-08 01:13:09.311554]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-08 01:13:09
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
> ---------
> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)
> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 3
> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 4
> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0)
> Final graph:
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
>
> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken
>> effect correctly:
>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>> --volfile-server=localhost --volfile-id=/<SNIP>  /mnt/<SNIP>"
>>
>> Let's see if it stops crashing or not.
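The check of the running client's arguments described above can also be scripted. The sketch below extracts the `--lru-limit` value from a glusterfs command line. `lru_limit_of` is a hypothetical helper, and it only recognises the `--lru-limit=N` and `--lru-limit N` spellings. It only confirms that the option reached the client process; it says nothing about whether the option prevents the crash.

```python
from typing import Optional, Sequence


def lru_limit_of(argv: Sequence[str]) -> Optional[int]:
    """Return the --lru-limit value from a glusterfs argv, or None if it is absent."""
    for i, arg in enumerate(argv):
        if arg.startswith("--lru-limit="):
            return int(arg.split("=", 1)[1])
        if arg == "--lru-limit" and i + 1 < len(argv):
            return int(argv[i + 1])
    return None


# Command line in the same shape as the one quoted above (volume name is a placeholder).
example = ("/usr/sbin/glusterfs --lru-limit=0 --process-name fuse "
           "--volfile-server=localhost --volfile-id=/data1 /mnt/data1").split()
print(lru_limit_of(example))  # 0
```

On the client host, a running process's argv can be read from `/proc/<pid>/cmdline`, where arguments are separated by NUL bytes. Because `0` is falsy in Python, callers should test the result with `is None` rather than relying on truthiness.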
>>
>> Sincerely,
>> Artem
>>
>>
>>
>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com>
>> wrote:
>>
>>> Hi Nithya,
>>>
>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing
>>> crashes, and no further releases have been made yet.
>>>
>>> volume info:
>>> Type: Replicate
>>> Volume ID: ****SNIP****
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 4 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ****SNIP****
>>> Brick2: ****SNIP****
>>> Brick3: ****SNIP****
>>> Brick4: ****SNIP****
>>> Options Reconfigured:
>>> cluster.quorum-count: 1
>>> cluster.quorum-type: fixed
>>> network.ping-timeout: 5
>>> network.remote-dio: enable
>>> performance.rda-cache-limit: 256MB
>>> performance.readdir-ahead: on
>>> performance.parallel-readdir: on
>>> network.inode-lru-limit: 500000
>>> performance.md-cache-timeout: 600
>>> performance.cache-invalidation: on
>>> performance.stat-prefetch: on
>>> features.cache-invalidation-timeout: 600
>>> features.cache-invalidation: on
>>> cluster.readdir-optimize: on
>>> performance.io-thread-count: 32
>>> server.event-threads: 4
>>> client.event-threads: 4
>>> performance.read-ahead: off
>>> cluster.lookup-optimize: on
>>> performance.cache-size: 1GB
>>> cluster.self-heal-daemon: enable
>>> transport.address-family: inet
>>> nfs.disable: on
>>> performance.client-io-threads: on
>>> cluster.granular-entry-heal: enable
>>> cluster.data-self-heal-algorithm: full
>>>
>>> Sincerely,
>>> Artem
>>>
>>>
>>>
>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha at redhat.com>
>>> wrote:
>>>
>>>> Hi Artem,
>>>>
>>>> Do you still see the crashes with 5.3? If yes, please try mounting the
>>>> volume with the mount option lru-limit=0 and see if that helps. We are
>>>> looking into the crashes and will update you when we have a fix.
>>>>
>>>> Also, please provide the gluster volume info for the volume in question.
>>>>
>>>>
>>>> regards,
>>>> Nithya
>>>>
>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com>
>>>> wrote:
>>>>
>>>>> The fuse crash happened two more times, but this time monit helped
>>>>> recover within 1 minute, so it's a great workaround for now.
>>>>>
>>>>> What's odd is that the crashes are only happening on one of 4 servers,
>>>>> and I don't know why.
>>>>>
>>>>> Sincerely,
>>>>> Artem
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <
>>>>> archon810 at gmail.com> wrote:
>>>>>
>>>>>> The fuse crash happened again yesterday, to another volume. Are there
>>>>>> any mount options that could help mitigate this?
>>>>>>
>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task
>>>>>> to watch and restart the mount, which works and recovers the mount point
>>>>>> within a minute. Not ideal, but a temporary workaround.
>>>>>>
>>>>>> By the way, the way to reproduce this "Transport endpoint is not
>>>>>> connected" condition for testing purposes is to kill -9 the right
>>>>>> "glusterfs --process-name fuse" process.
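The right process can be picked by matching the client to its mount point. The sketch below assumes that the fuse client is started as `glusterfs ... --process-name fuse ... <mountpoint>` with the mount point as the last argument, as in the command lines quoted in this thread. `find_fuse_client_pid` and `is_fuse_client` are hypothetical helpers. They only read `/proc` and never kill anything.

```python
import os
from typing import List, Optional


def is_fuse_client(argv: List[str]) -> bool:
    """True for a glusterfs client started with --process-name fuse."""
    if not argv or os.path.basename(argv[0]) != "glusterfs":
        return False
    for i, arg in enumerate(argv):
        if arg == "--process-name=fuse":
            return True
        if arg == "--process-name" and i + 1 < len(argv) and argv[i + 1] == "fuse":
            return True
    return False


def find_fuse_client_pid(mountpoint: str, proc: str = "/proc") -> Optional[int]:
    """Return the PID of the fuse client serving mountpoint, or None."""
    target = os.path.normpath(mountpoint)
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if is_fuse_client(argv) and os.path.normpath(argv[-1]) == target:
            return int(entry)
    return None
```

On a test machine, calling `os.kill(pid, signal.SIGKILL)` with the returned PID is the Python equivalent of the `kill -9` described above.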
>>>>>>
>>>>>>
>>>>>> monit check:
>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>>>>>>   start program  = "/bin/mount  /mnt/glusterfs_data1"
>>>>>>   stop program  = "/bin/umount /mnt/glusterfs_data1"
>>>>>>   if space usage > 90% for 5 times within 15 cycles
>>>>>>     then alert else if succeeded for 10 cycles then alert
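For hosts without monit, the same recovery idea can be sketched in Python. A dead fuse client typically makes `stat()` on the mount point fail with `ENOTCONN` ("Transport endpoint is not connected"). The sketch assumes `/mnt/glusterfs_data1` has an `/etc/fstab` entry, as the `/bin/mount /mnt/glusterfs_data1` start program implies. The function names are hypothetical, and the sketch includes no scheduling, retry, or alerting logic.

```python
import errno
import os
import subprocess


def mount_is_disconnected(path: str) -> bool:
    """True if stat() on path fails with ENOTCONN (dead fuse client)."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return True
        raise  # other errors (e.g. a missing path) are not treated as a dead mount
    return False


def remount(path: str) -> None:
    """Mirror the monit stop/start programs: umount, then mount via /etc/fstab."""
    # umount may fail if the mount is already gone; that failure is ignored here.
    subprocess.run(["/bin/umount", path], check=False)
    subprocess.run(["/bin/mount", path], check=True)


def check_once(path: str = "/mnt/glusterfs_data1") -> bool:
    """Run one check; return True if a remount was attempted."""
    if mount_is_disconnected(path):
        remount(path)
        return True
    return False
```

Calling `check_once()` periodically, for example from a systemd timer or cron, would approximate the monit cycle.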
>>>>>>
>>>>>>
>>>>>> stack trace:
>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>> [0x7fa0249e4329]
>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>> [0x7fa0249e4329]
>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>> The message "E [MSGID: 101191]
>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
>>>>>> [2019-02-01 23:21:56.164427]
>>>>>> The message "I [MSGID: 108031]
>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0:
>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between
>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
>>>>>> pending frames:
>>>>>> frame : type(1) op(LOOKUP)
>>>>>> frame : type(0) op(0)
>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>> signal received: 6
>>>>>> time of crash:
>>>>>> 2019-02-01 23:22:03
>>>>>> configuration details:
>>>>>> argp 1
>>>>>> backtrace 1
>>>>>> dlfcn 1
>>>>>> libpthread 1
>>>>>> llistxattr 1
>>>>>> setfsid 1
>>>>>> spinlock 1
>>>>>> epoll.h 1
>>>>>> xattr.h 1
>>>>>> st_atim.tv_nsec 1
>>>>>> package-string: glusterfs 5.3
>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <
>>>>>> archon810 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The first (and so far only) crash happened at 2am the next day after
>>>>>>> we upgraded, on only one of four servers and only to one of two mounts.
>>>>>>>
>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy
>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads or
>>>>>>> downloads from that server until I woke up and fixed the mount.
>>>>>>>
>>>>>>> I wish I could be more helpful but all I have is that stack trace.
>>>>>>>
>>>>>>> I'm glad it's been marked as a blocker; hopefully it will be resolved soon.
>>>>>>>
>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
>>>>>>> atumball at redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi Artem,
>>>>>>>>
>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as
>>>>>>>> a clone of other bugs where recent discussions happened), and marked it as
>>>>>>>> a blocker for the glusterfs-5.4 release.
>>>>>>>>
>>>>>>>> We already have fixes for the log flooding:
>>>>>>>> https://review.gluster.org/22128, and are in the process of
>>>>>>>> identifying and fixing the issue behind the crash.
>>>>>>>>
>>>>>>>> Can you please tell us whether the crashes started as soon as you
>>>>>>>> upgraded, or whether you observed any particular pattern before the crash?
>>>>>>>>
>>>>>>>> -Amar
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <
>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Within 24 hours of updating from a rock-solid 4.1 to 5.3, I
>>>>>>>>> already got a crash, which others have mentioned in
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567, and had to
>>>>>>>>> unmount, kill gluster, and remount:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between
>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
>>>>>>>>> [2019-01-31 09:38:04.696993]
>>>>>>>>> pending frames:
>>>>>>>>> frame : type(1) op(READ)
>>>>>>>>> frame : type(1) op(OPEN)
>>>>>>>>> frame : type(0) op(0)
>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>> signal received: 6
>>>>>>>>> time of crash:
>>>>>>>>> 2019-01-31 09:38:04
>>>>>>>>> configuration details:
>>>>>>>>> argp 1
>>>>>>>>> backtrace 1
>>>>>>>>> dlfcn 1
>>>>>>>>> libpthread 1
>>>>>>>>> llistxattr 1
>>>>>>>>> setfsid 1
>>>>>>>>> spinlock 1
>>>>>>>>> epoll.h 1
>>>>>>>>> xattr.h 1
>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>>>>>>>>> ---------
>>>>>>>>>
>>>>>>>>> Do the pending patches fix the crash or only the repeated
>>>>>>>>> warnings? I'm running glusterfs on openSUSE Leap 15.0, installed via
>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>>>>>>>>> and I'm not sure how to make it dump core.
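Whether a crashing process leaves a core depends on two things: its `RLIMIT_CORE` soft limit and the kernel's `core_pattern`. The sketch below checks both on the affected host; the helper names are hypothetical.

If `core_pattern` starts with `|`, cores are piped to a helper program instead of being written to a file. On systemd-based systems that helper is often `systemd-coredump`, in which case `coredumpctl` can list the dumps. Otherwise cores are written to the path the pattern describes.

```python
from typing import Optional


def core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str:
    """Return the kernel's core_pattern setting."""
    with open(path) as fh:
        return fh.read().strip()


def core_limit_of_pid(pid: int) -> Optional[str]:
    """Soft 'Max core file size' of a running process, e.g. "0" or "unlimited"."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max core file size"):
                # Columns: name (4 words), soft limit, hard limit, units.
                return line.split()[4]
    return None


def describe(pid: int) -> str:
    """One-line summary of the core size limit of pid and where cores would go."""
    limit = core_limit_of_pid(pid)
    pattern = core_pattern()
    if pattern.startswith("|"):
        where = "piped to " + (pattern[1:].split() or ["?"])[0]
    else:
        where = "written as " + pattern
    return f"pid {pid}: core size limit {limit}; cores are {where}"
```

Point `describe` at the PID of the `glusterfs --process-name fuse` client to see the limit that process actually runs with. A soft limit of `0` typically suppresses core files.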
>>>>>>>>>
>>>>>>>>> If it's not fixed by the patches above, has anyone already opened
>>>>>>>>> a ticket for the crashes that I can join and monitor? This is going to
>>>>>>>>> create a massive problem for us since production systems are crashing.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Artem
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <
>>>>>>>>> rgowdapp at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <
>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these
>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been
>>>>>>>>>>> commenting about this issue here
>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>> [0x7fd966fcd329]
>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and
>>>>>>>>>>>> [2019-01-30 20:38:20.015593]
>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between
>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between
>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and
>>>>>>>>>>>> [2019-01-30 20:38:20.546355]
>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>>>>> selecting local read_child SITE_data1-client-0
>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>>>>>>> selecting local read_child SITE_data3-client-0
>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>> handler
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring
>>>>>>>>>>> some additional eyeballs and get them both fixed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Sincerely,
>>>>>>>>>>> Artem
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <
>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I found a similar issue here:
>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a
>>>>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the
>>>>>>>>>>>> spam.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the message that repeats over and over:
>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>> [0x7fd966fcd329]
>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
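To quantify the log flooding, glusterfs's summary lines ("The message ... repeated N times between ...") can be added to the individually logged messages. The rough sketch below groups messages by their `[file.c:line:function]` tag. It assumes one message per line, as in the original log files rather than the wrapped copies in this thread, so the totals are approximate.

```python
import re
from collections import Counter
from typing import Iterable

# Source tag present in every glusterfs log message, e.g. "[dict.c:761:dict_ref]".
LOC_RE = re.compile(r"\[([\w.-]+\.c):\d+:(\w+)\]")
# Summary written when identical messages are suppressed.
REPEAT_RE = re.compile(r'" repeated (\d+) times between')


def count_by_location(lines: Iterable[str]) -> Counter:
    """Approximate message counts keyed by "file.c:function"."""
    counts: Counter = Counter()
    for line in lines:
        loc = LOC_RE.search(line)
        if not loc:
            continue
        key = f"{loc.group(1)}:{loc.group(2)}"
        rep = REPEAT_RE.search(line)
        counts[key] += int(rep.group(1)) if rep else 1
    return counts
```

Assuming the default `/var/log/glusterfs` log directory, `count_by_location(fh).most_common(5)` on an open `mnt-SITE_data1.log` would show which messages dominate.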
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this
>>>>>>>>>> message is logged and send a fix?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Is there any fix for this issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Artem
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Amar Tumballi (amarts)
>>>>>>>>
>>>>
>>>>