[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Artem Russakovskii
archon810 at gmail.com
Wed Feb 20 05:36:34 UTC 2019
Hi Nithya,
Unfortunately, I just had another crash on the same server, with
performance.write-behind still set to off. I'll email the core file
privately.
[2019-02-19 19:50:39.511743] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7f9598991329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7f9598ba2af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7f95a137d218] ) 2-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
handler" repeated 95 times between [2019-02-19 19:49:07.655620] and
[2019-02-19 19:50:39.499284]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
2-<SNIP>_data3-replicate-0: selecting local read_child
<SNIP>_data3-client-3" repeated 56 times between [2019-02-19
19:49:07.602370] and [2019-02-19 19:50:42.912766]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-19 19:50:43
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f95a138864c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f95a1392cb6]
/lib64/libc.so.6(+0x36160)[0x7f95a054f160]
/lib64/libc.so.6(gsignal+0x110)[0x7f95a054f0e0]
/lib64/libc.so.6(abort+0x151)[0x7f95a05506c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f95a05476fa]
/lib64/libc.so.6(+0x2e772)[0x7f95a0547772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f95a08dd0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f95994f0c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f9599503ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f9599788f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f95a1153820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f95a1153b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f95a1150063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f959aea00b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f95a13e64c3]
/lib64/libpthread.so.0(+0x7559)[0x7f95a08da559]
/lib64/libc.so.6(clone+0x3f)[0x7f95a061181f]
---------
[2019-02-19 19:51:34.425106] I [MSGID: 100030] [glusterfsd.c:2715:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
(args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
--volfile-server=localhost --volfile-id=/<SNIP>_data3 /mnt/<SNIP>_data3)
[2019-02-19 19:51:34.435206] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2019-02-19 19:51:34.450272] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2019-02-19 19:51:34.450394] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 4
[2019-02-19 19:51:34.450488] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 3
Sincerely,
Artem
--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>
On Tue, Feb 12, 2019 at 12:38 AM Nithya Balachandran <nbalacha at redhat.com>
wrote:
>
> Not yet, but we are discussing an interim release. It is going to take a
> couple of days to review the fixes, so not before then. We will update the
> list with dates once we decide.
>
>
> On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> Awesome. But is there a release schedule and an ETA for when these will
>> be out in the repos?
>>
>> On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii <archon810 at gmail.com>
>>> wrote:
>>>
>>>> Great job identifying the issue!
>>>>
>>>> Any ETA on the next release with the logging and crash fixes in it?
>>>>
>>>
>>> I've marked write-behind corruption as a blocker for release-6. Logging
>>> fixes are already in the codebase.
>>>
>>>
>>>> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>
>>>>>> Although I don't have these error messages, I'm having fuse crashes
>>>>>> as frequently as you. I have disabled write-behind and the mount has been
>>>>>> running over the weekend with heavy usage and no issues.
>>>>>>
>>>>>
>>>>> The issue you are facing will likely be fixed by patch [1]. Xavi,
>>>>> Nithya, and I were able to identify the corruption in write-behind.
>>>>>
>>>>> [1] https://review.gluster.org/22189
>>>>>
>>>>>
>>>>>> I can provide coredumps before disabling write-behind if needed. I
>>>>>> opened a BZ report
>>>>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the
>>>>>> crashes that I was having.
>>>>>>
>>>>>> *João Baúto*
>>>>>> ---------------
>>>>>>
>>>>>> *Scientific Computing and Software Platform*
>>>>>> Champalimaud Research
>>>>>> Champalimaud Center for the Unknown
>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>> 1400-038 Lisbon, Portugal
>>>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>
>>>>>>
>>>>>> Artem Russakovskii <archon810 at gmail.com> escreveu no dia sábado,
>>>>>> 9/02/2019 à(s) 22:18:
>>>>>>
>>>>>>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting
>>>>>>> for the next crash to see if it dumps a core for you guys to remotely debug.
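>>>>>>>
>>>>>>> For reference, what I did was roughly the following (assuming the
>>>>>>> standard Linux core-dump knobs apply; for the fuse daemon itself the
>>>>>>> limit may need to go into /etc/security/limits.conf or the systemd
>>>>>>> unit instead):
>>>>>>>
>>>>>>> # allow unlimited core file sizes
>>>>>>> ulimit -c unlimited
>>>>>>> # write cores to a known location with a recognizable name
>>>>>>> sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p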
>>>>>>>
>>>>>>> Then I can consider setting performance.write-behind to off and
>>>>>>> monitoring for further crashes.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>> --
>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <
>>>>>>> rgowdapp at redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <
>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Nithya,
>>>>>>>>>
>>>>>>>>> I can try to disable write-behind as long as it doesn't heavily
>>>>>>>>> impact performance for us. Which option is it exactly? I don't see it set
>>>>>>>>> in my list of changed volume variables that I sent you guys earlier.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The option is performance.write-behind
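>>>>>>>>
>>>>>>>> To disable it, something like:
>>>>>>>>
>>>>>>>> # turn the write-behind translator off for the volume
>>>>>>>> gluster volume set <VOLNAME> performance.write-behind off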
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Artem
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
>>>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Artem,
>>>>>>>>>>
>>>>>>>>>> We have found the cause of one crash. Unfortunately, we have not
>>>>>>>>>> managed to reproduce the one you reported, so we don't know if it is the
>>>>>>>>>> same cause.
>>>>>>>>>>
>>>>>>>>>> Can you disable write-behind on the volume and let us know if it
>>>>>>>>>> solves the problem? If yes, it is likely to be the same issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> regards,
>>>>>>>>>> Nithya
>>>>>>>>>>
>>>>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <
>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disappoint, but the crash just happened again, so
>>>>>>>>>>> lru-limit=0 didn't help.
>>>>>>>>>>>
>>>>>>>>>>> Here's the snippet of the crash and the subsequent remount by
>>>>>>>>>>> monit.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>> [0x7f4402b99329]
>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0:
>>>>>>>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between
>>>>>>>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>>>>>>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>>>>>>>>>>> [2019-02-08 01:13:09.311554]
>>>>>>>>>>> pending frames:
>>>>>>>>>>> frame : type(1) op(LOOKUP)
>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>> signal received: 6
>>>>>>>>>>> time of crash:
>>>>>>>>>>> 2019-02-08 01:13:09
>>>>>>>>>>> configuration details:
>>>>>>>>>>> argp 1
>>>>>>>>>>> backtrace 1
>>>>>>>>>>> dlfcn 1
>>>>>>>>>>> libpthread 1
>>>>>>>>>>> llistxattr 1
>>>>>>>>>>> setfsid 1
>>>>>>>>>>> spinlock 1
>>>>>>>>>>> epoll.h 1
>>>>>>>>>>> xattr.h 1
>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>>>>>>>>>>>
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>>>>>>>>>>> ---------
>>>>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030]
>>>>>>>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running
>>>>>>>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0
>>>>>>>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1
>>>>>>>>>>> /mnt/<SNIP>_data1)
>>>>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
>>>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>>>>>>>>> with index 1
>>>>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
>>>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>>>>>>>>> with index 2
>>>>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
>>>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>>>>>>>>> with index 3
>>>>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
>>>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>>>>>>>>>> with index 4
>>>>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020]
>>>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are
>>>>>>>>>>> ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020]
>>>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are
>>>>>>>>>>> ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020]
>>>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are
>>>>>>>>>>> ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020]
>>>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are
>>>>>>>>>>> ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655527] I
>>>>>>>>>>> [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port
>>>>>>>>>>> to 49153 (from 0)
>>>>>>>>>>> Final graph:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sincerely,
>>>>>>>>>>> Artem
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <
>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see
>>>>>>>>>>>> it's taken effect correctly:
>>>>>>>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>>>>>>>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>"
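>>>>>>>>>>>>
>>>>>>>>>>>> For anyone following along, in /etc/fstab this is just an extra
>>>>>>>>>>>> mount option, roughly:
>>>>>>>>>>>>
>>>>>>>>>>>> # glusterfs fuse mount with the inode LRU limit disabled
>>>>>>>>>>>> localhost:/<SNIP>  /mnt/<SNIP>  glusterfs  defaults,_netdev,lru-limit=0  0 0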
>>>>>>>>>>>>
>>>>>>>>>>>> Let's see if it stops crashing or not.
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Artem
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <
>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Nithya,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started
>>>>>>>>>>>>> seeing crashes, and no further releases have been made yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> volume info:
>>>>>>>>>>>>> Type: Replicate
>>>>>>>>>>>>> Volume ID: ****SNIP****
>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>> Snapshot Count: 0
>>>>>>>>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>> Brick1: ****SNIP****
>>>>>>>>>>>>> Brick2: ****SNIP****
>>>>>>>>>>>>> Brick3: ****SNIP****
>>>>>>>>>>>>> Brick4: ****SNIP****
>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>> cluster.quorum-count: 1
>>>>>>>>>>>>> cluster.quorum-type: fixed
>>>>>>>>>>>>> network.ping-timeout: 5
>>>>>>>>>>>>> network.remote-dio: enable
>>>>>>>>>>>>> performance.rda-cache-limit: 256MB
>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>> performance.parallel-readdir: on
>>>>>>>>>>>>> network.inode-lru-limit: 500000
>>>>>>>>>>>>> performance.md-cache-timeout: 600
>>>>>>>>>>>>> performance.cache-invalidation: on
>>>>>>>>>>>>> performance.stat-prefetch: on
>>>>>>>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>>>>>>>> features.cache-invalidation: on
>>>>>>>>>>>>> cluster.readdir-optimize: on
>>>>>>>>>>>>> performance.io-thread-count: 32
>>>>>>>>>>>>> server.event-threads: 4
>>>>>>>>>>>>> client.event-threads: 4
>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>> cluster.lookup-optimize: on
>>>>>>>>>>>>> performance.cache-size: 1GB
>>>>>>>>>>>>> cluster.self-heal-daemon: enable
>>>>>>>>>>>>> transport.address-family: inet
>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>> performance.client-io-threads: on
>>>>>>>>>>>>> cluster.granular-entry-heal: enable
>>>>>>>>>>>>> cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <
>>>>>>>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Artem,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try
>>>>>>>>>>>>>> mounting the volume using the mount option lru-limit=0 and see if that
>>>>>>>>>>>>>> helps. We are looking into the crashes and will update when we have a fix.
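>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Something like the following should do it (assuming a localhost
>>>>>>>>>>>>>> volfile server, as in your setup):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # remount the volume with the inode LRU limit disabled
>>>>>>>>>>>>>> mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>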
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, please provide the gluster volume info for the volume
>>>>>>>>>>>>>> in question.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>> Nithya
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <
>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The fuse crash happened two more times, but this time monit
>>>>>>>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What's odd is that the crashes are only happening on one of
>>>>>>>>>>>>>>> 4 servers, and I don't know why.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <
>>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume.
>>>>>>>>>>>>>>>> Are there any mount options that could help mitigate this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the meantime, I set up a monit (
>>>>>>>>>>>>>>>> https://mmonit.com/monit/) task to watch and restart the
>>>>>>>>>>>>>>>> mount, which works and recovers the mount point within a minute. Not ideal,
>>>>>>>>>>>>>>>> but a temporary workaround.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By the way, one way to reproduce this "Transport endpoint
>>>>>>>>>>>>>>>> is not connected" condition for testing purposes is to kill -9 the right
>>>>>>>>>>>>>>>> "glusterfs --process-name fuse" process.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> monit check:
>>>>>>>>>>>>>>>> check filesystem glusterfs_data1 with path
>>>>>>>>>>>>>>>> /mnt/glusterfs_data1
>>>>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1"
>>>>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1"
>>>>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles
>>>>>>>>>>>>>>>> then alert else if succeeded for 10 cycles then alert
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> stack trace:
>>>>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>> [0x7fa0249e4329]
>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>> [0x7fa0249e4329]
>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>>>>>>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
>>>>>>>>>>>>>>>> [2019-02-01 23:21:56.164427]
>>>>>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0:
>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between
>>>>>>>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
>>>>>>>>>>>>>>>> pending frames:
>>>>>>>>>>>>>>>> frame : type(1) op(LOOKUP)
>>>>>>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>>>>>>> signal received: 6
>>>>>>>>>>>>>>>> time of crash:
>>>>>>>>>>>>>>>> 2019-02-01 23:22:03
>>>>>>>>>>>>>>>> configuration details:
>>>>>>>>>>>>>>>> argp 1
>>>>>>>>>>>>>>>> backtrace 1
>>>>>>>>>>>>>>>> dlfcn 1
>>>>>>>>>>>>>>>> libpthread 1
>>>>>>>>>>>>>>>> llistxattr 1
>>>>>>>>>>>>>>>> setfsid 1
>>>>>>>>>>>>>>>> spinlock 1
>>>>>>>>>>>>>>>> epoll.h 1
>>>>>>>>>>>>>>>> xattr.h 1
>>>>>>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <
>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next
>>>>>>>>>>>>>>>>> day after we upgraded, on only one of four servers and only to one of two
>>>>>>>>>>>>>>>>> mounts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a
>>>>>>>>>>>>>>>>> pretty busy site (apkmirror.com), and it caused a
>>>>>>>>>>>>>>>>> disruption for any uploads or downloads from that server until I woke up
>>>>>>>>>>>>>>>>> and fixed the mount.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I wish I could be more helpful, but all I have is that
>>>>>>>>>>>>>>>>> stack trace.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved
>>>>>>>>>>>>>>>>> soon.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
>>>>>>>>>>>>>>>>> atumball at redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Artem,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Opened
>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e.,
>>>>>>>>>>>>>>>>>> as a clone of other bugs where recent discussions happened) and marked it
>>>>>>>>>>>>>>>>>> as a blocker for the glusterfs-5.4 release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We already have fixes for log flooding -
>>>>>>>>>>>>>>>>>> https://review.gluster.org/22128, and are in the process of
>>>>>>>>>>>>>>>>>> identifying and fixing the issue seen with the crash.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you please tell us if the crashes happened as soon as you
>>>>>>>>>>>>>>>>>> upgraded? Or was there any particular pattern you observed before the crash?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Amar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <
>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Within 24 hours after updating from rock-solid 4.1 to
>>>>>>>>>>>>>>>>>>> 5.3, I already got a crash which others have mentioned in
>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and
>>>>>>>>>>>>>>>>>>> had to unmount, kill gluster, and remount:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>> [0x7fcccafcd329]
>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.696993]
>>>>>>>>>>>>>>>>>>> pending frames:
>>>>>>>>>>>>>>>>>>> frame : type(1) op(READ)
>>>>>>>>>>>>>>>>>>> frame : type(1) op(OPEN)
>>>>>>>>>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>>>>>>>>>> signal received: 6
>>>>>>>>>>>>>>>>>>> time of crash:
>>>>>>>>>>>>>>>>>>> 2019-01-31 09:38:04
>>>>>>>>>>>>>>>>>>> configuration details:
>>>>>>>>>>>>>>>>>>> argp 1
>>>>>>>>>>>>>>>>>>> backtrace 1
>>>>>>>>>>>>>>>>>>> dlfcn 1
>>>>>>>>>>>>>>>>>>> libpthread 1
>>>>>>>>>>>>>>>>>>> llistxattr 1
>>>>>>>>>>>>>>>>>>> setfsid 1
>>>>>>>>>>>>>>>>>>> spinlock 1
>>>>>>>>>>>>>>>>>>> epoll.h 1
>>>>>>>>>>>>>>>>>>> xattr.h 1
>>>>>>>>>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>>>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the
>>>>>>>>>>>>>>>>>>> repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via
>>>>>>>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>>>>>>>>>>>>>>>>>>> and I'm not too sure how to make it core dump.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone
>>>>>>>>>>>>>>>>>>> already opened a ticket for the crashes that I can join and monitor? This
>>>>>>>>>>>>>>>>>>> is going to create a massive problem for us since production systems are
>>>>>>>>>>>>>>>>>>> crashing.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK
>>>>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <
>>>>>>>>>>>>>>>>>>> rgowdapp at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <
>>>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, not sure if it's related, but I got a ton of
>>>>>>>>>>>>>>>>>>>>> these "Failed to dispatch handler" errors in my logs as well. Many people
>>>>>>>>>>>>>>>>>>>>> have been commenting on this issue here:
>>>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/
>>>>>>>>>>>>>>>>>>>> addresses this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329]
>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593]
>>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031]
>>>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191]
>>>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355]
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
>>>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
>>>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
>>>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list
>>>>>>>>>>>>>>>>>>>>> may bring some additional eyeballs and get them both fixed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>
>>>>>>>>>>>>>>>>>>>>> , APK Mirror <http://www.apkmirror.com/>, Illogical
>>>>>>>>>>>>>>>>>>>>> Robot LLC
>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <
>>>>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I found a similar issue here:
>>>>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567.
>>>>>>>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started
>>>>>>>>>>>>>>>>>>>>>> seeing the spam.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Here's the message that repeats over and over:
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
>>>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329]
>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check
>>>>>>>>>>>>>>>>>>>> why this message is logged and send a fix?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is there any fix for this issue?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Founder, Android Police
>>>>>>>>>>>>>>>>>>>>>> <http://www.androidpolice.com>, APK Mirror
>>>>>>>>>>>>>>>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> |
>>>>>>>>>>>>>>>>>>>>>> @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Amar Tumballi (amarts)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>> Gluster-users mailing list
>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>