<div dir="ltr"><div dir="ltr"><div>One possible reason could be <a href="https://review.gluster.org/r/18b6d7ce7d490e807815270918a17a4b392a829d">https://review.gluster.org/r/18b6d7ce7d490e807815270918a17a4b392a829d</a> as that changed some code in epoll handler. Though the change is largely on server side, the epoll and socket changes are relevant for client too. I'll try to see whether there is anything wrong with that.<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 8, 2019 at 8:36 AM Nithya Balachandran <<a href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks Artem. Can you send us the coredump or the bt with symbols from the crash?<div><br></div><div>Regards,</div><div>Nithya</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.<div><br></div><div>Here's the snippet of the crash and the subsequent remount by monit.</div><div><br></div><div><br></div><div><div>[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [In</div><div>valid argument]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" 
repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]</div><div>pending frames:</div><div>frame : type(1) op(LOOKUP)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash: </div><div>2019-02-08 01:13:09</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7f440a887160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]</div><d
iv>/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]</div><div>---------</div><div>[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)</div><div>[2019-02-08 01:13:35.637830] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1</div><div>[2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2</div><div>[2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3</div><div>[2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4</div><div>[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0)</div><div>Final graph:</div></div><div><br clear="all"><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail_signature"><div dir="ltr"><div><div 
dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly:</div><div dir="ltr">"/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>"<br clear="all"><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><div>Let's see if it stops crashing or not.</div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a 
href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Nithya,</div><div dir="ltr"><br></div><div dir="ltr">Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing crashes, and no further releases have been made yet.<div><br></div><div>volume info:</div><div><div>Type: Replicate</div><div>Volume ID: ****SNIP****</div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 1 x 4 = 4</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: ****SNIP****</div><div>Brick2: ****SNIP****</div><div>Brick3: ****SNIP****</div><div>Brick4: ****SNIP****</div><div>Options Reconfigured:</div><div>cluster.quorum-count: 1</div><div>cluster.quorum-type: fixed</div><div>network.ping-timeout: 5</div><div>network.remote-dio: enable</div><div>performance.rda-cache-limit: 256MB</div><div>performance.readdir-ahead: on</div><div>performance.parallel-readdir: on</div><div>network.inode-lru-limit: 500000</div><div>performance.md-cache-timeout: 600</div><div>performance.cache-invalidation: on</div><div>performance.stat-prefetch: on</div><div>features.cache-invalidation-timeout: 600</div><div>features.cache-invalidation: on</div><div>cluster.readdir-optimize: on</div><div>performance.io-thread-count: 32</div><div>server.event-threads: 4</div><div>client.event-threads: 4</div><div>performance.read-ahead: off</div><div>cluster.lookup-optimize: on</div><div>performance.cache-size: 1GB</div><div>cluster.self-heal-daemon: 
enable</div><div>transport.address-family: inet</div><div>nfs.disable: on</div><div>performance.client-io-threads: on</div><div>cluster.granular-entry-heal: enable</div><div>cluster.data-self-heal-algorithm: full</div><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <<a href="mailto:nbalacha@redhat.com" target="_blank">nbalacha@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Artem,<div><br></div><div>Do you still see the crashes with 5.3? If yes, please try mount the volume using the mount option lru-limit=0 and see if that helps. 
We are looking into the crashes and will update when we have a fix.</div><div><br></div><div>Also, please provide the gluster volume info for the volume in question.</div><div><br></div><div><br></div><div>regards,</div><div>Nithya</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">The fuse crash happened two more times, but this time monit helped recover within 1 minute, so it's a great workaround for now.<div><br></div><div>What's odd is that the crashes are only happening on one of 4 servers, and I don't know why.<br clear="all"><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote 
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?<div><br></div><div>In the meantime, I set up a monit (<a href="https://mmonit.com/monit/" target="_blank">https://mmonit.com/monit/</a>) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.</div><div><br></div><div>By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.</div><div><br></div><div><br></div><div>monit check:</div><div><div>check filesystem glusterfs_data1 with path /mnt/glusterfs_data1</div><div> start program = "/bin/mount
/mnt/glusterfs_data1"</div><div> stop program = "/bin/umount /mnt/glusterfs_data1"</div><div> if space usage > 90% for 5 times within 15 cycles</div><div> then alert else if succeeded for 10 cycles then alert</div></div><div><br></div><div><br></div><div>stack trace:</div><div><div>[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]</div><div>[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]</div><div>pending frames:</div><div>frame : type(1) op(LOOKUP)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash:</div><div>2019-02-01 23:22:03</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: 
glusterfs 5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]</div><div>/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]</div><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a 
href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">Hi,<div dir="auto"><br></div><div dir="auto">The first (and so far only) crash happened at 2am the next day after we upgraded, on only one of four servers and only to one of two mounts.</div><div dir="auto"><br></div><div dir="auto">I have no idea what caused it, but yeah, we do have a pretty busy site (<a href="http://apkmirror.com" target="_blank">apkmirror.com</a>), and it caused a disruption for any uploads or downloads from that server until I woke up and fixed the mount.</div><div dir="auto"><br></div><div dir="auto">I wish I could be more helpful but all I have is that stack trace. </div><div dir="auto"><br></div><div dir="auto">I'm glad it's a blocker and will hopefully be resolved soon. 
</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <<a href="mailto:atumball@redhat.com" target="_blank">atumball@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Artem,<div><br></div><div>Opened <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1671603" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1671603</a> (i.e., as a clone of other bugs where recent discussions happened), and marked it as a blocker for the glusterfs-5.4 release.</div><div><br></div><div>We already have fixes for log flooding - <a href="https://review.gluster.org/22128" rel="noreferrer" target="_blank">https://review.gluster.org/22128</a>, and are in the process of identifying and fixing the issue seen with the crash.</div><div><br></div><div>Can you please tell us whether the crashes happened right after the upgrade, 
or was there any particular pattern you observed before the crash.</div><div><br></div><div>-Amar</div><div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Within 24 hours after updating from rock solid 4.1 to 5.3, I already got a crash which others have mentioned in <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1313567" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1313567</a> and had to unmount, kill gluster, and remount:</div><div><br></div><div><br></div><div>[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] 
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]</div><div>pending frames:</div><div>frame : type(1) op(READ)</div><div>frame : type(1) op(OPEN)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" rel="noreferrer" target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash:</div><div>2019-01-31 09:38:04</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 
5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7fccd622d160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]</div><div>/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]</div><div>---------</div><div><br></div><div>Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via <a href="http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/" rel="noreferrer" target="_blank">http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/</a>, not too sure how to make it core dump.</div><div><br></div><div>If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? 
This is going to create a massive problem for us since production systems are crashing.</div><div><br></div><div>Thanks.</div><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <<a href="mailto:rgowdapp@redhat.com" rel="noreferrer" target="_blank">rgowdapp@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid 
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Also, not sure if related or not, but I got a ton of these "Failed to dispatch handler" in my logs as well. Many people have been commenting about this issue here <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1651246" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1651246</a>.</div></div></div></div></blockquote><div><br></div><div><a href="https://review.gluster.org/#/c/glusterfs/+/22046/" rel="noreferrer" target="_blank">https://review.gluster.org/#/c/glusterfs/+/22046/</a> addresses this.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">==> mnt-SITE_data1.log <==<br>[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]<br>==> mnt-SITE_data3.log <==<br>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]<br>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]<br>==> mnt-SITE_data1.log <==<br>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 
times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]<br>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]<br>[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0<br>==> mnt-SITE_data3.log <==<br>[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0<br>==> mnt-SITE_data1.log <==<br>[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler </blockquote><div dir="ltr"><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981gmail-m_-8072525330423685591gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><div dir="ltr">I'm hoping raising the issue here on the mailing list may bring some additional eyeballs and get them both fixed.<br></div><div><br></div><div>Thanks.</div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" 
target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div><div dir="ltr"><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I found a similar issue here: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1313567" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1313567</a>. There's a comment from 3 days ago from someone else with 5.3 who started seeing the spam.</div><div><br></div><div>Here's the command that repeats over and over:</div><div>[2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]</div></div></div></div></blockquote></div></blockquote><div><br></div><div><a class="gmail_plusreply" id="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981plusReplyChip-0" href="mailto:mchangir@redhat.com" rel="noreferrer" target="_blank">+Milind Changire</a> Can you check why this message is logged and send a fix?</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid 
rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><br></div><div>Is there any fix for this issue?</div><div><br></div><div>Thanks.</div><div><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981gmail-m_-8072525330423685591gmail-m_3626885015457760579gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</blockquote></div>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" rel="noreferrer" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a></blockquote></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_3843923940090394456gmail-m_-3819221624247292369gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Amar Tumballi (amarts)<br></div></div></div></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>