<div dir="ltr">Hi Artem,<div><br></div><div>We have found the cause of one crash. Unfortunately we have not managed to reproduce the one you reported so we don't know if it is the same cause.</div><div><br></div><div>Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.</div><div><br></div><div><br></div><div>regards,</div><div>Nithya</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <<a href="mailto:archon810@gmail.com">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.<div><br></div><div>Here's the snippet of the crash and the subsequent remount by monit.</div><div><br></div><div><br></div><div><div>[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [In</div><div>valid argument]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]</div><div>pending frames:</div><div>frame : type(1) op(LOOKUP)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" 
target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash: </div><div>2019-02-08 01:13:09</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7f440a887160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]</div><div>/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]</div><div>---------</div><div>[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)</div><div>[2019-02-08 01:13:35.637830] I [MSGID: 101190] 
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1</div><div>[2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2</div><div>[2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3</div><div>[2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4</div><div>[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport</div><div>[2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0)</div><div>Final graph:</div></div><div><br clear="all"><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" 
target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly:</div><div dir="ltr">"/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>"<br clear="all"><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><div>Let's see if it stops crashing or not.</div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid 
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Nithya,</div><div dir="ltr"><br></div><div dir="ltr">Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing crashes, and no further releases have been made yet.<div><br></div><div>volume info:</div><div><div>Type: Replicate</div><div>Volume ID: ****SNIP****</div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 1 x 4 = 4</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: ****SNIP****</div><div>Brick2: ****SNIP****</div><div>Brick3: ****SNIP****</div><div>Brick4: ****SNIP****</div><div>Options Reconfigured:</div><div>cluster.quorum-count: 1</div><div>cluster.quorum-type: fixed</div><div>network.ping-timeout: 5</div><div>network.remote-dio: enable</div><div>performance.rda-cache-limit: 256MB</div><div>performance.readdir-ahead: on</div><div>performance.parallel-readdir: on</div><div>network.inode-lru-limit: 500000</div><div>performance.md-cache-timeout: 600</div><div>performance.cache-invalidation: on</div><div>performance.stat-prefetch: on</div><div>features.cache-invalidation-timeout: 600</div><div>features.cache-invalidation: on</div><div>cluster.readdir-optimize: on</div><div>performance.io-thread-count: 32</div><div>server.event-threads: 4</div><div>client.event-threads: 4</div><div>performance.read-ahead: off</div><div>cluster.lookup-optimize: on</div><div>performance.cache-size: 1GB</div><div>cluster.self-heal-daemon: enable</div><div>transport.address-family: inet</div><div>nfs.disable: on</div><div>performance.client-io-threads: on</div><div>cluster.granular-entry-heal: enable</div><div>cluster.data-self-heal-algorithm: full</div><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a 
href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran &lt;<a href="mailto:nbalacha@redhat.com" target="_blank">nbalacha@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Artem,<div><br></div><div>Do you still see the crashes with 5.3? If yes, please try mounting the volume with the mount option lru-limit=0 and see if that helps. 
We are looking into the crashes and will update you when we have a fix.</div><div><br></div><div>Also, please provide the gluster volume info for the volume in question.</div><div><br></div><div><br></div><div>regards,</div><div>Nithya</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii &lt;<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">The fuse crash happened two more times, but this time monit helped recover within 1 minute, so it's a great workaround for now.<div><br></div><div>What's odd is that the crashes are only happening on one of 4 servers, and I don't know why.<br clear="all"><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii &lt;<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 
0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?<div><br></div><div>In the meantime, I set up a monit (<a href="https://mmonit.com/monit/" target="_blank">https://mmonit.com/monit/</a>) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.</div><div><br></div><div>By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.</div><div><br></div><div><br></div><div>monit check:</div><div><div>check filesystem glusterfs_data1 with path /mnt/glusterfs_data1</div><div> start program = "/bin/mount
/mnt/glusterfs_data1"</div><div> stop program = "/bin/umount /mnt/glusterfs_data1"</div><div> if space usage > 90% for 5 times within 15 cycles</div><div> then alert else if succeeded for 10 cycles then alert</div></div><div><br></div><div><br></div><div>stack trace:</div><div><div>[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]</div><div>[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]</div><div>pending frames:</div><div>frame : type(1) op(LOOKUP)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash:</div><div>2019-02-01 23:22:03</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: 
glusterfs 5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]</div><div>/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]</div><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" 
target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">Hi,<div dir="auto"><br></div><div dir="auto">The first (and so far only) crash happened at 2am the next day after we upgraded, on only one of four servers and only to one of two mounts.</div><div dir="auto"><br></div><div dir="auto">I have no idea what caused it, but yeah, we do have a pretty busy site (<a href="http://apkmirror.com" target="_blank">apkmirror.com</a>), and it caused a disruption for any uploads or downloads from that server until I woke up and fixed the mount.</div><div dir="auto"><br></div><div dir="auto">I wish I could be more helpful but all I have is that stack trace. </div><div dir="auto"><br></div><div dir="auto">I'm glad it's a blocker and will hopefully be resolved soon. 
</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan &lt;<a href="mailto:atumball@redhat.com" target="_blank">atumball@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi Artem,<div><br></div><div>I have opened <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1671603" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1671603</a> (i.e., as a clone of other bugs where recent discussions happened) and marked it as a blocker for the glusterfs-5.4 release.</div><div><br></div><div>We already have fixes for the log flooding - <a href="https://review.gluster.org/22128" rel="noreferrer" target="_blank">https://review.gluster.org/22128</a>, and are in the process of identifying and fixing the issue behind the crash.</div><div><br></div><div>Can you please tell us whether the crashes happened right after the upgrade, 
or was there any particular pattern you observed before the crash.</div><div><br></div><div>-Amar</div><div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Within 24 hours after updating from rock solid 4.1 to 5.3, I already got a crash which others have mentioned in <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1313567" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1313567</a> and had to unmount, kill gluster, and remount:</div><div><br></div><div><br></div><div>[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] 
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]</div><div>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]</div><div>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]</div><div>pending frames:</div><div>frame : type(1) op(READ)</div><div>frame : type(1) op(OPEN)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" rel="noreferrer" target="_blank">git.gluster.org/glusterfs.git</a></div><div>signal received: 6</div><div>time of crash:</div><div>2019-01-31 09:38:04</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 
5.3</div><div>/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]</div><div>/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]</div><div>/lib64/libc.so.6(+0x36160)[0x7fccd622d160]</div><div>/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]</div><div>/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]</div><div>/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]</div><div>/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]</div><div>/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]</div><div>/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]</div><div>/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]</div><div>/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]</div><div>/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]</div><div>/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]</div><div>/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]</div><div>/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]</div><div>/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]</div><div>/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]</div><div>---------</div><div><br></div><div>Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via <a href="http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/" rel="noreferrer" target="_blank">http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/</a>, not too sure how to make it core dump.</div><div><br></div><div>If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? 
This is going to create a massive problem for us since production systems are crashing.</div><div><br></div><div>Thanks.</div><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <<a href="mailto:rgowdapp@redhat.com" rel="noreferrer" target="_blank">rgowdapp@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div 
dir="ltr"><div dir="ltr"><div dir="ltr">Also, not sure if related or not, but I got a ton of these "Failed to dispatch handler" in my logs as well. Many people have been commenting about this issue here <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1651246" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1651246</a>.</div></div></div></div></blockquote><div><br></div><div><a href="https://review.gluster.org/#/c/glusterfs/+/22046/" rel="noreferrer" target="_blank">https://review.gluster.org/#/c/glusterfs/+/22046/</a> addresses this.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">==> mnt-SITE_data1.log <==<br>[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]<br>==> mnt-SITE_data3.log <==<br>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]<br>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]<br>==> mnt-SITE_data1.log <==<br>The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 
20:38:19.459789]<br>The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]<br>[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0<br>==> mnt-SITE_data3.log <==<br>[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0<br>==> mnt-SITE_data1.log <==<br>[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler </blockquote><div dir="ltr"><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981gmail-m_-8072525330423685591gmail_signature"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><div dir="ltr">I'm hoping raising the issue here on the mailing list may bring some additional eyeballs and get them both fixed.<br></div><div><br></div><div>Thanks.</div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" 
target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div><div dir="ltr"><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com" rel="noreferrer" target="_blank">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I found a similar issue here: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1313567" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1313567</a>. There's a comment from 3 days ago from someone else with 5.3 who started seeing the spam.</div><div><br></div><div>Here's the command that repeats over and over:</div><div>[2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]</div></div></div></div></blockquote></div></blockquote><div><br></div><div><a class="gmail_plusreply" id="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981plusReplyChip-0" href="mailto:mchangir@redhat.com" rel="noreferrer" target="_blank">+Milind Changire</a> Can you check why this message is logged and send a fix?</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><br></div><div>Is there any fix for this issue?</div><div><br></div><div>Thanks.</div><div><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail-m_4728589735375555382gmail-m_-832810018525896981gmail-m_-8072525330423685591gmail-m_3626885015457760579gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" rel="noreferrer" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" rel="noreferrer" target="_blank">beerpla.net</a> | <a href="https://plus.google.com/+ArtemRussakovskii" rel="noreferrer" target="_blank">+ArtemRussakovskii</a> | <a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</blockquote></div>
_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" rel="noreferrer" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a></blockquote></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_5289700808037795224gmail-m_7079714630512715500gmail-m_6462230140181828570gmail-m_6778230785632197751gmail-m_-3193886162322266115gmail-m_-1563755446182869324gmail-m_2040695442618143403gmail-m_-420917109542894198m_1449968617815858209gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Amar Tumballi (amarts)<br></div></div></div></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>