<div dir="ltr"><div class="gmail_default"><font size="4">The crashes seem to have stopped after I downgraded the one machine to match the others.</font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 10, 2017 at 11:50 AM, Sergei Gerasenko <span dir="ltr"><<a href="mailto:gerases@gmail.com" target="_blank">gerases@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-size:small">I see why it's not saving the cores: the package isn't signed with the right signature. I will modify the abrt configs to change that behavior and wait for the next crash.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 10, 2017 at 11:23 AM, Vijay Bellur <span dir="ltr"><<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 10, 2017 at 11:17 AM, Sergei Gerasenko <span dir="ltr"><<a href="mailto:gerases@gmail.com" target="_blank">gerases@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><font size="4">Hi, </font></div><div><font size="4"><br></font></div><div><font size="4">I'm running gluster 3.7.12. It's an 8-node distributed, replicated cluster (replica 2). It had been working fine for a long time when all of a sudden I started seeing bricks going offline. 
Researching further I found messages like this:</font></div><div style="font-size:small"><font face="monospace, monospace"><br></font></div><div><div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0) op(5)</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://<a href="http://git.gluster.com/glusterfs.git" target="_blank">git.gluster.com/glusterf<wbr>s.git</a></font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received: 6</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10 05:02:12</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration details:</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 
1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string: glusterfs 3.7.12</font></div><div style="font-size:small"><font face="monospace, monospace">Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: ---------</font></div><div><span style="font-family:arial,helvetica,sans-serif;font-size:large"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif;font-size:large">I initially thought it was related to quota support (based on some googling), so I turned off quota and also disabled NFS support to simplify the debugging. Every time after the crash, I restarted gluster and the bricks would go online for several hours only to crash again later. 
There are lots of messages like this preceding the crash:</span><br></div><div style="font-size:small"><font face="monospace, monospace"><br></font></div><div style="font-size:small"><font face="monospace, monospace">...</font></div><div style="font-size:small"><font face="monospace, monospace">[2017-03-10 04:40:46.002225] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)<br></font></div></div><div><div><font face="monospace, monospace">[2017-03-10 04:40:46.002278] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]</font></div><div><font face="monospace, monospace">[2017-03-10 04:50:47.002170] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)</font></div><div><font face="monospace, monospace">[2017-03-10 04:50:47.002219] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]</font></div><div><font 
face="monospace, monospace">[2017-03-10 05:00:48.002246] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)</font></div><div><font face="monospace, monospace">[2017-03-10 05:00:48.002314] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]</font></div><div><font face="monospace, monospace">The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]</font></div></div><div><font face="monospace, monospace"><br></font></div><div><font face="arial, helvetica, sans-serif" size="4">One important detail I noticed yesterday is that one of the nodes was running gluster version 3.7.13! I'm not sure what did the upgrade. So I downgraded to 3.7.12 and restarted gluster. The crash above happened several hours later. But again, the crashes had been happening before the downgrade -- possibly because of the version mismatch on one of the nodes.</font></div><div><font face="arial, helvetica, sans-serif" size="4"><br></font></div><div><font face="arial, helvetica, sans-serif" size="4">Anybody have any ideas?</font></div><div><br></div></div></div></blockquote><div><br></div><div><br></div><div>Do you have the core files from the crashes? If so, can you please provide a gdb backtrace from one of the core files?</div><div><br></div><div>Thanks,</div><div>Vijay </div></div></div></div>
</blockquote></div><br></div>
</blockquote></div><br></div>