<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 18, 2017 at 2:01 PM, Niels de Vos <span dir="ltr">&lt;<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Fri, Aug 18, 2017 at 12:22:33PM +0530, Atin Mukherjee wrote:<br>
&gt; You&#39;re hitting a race here. When glusterd tries to resolve the<br>
&gt; address of one of the remote bricks of a particular volume, the network<br>
&gt; interface is not yet up. We have fixed this issue in mainline and the<br>
&gt; 3.12 branch through the following commit:<br>
<br>
</span>We will maintain 3.10 for at least another 6 months. It probably makes sense to<br>
backport this fix? I would not bother with 3.8 though; the last update for<br>
this version has already been shipped.<br></blockquote><div><br></div><div>Agreed. Gaurav is backporting the fix to 3.10 now.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Thanks,<br>
Niels<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
&gt;<br>
&gt; commit 1477fa442a733d7b1a5ea74884cac8<wbr>f29fbe7e6a<br>
&gt; Author: Gaurav Yadav &lt;<a href="mailto:gyadav@redhat.com">gyadav@redhat.com</a>&gt;<br>
&gt; Date:   Tue Jul 18 16:23:18 2017 +0530<br>
&gt;<br>
&gt;     glusterd : glusterd fails to start when  peer&#39;s network interface is<br>
&gt; down<br>
&gt;<br>
&gt;     Problem:<br>
&gt;     glusterd fails to start on nodes where glusterd tries to come up even<br>
&gt;     before network is up.<br>
&gt;<br>
&gt;     Fix:<br>
&gt;     On startup glusterd tries to resolve brick path which is based on<br>
&gt;     hostname/ip, but in the above scenario when network interface is not<br>
&gt;     up, glusterd is not able to resolve the brick path using ip_address or<br>
&gt;     hostname. With this fix, glusterd will use UUID to resolve brick path.<br>
&gt;<br>
&gt;     Change-Id: Icfa7b2652417135530479d0aa4e2a<wbr>82b0476f710<br>
&gt;     BUG: 1472267<br>
&gt;     Signed-off-by: Gaurav Yadav &lt;<a href="mailto:gyadav@redhat.com">gyadav@redhat.com</a>&gt;<br>
&gt;     Reviewed-on: <a href="https://review.gluster.org/17813" rel="noreferrer" target="_blank">https://review.gluster.org/<wbr>17813</a><br>
&gt;     Smoke: Gluster Build System &lt;<a href="mailto:jenkins@build.gluster.org">jenkins@build.gluster.org</a>&gt;<br>
&gt;     Reviewed-by: Prashanth Pai &lt;<a href="mailto:ppai@redhat.com">ppai@redhat.com</a>&gt;<br>
&gt;     CentOS-regression: Gluster Build System &lt;<a href="mailto:jenkins@build.gluster.org">jenkins@build.gluster.org</a>&gt;<br>
&gt;     Reviewed-by: Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; Note: the 3.12 release is planned for the end of this month.<br>
&gt;<br>
&gt; ~Atin<br>
&gt;<br>
&gt; On Thu, Aug 17, 2017 at 2:45 PM, ismael mondiu &lt;<a href="mailto:mondiu@hotmail.com">mondiu@hotmail.com</a>&gt; wrote:<br>
&gt;<br>
&gt; &gt; Hi Team,<br>
&gt; &gt;<br>
&gt; &gt; I noticed that glusterd never starts when I reboot my Red Hat 7.1<br>
&gt; &gt; server.<br>
&gt; &gt;<br>
&gt; &gt; The service is enabled but doesn&#39;t work.<br>
&gt; &gt;<br>
&gt; &gt; I tested with gluster 3.10.4 &amp; gluster 3.10.5 and the problem still exists.<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; When I start the service manually, it works.<br>
&gt; &gt;<br>
&gt; &gt; I&#39;ve also tested on a Red Hat 6.6 server with gluster 3.10.4, and that works<br>
&gt; &gt; fine.<br>
&gt; &gt;<br>
&gt; &gt; The problem seems to be related to Red Hat 7.1.<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; Is this a known issue? If yes, can you tell me what the workaround is?<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; Thanks<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; Some logs here<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; [root@~]# systemctl status  glusterd<br>
&gt; &gt; ● glusterd.service - GlusterFS, a clustered file-system server<br>
&gt; &gt;    Loaded: loaded (/usr/lib/systemd/system/<wbr>glusterd.service; enabled;<br>
&gt; &gt; vendor preset: disabled)<br>
&gt; &gt;    Active: failed (Result: exit-code) since Thu 2017-08-17 11:04:00 CEST;<br>
&gt; &gt; 2min 9s ago<br>
&gt; &gt;   Process: 851 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid<br>
&gt; &gt; --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE)<br>
&gt; &gt;<br>
&gt; &gt; Aug 17 11:03:59 dvihcasc0r systemd[1]: Starting GlusterFS, a clustered<br>
&gt; &gt; file-system server...<br>
&gt; &gt; Aug 17 11:04:00 dvihcasc0r systemd[1]: glusterd.service: control process<br>
&gt; &gt; exited, code=exited status=1<br>
&gt; &gt; Aug 17 11:04:00 dvihcasc0r systemd[1]: Failed to start GlusterFS, a<br>
&gt; &gt; clustered file-system server.<br>
&gt; &gt; Aug 17 11:04:00 dvihcasc0r systemd[1]: Unit glusterd.service entered<br>
&gt; &gt; failed state.<br>
&gt; &gt; Aug 17 11:04:00 dvihcasc0r systemd[1]: glusterd.service failed.<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; ******************************<wbr>******************************<br>
&gt; &gt; ****************************<br>
&gt; &gt;<br>
&gt; &gt;  /var/log/glusterfs/glusterd.<wbr>log<br>
&gt; &gt;<br>
&gt; &gt; ******************************<wbr>******************************<br>
&gt; &gt; ******************************<wbr>**<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; [2017-08-17 09:04:00.202529] I [MSGID: 106478] [glusterd.c:1449:init]<br>
&gt; &gt; 0-management: Maximum allowed open file descriptors set to 65536<br>
&gt; &gt; [2017-08-17 09:04:00.202573] I [MSGID: 106479] [glusterd.c:1496:init]<br>
&gt; &gt; 0-management: Using /var/lib/glusterd as working directory<br>
&gt; &gt; [2017-08-17 09:04:00.365134] E [rpc-transport.c:283:rpc_<wbr>transport_load]<br>
&gt; &gt; 0-rpc-transport: /usr/lib64/glusterfs/3.10.5/<wbr>rpc-transport/rdma.so:<br>
&gt; &gt; cannot open shared object file: No such file or directory<br>
&gt; &gt; [2017-08-17 09:04:00.365161] W [rpc-transport.c:287:rpc_<wbr>transport_load]<br>
&gt; &gt; 0-rpc-transport: volume &#39;rdma.management&#39;: transport-type &#39;rdma&#39; is not<br>
&gt; &gt; valid or not found on this machine<br>
&gt; &gt; [2017-08-17 09:04:00.365195] W [rpcsvc.c:1661:rpcsvc_create_<wbr>listener]<br>
&gt; &gt; 0-rpc-service: cannot create listener, initing the transport failed<br>
&gt; &gt; [2017-08-17 09:04:00.365206] E [MSGID: 106243] [glusterd.c:1720:init]<br>
&gt; &gt; 0-management: creation of 1 listeners failed, continuing with succeeded<br>
&gt; &gt; transport<br>
&gt; &gt; [2017-08-17 09:04:00.464314] I [MSGID: 106228] [glusterd.c:500:glusterd_<wbr>check_gsync_present]<br>
&gt; &gt; 0-glusterd: geo-replication module not installed in the system [No such<br>
&gt; &gt; file or directory]<br>
&gt; &gt; [2017-08-17 09:04:00.510412] I [MSGID: 106513] [glusterd-store.c:2197:<wbr>glusterd_restore_op_version]<br>
&gt; &gt; 0-glusterd: retrieved op-version: 31004<br>
&gt; &gt; [2017-08-17 09:04:00.711413] I [MSGID: 106194] [glusterd-store.c:3776:<br>
&gt; &gt; glusterd_store_retrieve_<wbr>missed_snaps_list] 0-management: No missed snaps<br>
&gt; &gt; list.<br>
&gt; &gt; [2017-08-17 09:04:00.756731] E [MSGID: 106187] [glusterd-store.c:4559:<wbr>glusterd_resolve_all_bricks]<br>
&gt; &gt; 0-glusterd: resolve brick failed in restore<br>
&gt; &gt; [2017-08-17 09:04:00.756787] E [MSGID: 101019] [xlator.c:503:xlator_init]<br>
&gt; &gt; 0-management: Initialization of volume &#39;management&#39; failed, review your<br>
&gt; &gt; volfile again<br>
&gt; &gt; [2017-08-17 09:04:00.756802] E [MSGID: 101066]<br>
&gt; &gt; [graph.c:325:glusterfs_graph_<wbr>init] 0-management: initializing translator<br>
&gt; &gt; failed<br>
&gt; &gt; [2017-08-17 09:04:00.756816] E [MSGID: 101176]<br>
&gt; &gt; [graph.c:681:glusterfs_graph_<wbr>activate] 0-graph: init failed<br>
&gt; &gt; [2017-08-17 09:04:00.766584] W [glusterfsd.c:1332:cleanup_<wbr>and_exit]<br>
&gt; &gt; (--&gt;/usr/sbin/glusterd(<wbr>glusterfs_volumes_init+0xfd) [0x7f9bdef4cabd]<br>
&gt; &gt; --&gt;/usr/sbin/glusterd(<wbr>glusterfs_process_volfp+0x1b1) [0x7f9bdef4c961]<br>
&gt; &gt; --&gt;/usr/sbin/glusterd(cleanup_<wbr>and_exit+0x6b) [0x7f9bdef4be4b] ) 0-:<br>
&gt; &gt; received signum (1), shutting down<br>
&gt; &gt;<br>
&gt; &gt; ******************************<wbr>******************************<br>
&gt; &gt; ******************************<br>
&gt; &gt;<br>
&gt; &gt; [root@~]# uptime<br>
&gt; &gt;  11:13:55 up 10 min,  1 user,  load average: 0.00, 0.02, 0.04<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; ******************************<wbr>******************************<br>
&gt; &gt; ******************************<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; ______________________________<wbr>_________________<br>
&gt; &gt; Gluster-users mailing list<br>
&gt; &gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
&gt; &gt; <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
&gt; &gt;<br>
<br>
<br>
</div></div></blockquote></div><br></div></div>
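<div><br></div>
<div>The commit message quoted above says that, with the fix, glusterd uses the UUID rather than the hostname/IP to resolve a brick. The following is only a minimal Python sketch of that idea, not glusterd&#39;s actual C implementation. The function name <code>resolve_brick_owner</code>, the dict-shaped brick record and the <code>address_to_uuid</code> table are hypothetical names chosen for illustration, and the sketch assumes the owning peer&#39;s UUID can be stored alongside the brick.</div>
<pre>
```python
import socket


def resolve_brick_owner(brick, address_to_uuid, resolver=socket.gethostbyname):
    """Return the UUID of the peer that owns a brick.

    brick           -- hypothetical record: {"hostname", "path", optional "uuid"}
    address_to_uuid -- hypothetical table mapping peer addresses to peer UUIDs
    resolver        -- name-resolution function, injectable for testing
    """
    stored_uuid = brick.get("uuid")
    if stored_uuid:
        # A UUID stored with the brick needs no name resolution, so it
        # still works while the network interface is down at boot.
        return stored_uuid

    # Older-style path: depends on the network being up. If it is not,
    # the resolver raises socket.gaierror and startup would fail here.
    address = resolver(brick["hostname"])
    return address_to_uuid[address]


if __name__ == "__main__":
    def network_down(hostname):
        # Simulates name resolution failing before the interface is up.
        raise socket.gaierror("network is down")

    brick = {"hostname": "node2", "path": "/data/brick1", "uuid": "uuid-2"}
    print(resolve_brick_owner(brick, {}, resolver=network_down))
```
</pre>
<div>In this sketch the UUID branch never calls the resolver, so a brick record that carries a UUID cannot lose the boot-time race described in the thread. A record without one still falls back to the hostname lookup.</div>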