<div dir="ltr"><span style="font-family:monospace,monospace">It looks like this is to do with the stale port issue.<br></span><div><span style="font-family:monospace,monospace"><br>I think it&#39;s pretty clear from the output below that volume status shows the digitalcorpora brick process on gluster-2 with the same TCP port as the public volume brick, </span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">49156</span>, but it is actually listening on </span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">49154. So although the brick process is technically up, nothing is talking to it. I am surprised I don&#39;t see more errors in the brick log for brick8/public. It also explains the whack-a-mole problem: every time I kill and restart the daemon, it must be grabbing the port of another brick, and then that volume&#39;s brick goes silent.<br></span></span><div><br><span style="font-family:monospace,monospace"></span></div><div><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">I killed all the brick processes and restarted glusterd, and everything came up OK. 
<br></span></span></span></div><div><span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:monospace,monospace">[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self<br>Status of volume: digitalcorpora<br>Gluster process                             TCP Port  RDMA Port  Online  Pid</span><br><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace"></span>------------------------------------------------------------------------------<br>Brick gluster-2:/export/brick7/digitalcorpo<br>ra                                          49156     0          Y       125708<br>Brick gluster1.vsnet.gmu.edu:/export/brick7<br>/digitalcorpora                             49152     0          Y       12345<br>Brick gluster0:/export/brick7/digitalcorpor<br>a                                           49152     0          Y       16098<br> <br>Task Status of Volume digitalcorpora<br>------------------------------------------------------------------------------<br>There are no active volume tasks<br> <br>[root@gluster-2 ~]# glv status public  | grep -v ^Self<br>Status of volume: public<br>Gluster process                             TCP Port  RDMA Port  Online  Pid<br>------------------------------------------------------------------------------<br>Brick gluster1:/export/brick8/public        49156     0          Y       3519 <br>Brick gluster2:/export/brick8/public        49156     0          Y       8578 <br>Brick gluster0:/export/brick8/public        49156     0          Y       3176 <br> <br>Task Status of Volume public<br>------------------------------------------------------------------------------<br>There are no active volume tasks<br> <br>[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0<br>tcp        0      0 <a href="http://0.0.0.0:49156">0.0.0.0:49156</a>           0.0.0.0:*               LISTEN      8578/glusterfsd     
<br>[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0<br>tcp        0      0 <a href="http://0.0.0.0:49154">0.0.0.0:49154</a>           0.0.0.0:*               LISTEN      125708/glusterfsd   <br>[root@gluster-2 ~]# ps -c  --pid  125708 8578<br>   PID CLS PRI TTY      STAT   TIME COMMAND<br>  8578 TS   19 ?        Ssl  224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric<br>125708 TS   19 ?        Ssl    0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor<br>[root@gluster-2 ~]# </span><br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 24 October 2017 at 13:56, Atin Mukherjee <span dir="ltr">&lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <span dir="ltr">&lt;<a href="mailto:ajneil.tech@gmail.com" target="_blank">ajneil.tech@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div>gluster version 3.10.6, replica 3 volume, daemon is present but does not appear to be functioning <br></div><br></div>Peculiar behaviour: if I kill the glusterfs brick daemon and restart glusterd, the brick becomes available, but one of my other volumes&#39; bricks on the same server goes down in the same way. It&#39;s like whack-a-mole.<br><br><div>Any ideas?</div></div></blockquote><div><br></div></span><div>The subject and the data look contradictory to me. 
Brick log (what you shared) doesn&#39;t have a cleanup_and_exit () trigger for a shutdown. Are you sure brick is down? OTOH, I see a mismatch of port for brick7/digitalcorpora where the brick process has 49154 but gluster volume status shows 49152. There is an issue with stale port which we&#39;re trying to address through <a href="https://review.gluster.org/18541" target="_blank">https://review.gluster.org/<wbr>18541</a> . But could you specify what exactly the problem is? Is it the stale port  or the conflict between volume status output and actual brick health? If it&#39;s the latter, I&#39;d need further information like output of &quot;gluster get-state&quot; command from the same node.</div><div><font size="1"><br></font></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="h5"><div dir="ltr"><div><br></div><div><br>[root@gluster-2 bricks]# glv status digitalcorpora<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font size="1">Status of volume: digitalcorpora<br>Gluster process                       <wbr>      TCP Port  RDMA Port  Online  Pid<br>------------------------------<wbr>------------------------------<wbr>------------------<br>Brick gluster-2:/export/brick7/digit<wbr>alcorpo<br>ra                            <wbr>              49156     0          Y       125708<br>Brick gluster1.vsnet.gmu.edu:/export<wbr>/brick7<br>/digitalcorpora               <wbr>              49152     0          Y       12345<br>Brick gluster0:/export/brick7/digita<wbr>lcorpor<br>a                             <wbr>              49152     0          Y       16098<br>Self-heal Daemon on localhost               N/A       N/A        Y       126625<br>Self-heal Daemon on gluster1                N/A       N/A        Y       15405<br>Self-heal Daemon on gluster0                N/A       N/A        Y       18584<br> 
<br>Task Status of Volume digitalcorpora<br>------------------------------<wbr>------------------------------<wbr>------------------<br>There are no active volume tasks<br> <br>[root@gluster-2 bricks]# glv heal digitalcorpora info<br>Brick gluster-2:/export/brick7/digit<wbr>alcorpora<br>Status: Transport endpoint is not connected<br>Number of entries: -<br><br>Brick gluster1.vsnet.gmu.edu:/export<wbr>/brick7/digitalcorpora<br>/.trashcan <br>/DigitalCorpora/hello2.txt <br>/DigitalCorpora <br>Status: Connected<br>Number of entries: 3<br><br>Brick gluster0:/export/brick7/digita<wbr>lcorpora<br>/.trashcan <br>/DigitalCorpora/hello2.txt <br>/DigitalCorpora <br>Status: Connected<br>Number of entries: 3<br><br>[2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and<wbr>_exit] (--&gt;/lib64/libpthread.so.0(+0x<wbr>7e25) [0x7f6f83c9de25] --&gt;/usr/sbin/glusterfsd(gluste<wbr>rfs_sigwaiter+0xe5) [0x55a148eeb135] --&gt;/usr/sbin/glusterfsd(cleanu<wbr>p_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down<br>[2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.expor<wbr>t-brick7-digitalcorpora -p /var/lib/glusterd/vols/digital<wbr>corpora/run/gluster-2-export-<wbr>brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47d<wbr>c51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/expo<wbr>rt-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5<wbr>-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-p<wbr>ort=49154)<br>[2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispa<wbr>tch_epoll_worker] 0-epoll: Started thread with index 1<br>[2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outs<wbr>tanding_rpc_limit] 0-rpc-service: Configured 
rpc.outstanding-rpc-limit with value 64<br>[2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate<wbr>] 0-digitalcorpora-server: option &#39;listen-port&#39; is deprecated, preferred is &#39;transport.socket.listen-port&#39;<wbr>, continuing with correction<br>[2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-server: option &#39;rpc-auth.auth-glusterfs&#39; is not recognized<br>[2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-server: option &#39;rpc-auth.auth-unix&#39; is not recognized<br>[2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-server: option &#39;rpc-auth.auth-null&#39; is not recognized<br>[2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-server: option &#39;auth-path&#39; is not recognized<br>[2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-server: option &#39;ping-timeout&#39; is not recognized<br>[2017-10-24 17:19:04.616203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-/export/brick7/digitalcorpor<wbr>a: option &#39;rpc-auth-allow-insecure&#39; is not recognized<br>[2017-10-24 17:19:04.616215] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-/export/brick7/digitalcorpor<wbr>a: option &#39;auth.addr./export/brick7/digi<wbr>talcorpora.allow&#39; is not recognized<br>[2017-10-24 17:19:04.616226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-/export/brick7/digitalcorpor<wbr>a: option &#39;auth-path&#39; is not recognized<br>[2017-10-24 17:19:04.616237] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-/export/brick7/digitalcorpor<wbr>a: option &#39;auth.login.b17f2513-7d9c-4174<wbr>-a0c5-de4a752d46ca.password&#39; is not recognized<br>[2017-10-24 17:19:04.616248] W [MSGID: 101174] 
[graph.c:361:_log_if_unknown_o<wbr>ption] 0-/export/brick7/digitalcorpor<wbr>a: option &#39;auth.login./export/brick7/dig<wbr>italcorpora.allow&#39; is not recognized<br>[2017-10-24 17:19:04.616283] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-quota: option &#39;timeout&#39; is not recognized<br>[2017-10-24 17:19:04.616367] W [MSGID: 101174] [graph.c:361:_log_if_unknown_o<wbr>ption] 0-digitalcorpora-trash: option &#39;brick-path&#39; is not recognized<br>Final graph:<br>+-----------------------------<wbr>------------------------------<wbr>-------------------+<br>  1: volume digitalcorpora-posix<br>  2:     type storage/posix<br>  3:     option glusterd-uuid 032c17f5-8cc9-445f-aa45-897b5a<wbr>066b43<br>  4:     option directory /export/brick7/digitalcorpora<br>  5:     option volume-id 61efe58a-ae5b-4d8b-b9f9-678298<wbr>67c442<br>  6:     option brick-uid 36<br>  7:     option brick-gid 36<br>  8: end-volume<br>  9:  <br> 10: volume digitalcorpora-trash<br> 11:     type features/trash<br> 12:     option trash-dir .trashcan<br> 13:     option brick-path /export/brick7/digitalcorpora<br> 14:     option trash-internal-op off<br> 15:     subvolumes digitalcorpora-posix<br> 16: end-volume<br> 17:  <br> 18: volume digitalcorpora-changetimerecor<wbr>der<br> 19:     type features/changetimerecorder<br> 20:     option db-type sqlite3<br> 21:     option hot-brick off<br> 22:     option db-name digitalcorpora.db<br> 23:     option db-path /export/brick7/digitalcorpora/<wbr>.glusterfs/<br> 24:     option record-exit off<br> 25:     option ctr_link_consistency off<br> 26:     option ctr_lookupheal_link_timeout 300<br> 27:     option ctr_lookupheal_inode_timeout 300<br> 28:     option record-entry on<br> 29:     option ctr-enabled off<br> 30:     option record-counters off<br> 31:     option ctr-record-metadata-heat off<br> 32:     option sql-db-cachesize 12500<br> 33:     option sql-db-wal-autocheckpoint 25000<br> 34:     subvolumes 
digitalcorpora-trash<br> 35: end-volume<br> 36:  <br> 37: volume digitalcorpora-changelog<br> 38:     type features/changelog<br> 39:     option changelog-brick /export/brick7/digitalcorpora<br> 40:     option changelog-dir /export/brick7/digitalcorpora/<wbr>.glusterfs/changelogs<br> 41:     option changelog-barrier-timeout 120<br> 42:     subvolumes digitalcorpora-changetimerecor<wbr>der<br> 43: end-volume<br> 44:  <br> 45: volume digitalcorpora-bitrot-stub<br> 46:     type features/bitrot-stub<br> 47:     option export /export/brick7/digitalcorpora<br> 48:     subvolumes digitalcorpora-changelog<br> 49: end-volume<br> 50:  <br> 51: volume digitalcorpora-access-control<br> 52:     type features/access-control<br> 53:     subvolumes digitalcorpora-bitrot-stub<br> 54: end-volume<br> 55:  <br> 56: volume digitalcorpora-locks<br> 57:     type features/locks<br> 58:     subvolumes digitalcorpora-access-control<br> 59: end-volume<br> 60:  <br> 61: volume digitalcorpora-worm<br> 62:     type features/worm<br> 63:     option worm off<br> 64:     option worm-file-level off<br> 65:     subvolumes digitalcorpora-locks<br> 66: end-volume<br> 67:  <br> 68: volume digitalcorpora-read-only<br> 69:     type features/read-only<br> 70:     option read-only off<br> 71:     subvolumes digitalcorpora-worm<br> 72: end-volume<br> 73:  <br> 74: volume digitalcorpora-leases<br> 75:     type features/leases<br> 76:     option leases off<br> 77:     subvolumes digitalcorpora-read-only<br> 78: end-volume<br> 79:  <br> 80: volume digitalcorpora-upcall<br> 81:     type features/upcall<br> 82:     option cache-invalidation off<br> 83:     subvolumes digitalcorpora-leases<br> 84: end-volume<br> 85:  <br> 86: volume digitalcorpora-io-threads<br> 87:     type performance/io-threads<br> 88:     subvolumes digitalcorpora-upcall<br> 89: end-volume<br> 90:  <br> 91: volume digitalcorpora-marker<br> 92:     type features/marker<br> 93:     option volume-uuid 
61efe58a-ae5b-4d8b-b9f9-678298<wbr>67c442<br> 94:     option timestamp-file /var/lib/glusterd/vols/digital<wbr>corpora/marker.tstamp<br> 95:     option quota-version 0<br> 96:     option xtime off<br> 97:     option gsync-force-xtime off<br> 98:     option quota off<br> 99:     option inode-quota off<br>100:     subvolumes digitalcorpora-io-threads<br>101: end-volume<br>102:  <br>103: volume digitalcorpora-barrier<br>104:     type features/barrier<br>105:     option barrier disable<br>106:     option barrier-timeout 120<br>107:     subvolumes digitalcorpora-marker<br>108: end-volume<br>109:  <br>110: volume digitalcorpora-index<br>111:     type features/index<br>112:     option index-base /export/brick7/digitalcorpora/<wbr>.glusterfs/indices<br>113:     option xattrop-dirty-watchlist trusted.afr.dirty<br>114:     option xattrop-pending-watchlist trusted.afr.digitalcorpora-<br>115:     subvolumes digitalcorpora-barrier<br>116: end-volume<br>117:  <br>118: volume digitalcorpora-quota<br>119:     type features/quota<br>120:     option volume-uuid digitalcorpora<br>121:     option server-quota off<br>122:     option timeout 0<br>123:     option deem-statfs off<br>124:     subvolumes digitalcorpora-index<br>125: end-volume<br>126:  <br>127: volume digitalcorpora-io-stats<br>128:     type debug/io-stats<br>129:     option unique-id /export/brick7/digitalcorpora<br>130:     option log-level WARNING<br>131:     option latency-measurement off<br>132:     option count-fop-hits off<br>133:     subvolumes digitalcorpora-quota<br>134: end-volume<br>135:  <br>136: volume /export/brick7/digitalcorpora<br>137:     type performance/decompounder<br>138:     option rpc-auth-allow-insecure on<br>139:     option auth.addr./export/brick7/digit<wbr>alcorpora.allow 129.174.125.204,129.174.93.204<br>140:     option auth-path /export/brick7/digitalcorpora<br>141:     option auth.login.b17f2513-7d9c-4174-<wbr>a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f83<wbr>17e5c7<br>142:     
option auth.login./export/brick7/digi<wbr>talcorpora.allow b17f2513-7d9c-4174-a0c5-de4a75<wbr>2d46ca<br>143:     subvolumes digitalcorpora-io-stats<br>144: end-volume<br>145:  <br>146: volume digitalcorpora-server<br>147:     type protocol/server<br>148:     option transport.socket.listen-port 49154<br>149:     option rpc-auth.auth-glusterfs on<br>150:     option rpc-auth.auth-unix on<br>151:     option rpc-auth.auth-null on<br>152:     option transport-type tcp<br>153:     option transport.address-family inet<br>154:     option auth.login./export/brick7/digi<wbr>talcorpora.allow b17f2513-7d9c-4174-a0c5-de4a75<wbr>2d46ca<br>155:     option auth.login.b17f2513-7d9c-4174-<wbr>a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f83<wbr>17e5c7<br>156:     option auth-path /export/brick7/digitalcorpora<br>157:     option auth.addr./export/brick7/digit<wbr>alcorpora.allow 129.174.125.204,129.174.93.204<br>158:     option ping-timeout 42<br>159:     option transport.socket.keepalive 1<br>160:     option rpc-auth-allow-insecure on<br>161:     option transport.tcp-user-timeout 0<br>162:     option transport.socket.keepalive-tim<wbr>e 20<br>163:     option transport.socket.keepalive-int<wbr>erval 2<br>164:     option transport.socket.keepalive-cou<wbr>nt 9<br>165:     subvolumes /export/brick7/digitalcorpora<br>166: end-volume<br>167:  <br>+-----------------------------<wbr>------------------------------<wbr>-------------------+<br>[2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on <a href="http://129.174.126.87:24007" target="_blank">129.174.126.87:24007</a> failed (No data available)<br></font></blockquote><br></div></div>
<br></div></div>
</blockquote></div><br></div></div>
</blockquote></div><br></div>
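<div>The manual check at the top of this message (comparing the TCP port that volume status reports for a brick PID with the port that PID is really listening on according to netstat) could be automated roughly as below. This is only a sketch: the function names and the embedded sample text are illustrative, the sample rows are copied from the output in this thread, and the parsing assumes the column layout shown here, which may differ in other Gluster or netstat versions.</div>

```python
import re

# Sample rows copied from the console output in this thread (illustrative input only).
SAMPLE_STATUS = """\
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster2:/export/brick8/public        49156     0          Y       8578
Self-heal Daemon on localhost               N/A       N/A        Y       126625
"""

SAMPLE_NETSTAT = """\
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      8578/glusterfsd
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      125708/glusterfsd
"""

# A status row ends with: TCP port, RDMA port, Online flag, PID.
# Brick names may wrap onto the previous line, so only the row tail is matched.
STATUS_ROW = re.compile(r"(\d+|N/A)\s+(\d+|N/A)\s+([YN])\s+(\d+)\s*$")

# A netstat row in LISTEN state: local address port, then "PID/program".
NETSTAT_ROW = re.compile(r"^tcp\S*\s+\d+\s+\d+\s+\S+:(\d+)\s+\S+\s+LISTEN\s+(\d+)/")


def parse_status_ports(status_text):
    """Return {pid: tcp_port} for rows that report a numeric TCP port."""
    ports = {}
    for line in status_text.splitlines():
        match = STATUS_ROW.search(line)
        if match and match.group(1) != "N/A":
            ports[int(match.group(4))] = int(match.group(1))
    return ports


def parse_listen_ports(netstat_text):
    """Return {pid: set of listening TCP ports}."""
    listening = {}
    for line in netstat_text.splitlines():
        match = NETSTAT_ROW.search(line)
        if match:
            listening.setdefault(int(match.group(2)), set()).add(int(match.group(1)))
    return listening


def find_stale_ports(status_text, netstat_text):
    """Return {pid: (reported_port, [actual_ports])} where the two disagree.

    Only PIDs that appear in the netstat text are checked, so bricks on
    other nodes are skipped. A remote PID could in principle collide with
    a local one; filter the status rows by hostname first if that matters.
    """
    reported = parse_status_ports(status_text)
    listening = parse_listen_ports(netstat_text)
    stale = {}
    for pid, port in reported.items():
        actual = listening.get(pid)
        if actual is not None and port not in actual:
            stale[pid] = (port, sorted(actual))
    return stale


if __name__ == "__main__":
    for pid, (reported, actual) in find_stale_ports(SAMPLE_STATUS, SAMPLE_NETSTAT).items():
        print(f"pid {pid}: status reports {reported}, listening on {actual}")
```

<div>In practice the two text inputs would come from the volume status and netstat commands shown earlier, run on the node being checked. Any PID flagged here is a candidate for the stale port problem, not proof of it.</div>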