[Gluster-users] brick is down but gluster volume status says it's fine
Alastair Neil
ajneil.tech at gmail.com
Tue Oct 24 19:32:12 UTC 2017
It looks like this is related to the stale port issue.

I think it's pretty clear from the output below that the digitalcorpora
brick process is shown by volume status as having the same TCP port as the
public volume brick on gluster-2, 49156, but it is actually listening on
49154. So although the brick process is technically up, nothing is talking
to it. I am surprised I don't see more errors in the brick log for
brick8/public.

It also explains the whack-a-mole problem: every time I kill and restart
the daemon, it must be grabbing the port of another brick, and then that
volume's brick goes silent.

I killed all the brick processes and restarted glusterd, and everything
came up OK.
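The mismatch can be spotted mechanically by comparing the port that volume
status reports for a brick with the port its glusterfsd process actually
listens on. A rough sketch (the helper name and the idea of passing `ss -ltnp`
output in as an argument are mine, for testability, not part of any Gluster
tooling):

```shell
# Sketch: compare the TCP port `gluster volume status` reports for a brick
# with the port the brick's glusterfsd process actually listens on.
# Arguments: brick PID, reported port, and the output of `ss -ltnp`
# (passed in as a string so the comparison logic can be tested offline).
check_brick_port() {
    pid=$1; reported=$2; sockets=$3
    # Find the listening socket owned by this PID and pull out its port
    # (the last colon-separated field of the local address column).
    actual=$(printf '%s\n' "$sockets" |
        awk -v want="pid=$pid," '
            index($0, want) { n = split($4, a, ":"); print a[n]; exit }')
    if [ -z "$actual" ]; then
        echo "pid $pid: no listening socket found"
    elif [ "$actual" = "$reported" ]; then
        echo "pid $pid: OK, listening on $reported"
    else
        echo "pid $pid: MISMATCH, status says $reported but process listens on $actual"
    fi
}
```

On a live node you would call it once per brick, e.g.
check_brick_port 125708 49156 "$(ss -ltnp)", feeding in the PID/port pairs
that volume status prints.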
[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098

Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks
[root@gluster-2 ~]# glv status public | grep -v ^Self
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/export/brick8/public        49156     0          Y       3519
Brick gluster2:/export/brick8/public        49156     0          Y       8578
Brick gluster0:/export/brick8/public        49156     0          Y       3176

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49156    0.0.0.0:*    LISTEN    8578/glusterfsd
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49154    0.0.0.0:*    LISTEN    125708/glusterfsd
[root@gluster-2 ~]# ps -c --pid 125708 8578
   PID CLS PRI TTY      STAT   TIME COMMAND
  8578 TS   19 ?        Ssl  224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS   19 ?        Ssl    0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#
On 24 October 2017 at 13:56, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>
> On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <ajneil.tech at gmail.com>
> wrote:
>
>> gluster version 3.10.6, replica 3 volume; the daemon is present but does
>> not appear to be functioning.
>>
>> Peculiar behaviour: if I kill the glusterfs brick daemon and restart
>> glusterd then the brick becomes available, but one of my other volumes'
>> bricks on the same server goes down in the same way. It's like
>> whack-a-mole.
>>
>> any ideas?
>>
>
> The subject and the data look contradictory to me. The brick log (what
> you shared) doesn't have a cleanup_and_exit () trigger for a shutdown. Are
> you sure the brick is down? OTOH, I see a port mismatch for
> brick7/digitalcorpora, where the brick process has 49154 but gluster volume
> status shows 49152. There is a stale-port issue which we're trying to
> address through https://review.gluster.org/18541 . But could you specify
> what exactly the problem is? Is it the stale port, or the conflict between
> the volume status output and actual brick health? If it's the latter, I'd
> need further information, like the output of the "gluster get-state"
> command from the same node.
>
>
>>
>> [root@gluster-2 bricks]# glv status digitalcorpora
>>
>>> Status of volume: digitalcorpora
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-2:/export/brick7/digitalcorpo
>>> ra                                          49156     0          Y       125708
>>> Brick gluster1.vsnet.gmu.edu:/export/brick7
>>> /digitalcorpora                             49152     0          Y       12345
>>> Brick gluster0:/export/brick7/digitalcorpor
>>> a                                           49152     0          Y       16098
>>> Self-heal Daemon on localhost               N/A       N/A        Y       126625
>>> Self-heal Daemon on gluster1                N/A       N/A        Y       15405
>>> Self-heal Daemon on gluster0                N/A       N/A        Y       18584
>>>
>>> Task Status of Volume digitalcorpora
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> [root@gluster-2 bricks]# glv heal digitalcorpora info
>>> Brick gluster-2:/export/brick7/digitalcorpora
>>> Status: Transport endpoint is not connected
>>> Number of entries: -
>>>
>>> Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
>>> /.trashcan
>>> /DigitalCorpora/hello2.txt
>>> /DigitalCorpora
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick gluster0:/export/brick7/digitalcorpora
>>> /.trashcan
>>> /DigitalCorpora/hello2.txt
>>> /DigitalCorpora
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> [2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit]
>>> (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25]
>>> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135]
>>> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-:
>>> received signum (15), shutting down
>>> [2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main]
>>> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6
>>> (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id
>>> digitalcorpora.gluster-2.export-brick7-digitalcorpora -p
>>> /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid
>>> -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket
>>> --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log
>>> --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43
>>> --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
>>> [2017-10-24 17:18:59.285279] I [MSGID: 101190]
>>> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit]
>>> 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
>>> [2017-10-24 17:19:04.611815] W [MSGID: 101002]
>>> [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option
>>> 'listen-port' is deprecated, preferred is 'transport.socket.listen-port',
>>> continuing with correction
>>> [2017-10-24 17:19:04.615974] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>>> 'rpc-auth.auth-glusterfs' is not recognized
>>> [2017-10-24 17:19:04.616033] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>>> 'rpc-auth.auth-unix' is not recognized
>>> [2017-10-24 17:19:04.616070] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>>> 'rpc-auth.auth-null' is not recognized
>>> [2017-10-24 17:19:04.616134] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>>> 'auth-path' is not recognized
>>> [2017-10-24 17:19:04.616177] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option
>>> 'ping-timeout' is not recognized
>>> [2017-10-24 17:19:04.616203] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
>>> option 'rpc-auth-allow-insecure' is not recognized
>>> [2017-10-24 17:19:04.616215] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
>>> option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
>>> [2017-10-24 17:19:04.616226] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
>>> option 'auth-path' is not recognized
>>> [2017-10-24 17:19:04.616237] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
>>> option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is
>>> not recognized
>>> [2017-10-24 17:19:04.616248] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora:
>>> option 'auth.login./export/brick7/digitalcorpora.allow' is not
>>> recognized
>>> [2017-10-24 17:19:04.616283] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-quota: option
>>> 'timeout' is not recognized
>>> [2017-10-24 17:19:04.616367] W [MSGID: 101174]
>>> [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-trash: option
>>> 'brick-path' is not recognized
>>> Final graph:
>>> +------------------------------------------------------------------------------+
>>> 1: volume digitalcorpora-posix
>>> 2: type storage/posix
>>> 3: option glusterd-uuid 032c17f5-8cc9-445f-aa45-897b5a066b43
>>> 4: option directory /export/brick7/digitalcorpora
>>> 5: option volume-id 61efe58a-ae5b-4d8b-b9f9-67829867c442
>>> 6: option brick-uid 36
>>> 7: option brick-gid 36
>>> 8: end-volume
>>> 9:
>>> 10: volume digitalcorpora-trash
>>> 11: type features/trash
>>> 12: option trash-dir .trashcan
>>> 13: option brick-path /export/brick7/digitalcorpora
>>> 14: option trash-internal-op off
>>> 15: subvolumes digitalcorpora-posix
>>> 16: end-volume
>>> 17:
>>> 18: volume digitalcorpora-changetimerecorder
>>> 19: type features/changetimerecorder
>>> 20: option db-type sqlite3
>>> 21: option hot-brick off
>>> 22: option db-name digitalcorpora.db
>>> 23: option db-path /export/brick7/digitalcorpora/.glusterfs/
>>> 24: option record-exit off
>>> 25: option ctr_link_consistency off
>>> 26: option ctr_lookupheal_link_timeout 300
>>> 27: option ctr_lookupheal_inode_timeout 300
>>> 28: option record-entry on
>>> 29: option ctr-enabled off
>>> 30: option record-counters off
>>> 31: option ctr-record-metadata-heat off
>>> 32: option sql-db-cachesize 12500
>>> 33: option sql-db-wal-autocheckpoint 25000
>>> 34: subvolumes digitalcorpora-trash
>>> 35: end-volume
>>> 36:
>>> 37: volume digitalcorpora-changelog
>>> 38: type features/changelog
>>> 39: option changelog-brick /export/brick7/digitalcorpora
>>> 40: option changelog-dir /export/brick7/digitalcorpora/.glusterfs/changelogs
>>> 41: option changelog-barrier-timeout 120
>>> 42: subvolumes digitalcorpora-changetimerecorder
>>> 43: end-volume
>>> 44:
>>> 45: volume digitalcorpora-bitrot-stub
>>> 46: type features/bitrot-stub
>>> 47: option export /export/brick7/digitalcorpora
>>> 48: subvolumes digitalcorpora-changelog
>>> 49: end-volume
>>> 50:
>>> 51: volume digitalcorpora-access-control
>>> 52: type features/access-control
>>> 53: subvolumes digitalcorpora-bitrot-stub
>>> 54: end-volume
>>> 55:
>>> 56: volume digitalcorpora-locks
>>> 57: type features/locks
>>> 58: subvolumes digitalcorpora-access-control
>>> 59: end-volume
>>> 60:
>>> 61: volume digitalcorpora-worm
>>> 62: type features/worm
>>> 63: option worm off
>>> 64: option worm-file-level off
>>> 65: subvolumes digitalcorpora-locks
>>> 66: end-volume
>>> 67:
>>> 68: volume digitalcorpora-read-only
>>> 69: type features/read-only
>>> 70: option read-only off
>>> 71: subvolumes digitalcorpora-worm
>>> 72: end-volume
>>> 73:
>>> 74: volume digitalcorpora-leases
>>> 75: type features/leases
>>> 76: option leases off
>>> 77: subvolumes digitalcorpora-read-only
>>> 78: end-volume
>>> 79:
>>> 80: volume digitalcorpora-upcall
>>> 81: type features/upcall
>>> 82: option cache-invalidation off
>>> 83: subvolumes digitalcorpora-leases
>>> 84: end-volume
>>> 85:
>>> 86: volume digitalcorpora-io-threads
>>> 87: type performance/io-threads
>>> 88: subvolumes digitalcorpora-upcall
>>> 89: end-volume
>>> 90:
>>> 91: volume digitalcorpora-marker
>>> 92: type features/marker
>>> 93: option volume-uuid 61efe58a-ae5b-4d8b-b9f9-67829867c442
>>> 94: option timestamp-file /var/lib/glusterd/vols/digitalcorpora/marker.tstamp
>>> 95: option quota-version 0
>>> 96: option xtime off
>>> 97: option gsync-force-xtime off
>>> 98: option quota off
>>> 99: option inode-quota off
>>> 100: subvolumes digitalcorpora-io-threads
>>> 101: end-volume
>>> 102:
>>> 103: volume digitalcorpora-barrier
>>> 104: type features/barrier
>>> 105: option barrier disable
>>> 106: option barrier-timeout 120
>>> 107: subvolumes digitalcorpora-marker
>>> 108: end-volume
>>> 109:
>>> 110: volume digitalcorpora-index
>>> 111: type features/index
>>> 112: option index-base /export/brick7/digitalcorpora/.glusterfs/indices
>>> 113: option xattrop-dirty-watchlist trusted.afr.dirty
>>> 114: option xattrop-pending-watchlist trusted.afr.digitalcorpora-
>>> 115: subvolumes digitalcorpora-barrier
>>> 116: end-volume
>>> 117:
>>> 118: volume digitalcorpora-quota
>>> 119: type features/quota
>>> 120: option volume-uuid digitalcorpora
>>> 121: option server-quota off
>>> 122: option timeout 0
>>> 123: option deem-statfs off
>>> 124: subvolumes digitalcorpora-index
>>> 125: end-volume
>>> 126:
>>> 127: volume digitalcorpora-io-stats
>>> 128: type debug/io-stats
>>> 129: option unique-id /export/brick7/digitalcorpora
>>> 130: option log-level WARNING
>>> 131: option latency-measurement off
>>> 132: option count-fop-hits off
>>> 133: subvolumes digitalcorpora-quota
>>> 134: end-volume
>>> 135:
>>> 136: volume /export/brick7/digitalcorpora
>>> 137: type performance/decompounder
>>> 138: option rpc-auth-allow-insecure on
>>> 139: option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
>>> 140: option auth-path /export/brick7/digitalcorpora
>>> 141: option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
>>> 142: option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
>>> 143: subvolumes digitalcorpora-io-stats
>>> 144: end-volume
>>> 145:
>>> 146: volume digitalcorpora-server
>>> 147: type protocol/server
>>> 148: option transport.socket.listen-port 49154
>>> 149: option rpc-auth.auth-glusterfs on
>>> 150: option rpc-auth.auth-unix on
>>> 151: option rpc-auth.auth-null on
>>> 152: option transport-type tcp
>>> 153: option transport.address-family inet
>>> 154: option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
>>> 155: option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
>>> 156: option auth-path /export/brick7/digitalcorpora
>>> 157: option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
>>> 158: option ping-timeout 42
>>> 159: option transport.socket.keepalive 1
>>> 160: option rpc-auth-allow-insecure on
>>> 161: option transport.tcp-user-timeout 0
>>> 162: option transport.socket.keepalive-time 20
>>> 163: option transport.socket.keepalive-interval 2
>>> 164: option transport.socket.keepalive-count 9
>>> 165: subvolumes /export/brick7/digitalcorpora
>>> 166: end-volume
>>> 167:
>>> +------------------------------------------------------------------------------+
>>> [2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs:
>>> readv on 129.174.126.87:24007 failed (No data available)
>>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>