[Bugs] [Bug 1769216] New: glusterfsd fail to get online after reboot two storage node at the same time
bugzilla at redhat.com
Wed Nov 6 07:59:31 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1769216
Bug ID: 1769216
Summary: glusterfsd fail to get online after reboot two storage
node at the same time
Product: GlusterFS
Version: 7
Hardware: x86_64
OS: Linux
Status: NEW
Component: glusterd
Severity: urgent
Assignee: bugs at gluster.org
Reporter: zz.sh.cynthia at gmail.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Created attachment 1633224
--> https://bugzilla.redhat.com/attachment.cgi?id=1633224&action=edit
glusterfsd process log
Description of problem:
During my recent testing on GlusterFS 7, I still found that when the storage
nodes are rebooted, the volume status is often wrong after glusterd and
glusterfsd come back up. The glusterd and glusterfsd processes are both alive,
but the "gluster v status" command shows the glusterfsd (brick) process as N/A.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Reboot all storage nodes at the same time.
2. Wait for all nodes to come back up.
3. Execute "gluster v status all" (a small check script is sketched below).
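For reference, a minimal check loop along these lines can be used for step 3
(just a sketch; the volume name "ccs" and the ~5 minute timeout are my own
assumptions, adjust as needed):

#!/bin/bash
# Poll "gluster v status" after the reboot and flag any brick whose Online
# column (second-to-last field of a "Brick" line) is still "N".
for i in $(seq 1 30); do
    offline=$(gluster volume status ccs | awk '/^Brick / && $(NF-1) == "N"')
    if [ -z "$offline" ]; then
        echo "all bricks online"
        exit 0
    fi
    sleep 10
done
echo "bricks still offline after reboot:"
echo "$offline"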
Actual results:
The glusterfsd (brick) processes of some volumes fail to come online.
Expected results:
All glusterfsd processes come online.
Additional info:
# gluster v status ccs
Status of volume: ccs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick mn-0.local:/mnt/bricks/ccs/brick N/A N/A N N/A
Brick mn-1.local:/mnt/bricks/ccs/brick 53952 0 Y 2065
Brick dbm-0.local:/mnt/bricks/ccs/brick N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 4940
Self-heal Daemon on dbm-0.local N/A N/A N N/A
Self-heal Daemon on mn-1.local N/A N/A Y 2537
Task Status of Volume ccs
------------------------------------------------------------------------------
There are no active volume tasks
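Note that the brick process itself is running even though the status shows it
as N/A. A quick way to cross-check glusterd's view against the actual process
(paths are taken from the ps output below, so they are specific to this setup):

# pidfile written by glusterd for the mn-0 brick (the -p argument in the ps output)
pidfile=/var/run/gluster/vols/ccs/mn-0.local-mnt-bricks-ccs-brick.pid
ps -p "$(cat $pidfile)" -o pid,args            # the brick process is alive...
gluster volume status ccs | grep 'mn-0.local'  # ...yet its Online column is still N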
# ps -ef | grep glusterfsd| grep ccs
root 1764 1 0 09:10 ? 00:00:07 /usr/sbin/glusterfsd -s
mn-0.local --volfile-id ccs.mn-0.local.mnt-bricks-ccs-brick -p
/var/run/gluster/vols/ccs/mn-0.local-mnt-bricks-ccs-brick.pid -S
/var/run/gluster/7ea87ceb0a781684.socket --brick-name /mnt/bricks/ccs/brick -l
/var/log/glusterfs/bricks/mnt-bricks-ccs-brick.log --log-level TRACE
--xlator-option *-posix.glusterd-uuid=ebaded6d-91d5-4873-a60a-59bbcc813714
--process-name brick --brick-port 53952 --xlator-option
ccs-server.listen-port=53952 --xlator-option
transport.socket.bind-address=mn-0.local
[root at mn-0:/var/log/storageinfo/symptom_log]
[root at mn-0:/var/log/storageinfo/symptom_log]
# netstat -anlp| grep 1764
tcp 0 0 192.168.1.6:53952 0.0.0.0:* LISTEN
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.11:49058 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.6:49069 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.33:49139 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.12:49136 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.16:49139 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.23:49145 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.5:49052 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.8:49113 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.7:49104 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.6:49056 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.6:49082 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.29:49144 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.5:49045 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:53952 192.168.1.11:49100 ESTABLISHED
1764/glusterfsd
tcp 0 0 192.168.1.6:49149 192.168.1.6:24007 ESTABLISHED
1764/glusterfsd
unix 2 [ ACC ] STREAM LISTENING 25405 1764/glusterfsd
/var/run/gluster/7ea87ceb0a781684.socket
unix 2 [ ACC ] STREAM LISTENING 40159 1764/glusterfsd
/var/run/gluster/changelog-25ddbf533d927939.sock
unix 3 [ ] STREAM CONNECTED 41282 1764/glusterfsd
/var/run/gluster/7ea87ceb0a781684.socket
unix 2 [ ] DGRAM 26910 1764/glusterfsd
[root at mn-0:/var/log/storageinfo/symptom_log]
[root at mn-0:/var/log/storageinfo/symptom_log]
# gluster v info ccs
Volume Name: ccs
Type: Replicate
Volume ID: 521261bc-2cba-4e7b-a21a-8486712d7a31
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: mn-0.local:/mnt/bricks/ccs/brick
Brick2: mn-1.local:/mnt/bricks/ccs/brick
Brick3: dbm-0.local:/mnt/bricks/ccs/brick
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.self-heal-daemon: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.server-quorum-type: none
cluster.quorum-type: auto
cluster.quorum-reads: true
cluster.consistent-metadata: on
server.allow-insecure: on
network.ping-timeout: 42
cluster.favorite-child-policy: mtime
cluster.heal-timeout: 60
performance.client-io-threads: off
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.entry-self-heal: on
cluster.server-quorum-ratio: 51%
[Some analysis based on the enclosed log]
From glusterd.log:
[2019-11-06 07:10:42.708849] D [MSGID: 0]
[glusterd-utils.c:6625:glusterd_restart_bricks] 0-management: starting the
volume ccs --------- glusterd starts the glusterfsd process here
…
[2019-11-06 07:10:43.710937] T [socket.c:226:socket_dump_info] 0-management:
$$$ client: connecting to (af:1,sock:12)
/var/run/gluster/7ea87ceb0a781684.socket non-SSL (errno:0:Success) -- does
this mean the connection to glusterfsd was successful?
From glusterfsd.log:
[2019-11-06 07:10:42.779208] T [socket.c:226:socket_dump_info]
0-socket.glusterfsd: $$$ client: listening on (af:1,sock:7)
/var/run/gluster/7ea87ceb0a781684.socket non-SSL (errno:0:Success) ------ I
think this means the glusterfsd unix domain socket is ready to accept connections.
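To check this by hand, one can try connecting to the brick's unix domain socket
directly (a sketch; it assumes socat is available, and the socket path is the
one from the netstat output above):

# A successful connect only proves the socket accepts connections; it says
# nothing about whether the RPC handshake with glusterd completes.
socat - UNIX-CONNECT:/var/run/gluster/7ea87ceb0a781684.socket < /dev/null \
    && echo "brick socket accepted the connection" \
    || echo "connect to brick socket failed"

A possible workaround (not an explanation) would be to kill the affected brick
process and run "gluster volume start ccs force" to respawn it, but glusterd
should not lose track of a running brick in the first place.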
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.