[Bugs] [Bug 1796609] New: Random glusterfsd crashes
bugzilla at redhat.com
Thu Jan 30 18:43:46 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1796609
Bug ID: 1796609
Summary: Random glusterfsd crashes
Product: GlusterFS
Version: 7
Hardware: x86_64
OS: Linux
Status: NEW
Component: core
Severity: high
Assignee: bugs at gluster.org
Reporter: gagnon.pierluc at gmail.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Created attachment 1656556
--> https://bugzilla.redhat.com/attachment.cgi?id=1656556&action=edit
Crash log
Description of problem:
The gluster volume becomes inaccessible (Transport endpoint not connected),
which seems to be caused by glusterfsd crashing.
Version-Release number of selected component (if applicable):
7.2
How reproducible:
Happens regularly (about once a day), but I cannot figure out how to trigger it.
Additional info:
apport dump: https://drive.google.com/open?id=1zElM6I6HNE7V_WU_SQH5-emlPpdcRd6e
mnt-{volume}.log attached (my volume is called 'gfs')
My cluster is composed of 3 nodes:
mars: 192.168.4.132
venus: 192.168.5.196
saturn: 192.168.4.146
Each node has 2 bricks, with a replica set to 3 (so 2 x 3).
All bricks are on xfs, except for the 2 bricks on mars, which sit on a single
zfs volume (the crash is not limited to mars).
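For reference, a 2 x 3 distributed-replicate layout like the one described here would typically be created with a command along these lines. This is only a sketch reconstructed from the `gluster volume info` output later in this report, not the command actually used; with `replica 3`, consecutive groups of three bricks form the replica sets.

```shell
# Sketch only: a volume create command matching the reported 2 x 3 layout.
# Bricks are grouped into replica sets of 3 in the order listed.
gluster volume create gfs replica 3 \
    saturn:/gluster/bricks/1/brick \
    venus:/gluster/bricks/2/brick \
    mars:/gluster/bricks/3/brick \
    venus:/gluster/bricks/5/brick \
    saturn:/gluster/bricks/6/brick \
    mars:/gluster/bricks/4/brick
gluster volume start gfs
```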
Extra:
~> sudo gluster peer status
Number of Peers: 2
Hostname: mars
Uuid: 53e473df-d8e9-4d0d-b753-ccfff5c5097c
State: Peer in Cluster (Connected)
Hostname: venus.sarbakaninc.local
Uuid: 4aa987f2-924b-4a2c-b441-ff1b0b1cbb86
State: Peer in Cluster (Connected)
Other names:
venus.sarbakaninc.local
venus
~> sudo gluster volume info
Volume Name: gfs
Type: Distributed-Replicate
Volume ID: 3f451b61-e48b-4be4-92ed-e509271d0284
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: saturn:/gluster/bricks/1/brick
Brick2: venus:/gluster/bricks/2/brick
Brick3: mars:/gluster/bricks/3/brick
Brick4: venus:/gluster/bricks/5/brick
Brick5: saturn:/gluster/bricks/6/brick
Brick6: mars:/gluster/bricks/4/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
server.event-threads: 4
changelog.changelog: on
geo-replication.ignore-pid-check: off
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
auth.allow: 192.168.5.222,192.168.5.196,192.168.4.132,192.168.4.133,192.168.5.195,192.168.4.146,192.168.5.55
performance.cache-size: 1GB
cluster.enable-shared-storage: disable
~> sudo gluster volume status
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick saturn:/gluster/bricks/1/brick 49152 0 Y 21533
Brick venus:/gluster/bricks/2/brick 49152 0 Y 4590
Brick mars:/gluster/bricks/3/brick 49154 0 Y 30419
Brick venus:/gluster/bricks/5/brick 49153 0 Y 4591
Brick saturn:/gluster/bricks/6/brick 49153 0 Y 21534
Brick mars:/gluster/bricks/4/brick 49155 0 Y 30447
Self-heal Daemon on localhost N/A N/A Y 21564
Self-heal Daemon on venus.sarbakaninc.local N/A N/A Y 4610
Self-heal Daemon on mars N/A N/A Y 3640
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks
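Since an apport dump is available, a full backtrace from the crashed glusterfsd process would help pinpoint the faulting translator. Assuming the crash was captured by apport (the report path and core location below are examples, not taken from this system), something like the following can extract it:

```shell
# Sketch only: unpack the apport report and pull a backtrace with gdb.
# Requires apport-unpack and gdb; file paths here are hypothetical.
apport-unpack /var/crash/_usr_sbin_glusterfsd.0.crash /tmp/glusterfsd-crash
gdb -batch -ex 'bt full' -ex 'thread apply all bt' \
    /usr/sbin/glusterfsd /tmp/glusterfsd-crash/CoreDump
```

Installing the glusterfs debug symbols first makes the resulting backtrace considerably more useful.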
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.