[Gluster-users] Gluster nfs crash

Mahdi Adnan mahdi.adnan at earthlinktele.com
Sat Jul 23 08:05:22 UTC 2016


Hi, I'm having issues with Gluster NFS: it keeps crashing after a few hours
under medium load.

OS: CentOS 7.2
Gluster version: 3.7.13
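
For reference, the volume layout and status below come from the standard
Gluster CLI (a minimal sketch; the same commands work on any of the peers):

# Volume layout and the reconfigured options
gluster volume info vlm01
# Per-brick ports, PIDs, and the NFS/self-heal daemon status
gluster volume status vlm01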

Gluster volume info:
Volume Name: vlm01
Type: Distributed-Replicate
Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2
Status: Started
Number of Bricks: 7 x 3 = 21
Transport-type: tcp
Bricks:
Brick1: gfs01:/bricks/b01/vlm01
Brick2: gfs02:/bricks/b01/vlm01
Brick3: gfs03:/bricks/b01/vlm01
Brick4: gfs01:/bricks/b02/vlm01
Brick5: gfs02:/bricks/b02/vlm01
Brick6: gfs03:/bricks/b02/vlm01
Brick7: gfs01:/bricks/b03/vlm01
Brick8: gfs02:/bricks/b03/vlm01
Brick9: gfs03:/bricks/b03/vlm01
Brick10: gfs01:/bricks/b04/vlm01
Brick11: gfs02:/bricks/b04/vlm01
Brick12: gfs03:/bricks/b04/vlm01
Brick13: gfs01:/bricks/b05/vlm01
Brick14: gfs02:/bricks/b05/vlm01
Brick15: gfs03:/bricks/b05/vlm01
Brick16: gfs01:/bricks/b06/vlm01
Brick17: gfs02:/bricks/b06/vlm01
Brick18: gfs03:/bricks/b06/vlm01
Brick19: gfs01:/bricks/b07/vlm01
Brick20: gfs02:/bricks/b07/vlm01
Brick21: gfs03:/bricks/b07/vlm01
Options Reconfigured:
auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56
features.shard: on
features.shard-block-size: 16MB
cluster.self-heal-window-size: 128
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
network.ping-timeout: 10
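
For completeness, the options above were applied with the usual volume-set
command, along these lines (shown for a few of them; the rest follow the same
pattern):

gluster volume set vlm01 features.shard on
gluster volume set vlm01 features.shard-block-size 16MB
gluster volume set vlm01 network.ping-timeout 10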
#####


Gluster status:
Status of volume: vlm01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs01:/bricks/b01/vlm01               49159     0          Y       3050
Brick gfs02:/bricks/b01/vlm01               49158     0          Y       3012
Brick gfs03:/bricks/b01/vlm01               49158     0          Y       3889
Brick gfs01:/bricks/b02/vlm01               49160     0          Y       3058
Brick gfs02:/bricks/b02/vlm01               49159     0          Y       3011
Brick gfs03:/bricks/b02/vlm01               49159     0          Y       3888
Brick gfs01:/bricks/b03/vlm01               49161     0          Y       3067
Brick gfs02:/bricks/b03/vlm01               49160     0          Y       3024
Brick gfs03:/bricks/b03/vlm01               49160     0          Y       3899
Brick gfs01:/bricks/b04/vlm01               49162     0          Y       3057
Brick gfs02:/bricks/b04/vlm01               49161     0          Y       3035
Brick gfs03:/bricks/b04/vlm01               49161     0          Y       3898
Brick gfs01:/bricks/b05/vlm01               49163     0          Y       3075
Brick gfs02:/bricks/b05/vlm01               49162     0          Y       3030
Brick gfs03:/bricks/b05/vlm01               49162     0          Y       3914
Brick gfs01:/bricks/b06/vlm01               49164     0          Y       3091
Brick gfs02:/bricks/b06/vlm01               49163     0          Y       3048
Brick gfs03:/bricks/b06/vlm01               49163     0          Y       3913
Brick gfs01:/bricks/b07/vlm01               49165     0          Y       3080
Brick gfs02:/bricks/b07/vlm01               49164     0          Y       3042
Brick gfs03:/bricks/b07/vlm01               49164     0          Y       3908
NFS Server on localhost                     2049      0          Y       28926
Self-heal Daemon on localhost               N/A       N/A        Y       28934
NFS Server on gfs02                         2049      0          Y       9944
Self-heal Daemon on gfs02                   N/A       N/A        Y       9953
NFS Server on gfs01                         2049      0          Y       46993
Self-heal Daemon on gfs01                   N/A       N/A        Y       47003

Task Status of Volume vlm01
------------------------------------------------------------------------------
There are no active volume tasks
#####


System log on gfs03 at the time of the crash:
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: patchset: git://git.gluster.com/glusterfs.git
Jul 23 09:53:07 gfs03 nfs[31243]: signal received: 11
Jul 23 09:53:07 gfs03 nfs[31243]: time of crash:
Jul 23 09:53:07 gfs03 nfs[31243]: 2016-07-23 06:53:07
Jul 23 09:53:07 gfs03 nfs[31243]: configuration details:
Jul 23 09:53:07 gfs03 nfs[31243]: argp 1
Jul 23 09:53:07 gfs03 nfs[31243]: backtrace 1
Jul 23 09:53:07 gfs03 nfs[31243]: dlfcn 1
Jul 23 09:53:07 gfs03 nfs[31243]: libpthread 1
Jul 23 09:53:07 gfs03 nfs[31243]: llistxattr 1
Jul 23 09:53:07 gfs03 nfs[31243]: setfsid 1
Jul 23 09:53:07 gfs03 nfs[31243]: spinlock 1
Jul 23 09:53:07 gfs03 nfs[31243]: epoll.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: xattr.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: st_atim.tv_nsec 1
Jul 23 09:53:07 gfs03 nfs[31243]: package-string: glusterfs 3.7.13
#####


nfs.log:
[2016-07-23 05:59:19.961654] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-18: Connected
to vlm01-client-18, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.961670] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-18: Server
and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.961717] I [MSGID: 108005]
[afr-common.c:4142:afr_notify] 0-vlm01-replicate-6: Subvolume
'vlm01-client-18' came back up; going online.
[2016-07-23 05:59:19.961854] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-18:
Server lk version = 1
[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig]
0-vlm01-client-20: changing port to 49164 (from 0)
[2016-07-23 05:59:19.964637] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-vlm01-client-19: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.965956] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-19: Connected
to vlm01-client-19, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.965989] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-19: Server
and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.966140] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-19:
Server lk version = 1
[2016-07-23 05:59:19.967605] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-vlm01-client-20: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.967919] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-20: Connected
to vlm01-client-20, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.967943] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-20: Server
and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.968107] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-20:
Server lk version = 1
[2016-07-23 05:59:19.973053] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-17: Connected
to vlm01-client-17, attached to remote volume '/bricks/b06/vlm01'.
[2016-07-23 05:59:19.973081] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-17: Server
and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.973582] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-17:
Server lk version = 1
[2016-07-23 05:59:19.974557] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-0: selecting
local read_child vlm01-client-2
[2016-07-23 05:59:19.976100] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-1: selecting
local read_child vlm01-client-5
[2016-07-23 05:59:19.976161] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-2: selecting
local read_child vlm01-client-8
[2016-07-23 05:59:19.976583] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-3: selecting
local read_child vlm01-client-11
[2016-07-23 05:59:19.976640] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-4: selecting
local read_child vlm01-client-14
[2016-07-23 05:59:19.976676] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-5: selecting
local read_child vlm01-client-17
[2016-07-23 05:59:19.976879] I [MSGID: 108031]
[afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-6: selecting
local read_child vlm01-client-20
[2016-07-23 05:59:36.360646] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~
(hash=vlm01-replicate-0/cache=vlm01-replicate-0) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx
(hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:36.962314] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk~
(hash=vlm01-replicate-6/cache=vlm01-replicate-6) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk
(hash=vlm01-replicate-3/cache=vlm01-replicate-6)
[2016-07-23 05:59:37.019564] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~
(hash=vlm01-replicate-0/cache=vlm01-replicate-0) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx
(hash=vlm01-replicate-2/cache=vlm01-replicate-0)
The message "I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht:
renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~
(hash=vlm01-replicate-0/cache=vlm01-replicate-0) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx
(hash=vlm01-replicate-2/cache=vlm01-replicate-0)" repeated 2 times between
[2016-07-23 05:59:37.019564] and [2016-07-23 05:59:37.421227]
[2016-07-23 05:59:38.757822] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~
(hash=vlm01-replicate-0/cache=vlm01-replicate-0) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx
(hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:39.950960] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk~
(hash=vlm01-replicate-5/cache=vlm01-replicate-5) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk
(hash=vlm01-replicate-5/cache=vlm01-replicate-5)
[2016-07-23 06:00:03.048266] I [MSGID: 109066]
[dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk~
(hash=vlm01-replicate-2/cache=vlm01-replicate-2) =>
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk
(hash=vlm01-replicate-5/cache=vlm01-replicate-2)
[2016-07-23 06:00:07.994953] W [MSGID: 112199]
[nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 4/PRTG 4.vmx => (XID:
8439cb9c, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file
handle)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000
[2016-07-23 06:01:02.831132] E [MSGID: 112069]
[nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (
192.168.208.85:676) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:16:48.221237] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-12: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File
exists]
[2016-07-23 06:16:48.221231] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-13: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File
exists]
[2016-07-23 06:16:48.221382] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-14: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File
exists]
[2016-07-23 06:16:48.221878] W [MSGID: 112199]
[nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 => (XID:
8441a50a, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH:
exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000
[2016-07-23 06:17:11.343148] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94
[File exists]
[2016-07-23 06:17:11.343170] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94
[File exists]
[2016-07-23 06:17:11.343234] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94
[File exists]
[2016-07-23 06:17:11.343596] W [MSGID: 112199]
[nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94
=> (XID: 51e43a2f, CREATE: NFS: 17(File exists), POSIX: 17(File exists)),
FH: exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-07-23 06:17:21.393996] E [MSGID: 112069]
[nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (
192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:50:11.441462] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0
[File exists]
[2016-07-23 06:50:11.441471] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0
[File exists]
[2016-07-23 06:50:11.441530] W [MSGID: 114031]
[client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote
operation failed. Path:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0
[File exists]
[2016-07-23 06:50:11.441959] W [MSGID: 112199]
[nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3:
<gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0
=> (XID: 51ea9a6e, CREATE: NFS: 17(File exists), POSIX: 17(File exists)),
FH: exportid 00000000-0000-0000-0000-000000000000, gfid
00000000-0000-0000-0000-000000000000, mountid
00000000-0000-0000-0000-000000000000
[2016-07-23 06:50:21.712570] E [MSGID: 112069]
[nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (
192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-07-23 06:53:07
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.13
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f74cbde32f2]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f74cbe08aad]
/lib64/libc.so.6(+0x35670)[0x7f74ca4cf670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f74cac50210]
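
If a full backtrace would help, I can try to pull one from a core file with
gdb, something like the below (the core path is just a placeholder; it ends up
wherever core_pattern points on these CentOS 7 nodes):

# /usr/sbin/glusterfs is the binary behind the gNFS server process (nfs[31243])
gdb /usr/sbin/glusterfs /path/to/core.31243
# then inside gdb:
# (gdb) thread apply all bt full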


I appreciate your help, guys.

Respectfully,
Mahdi A. Mahdi