[Gluster-users] Bricks crashing in 3.7.1

Alessandro De Salvo Alessandro.DeSalvo at roma1.infn.it
Wed Jun 10 08:07:50 UTC 2015


Hi,
yesterday I got a strange crash on almost all bricks, the same type of crash repeated on each:

[2015-06-09 18:23:56.407520] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.407580] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-10.roma1.infn.it-7546-2015/06/09-18:23:55:618600-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.407707] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.407772] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-09.roma1.infn.it-25429-2015/06/09-18:18:57:328935-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.415905] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.415947] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-10.roma1.infn.it-7530-2015/06/09-18:23:54:608880-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.433956] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-data-01-posix: could not read the link from the gfid handle /bricks/atlas/data01/data/.glusterfs/74/4b/744b7cf0-258f-4dea-b4d9-7001bb21ca56 (No such file or directory)
[2015-06-09 18:23:56.433954] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-data-01-posix: could not read the link from the gfid handle /bricks/atlas/data01/data/.glusterfs/74/4b/744b7cf0-258f-4dea-b4d9-7001bb21ca56 (No such file or directory)
pending frames:
frame : type(0) op(11)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-06-09 18:23:56
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f0f6446ed92]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f0f644899ed]
/lib64/libc.so.6(+0x35650)[0x7f0f62e60650]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(upcall_cache_invalidate+0xb5)[0x7f0f5537cab5]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(up_readdir_cbk+0x1a2)[0x7f0f55376292]
/usr/lib64/glusterfs/3.7.1/xlator/features/locks.so(pl_readdirp_cbk+0x164)[0x7f0f5558dc94]
/usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so(posix_acl_readdirp_cbk+0x299)[0x7f0f557a6829]
/usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so(br_stub_readdirp_cbk+0x181)[0x7f0f559b5fb1]
/usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so(posix_readdirp+0x143)[0x7f0f56f0cfc3]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so(br_stub_readdirp+0x246)[0x7f0f559b0d46]
/usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so(posix_acl_readdirp+0x18d)[0x7f0f557a45cd]
/usr/lib64/glusterfs/3.7.1/xlator/features/locks.so(pl_readdirp+0x14e)[0x7f0f5558c7ee]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(up_readdirp+0x17a)[0x7f0f5537abfa]
/lib64/libglusterfs.so.0(default_readdirp_resume+0x134)[0x7f0f644809e4]
/lib64/libglusterfs.so.0(call_resume+0x7d)[0x7f0f64498c7d]
/usr/lib64/glusterfs/3.7.1/xlator/performance/io-threads.so(iot_worker+0x123)[0x7f0f5516b353]
/lib64/libpthread.so.0(+0x7df5)[0x7f0f635dadf5]
/lib64/libc.so.6(clone+0x6d)[0x7f0f62f211ad]
---------


I’m not sure whether the missing file is the culprit, but if it is, how can I fix it? For the moment I’ve recreated the bricks from a backup, so I’m fine, but it would be nice to know what to do in case it happens again. I still have the contents of the old crashed brick.
The crash happened in the same way every time I restarted glusterd.
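For what it’s worth, the handle path in the error message follows the usual brick layout, <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>. A quick way to check whether a given gfid’s handle symlink is intact might look like the sketch below (the brick path and gfid are just the ones from my log; adjust for your setup):

```shell
#!/bin/sh
# Map a gfid to its handle path under the brick's .glusterfs directory
# and report whether the symlink there resolves. For directories the
# handle is a symlink, which is what posix_make_ancestryfromgfid reads.
gfid_handle() {
    brick=$1
    gfid=$2
    # The first two and next two hex characters of the gfid name the
    # two subdirectory levels under .glusterfs.
    printf '%s/.glusterfs/%s/%s/%s\n' \
        "$brick" \
        "$(printf '%s' "$gfid" | cut -c1-2)" \
        "$(printf '%s' "$gfid" | cut -c3-4)" \
        "$gfid"
}

handle=$(gfid_handle /bricks/atlas/data01/data \
                     744b7cf0-258f-4dea-b4d9-7001bb21ca56)
if [ -L "$handle" ] && readlink "$handle" >/dev/null 2>&1; then
    echo "handle OK: $handle -> $(readlink "$handle")"
else
    echo "handle missing or dangling: $handle"
fi
```

On a healthy brick the symlink should point back into the parent directory’s handle, so a missing or dangling link here matches the "could not read the link from the gfid handle" error above.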
I’m using gluster 3.7.1 on CentOS 7.1, with the following kind of configuration:

# gluster volume info atlas-data-01
 
Volume Name: atlas-data-01
Type: Replicate
Volume ID: 854620a1-3e88-4e76-91ce-486996bf6a12
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/bricks/atlas/data01/data
Brick2: node2:/bricks/atlas/data01/data
Brick3: node3:/bricks/atlas/data02/data
Options Reconfigured:
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: true
server.allow-insecure: on
ganesha.enable: off
nfs-ganesha: disable


I was playing with ganesha and tried to enable it on the volumes (but failed, as you can see from my other messages). I’m not sure whether that is related, but all the crashed bricks belonged to the volumes where I had tried to enable ganesha.
Thanks,

	Alessandro