[Bugs] [Bug 1245565] Crash in dht_getxattr_cbk

bugzilla@redhat.com
Tue Aug 25 09:58:28 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1245565

Amit Chaurasia <achauras@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ON_QA                       |VERIFIED



--- Comment #6 from Amit Chaurasia <achauras@redhat.com> ---
Steps to reproduce this issue:

1. Create a plain distribute volume with 10-12 bricks.
2. Set the client and server event-thread (epoll) options to a higher value,
say 25 or 30.
3. FUSE-mount the volume.
4. Create around 100 directories.
5. Set a custom user attribute on all of them using setfattr, e.g.
setfattr -n user.random -v yes <dir1>.
6. Run getfattr in a loop from multiple terminals at once (a consolidated
script sketch follows).

On 3.0.4 (glusterfs 3.6.0.53, per the package list below), the crash was seen
within 5-10 minutes.

[root@dht-rhs-19 glusterfs]# while true; do for i in `ls | grep dir`; do
getfattr -d -m . -e hex $i >/dev/null 2>&1; done; done
ls: cannot open directory .: Transport endpoint is not connected
ls: cannot open directory .: Transport endpoint is not connected
ls: cannot open directory .: Transport endpoint is not connected
(the same message repeats for the remaining iterations)


[root@dht-rhs-19 glusterfs]# tail -100 /var/log/glusterfs/mnt-glusterfs-.log
.
.
.
+------------------------------------------------------------------------------+
[2015-08-25 14:58:41.906356] I [rpc-clnt.c:1759:rpc_clnt_reconfig]
0-testvol-client-11: changing port to 49157 (from 0)
[2015-08-25 14:58:41.907675] I
[client-handshake.c:1412:select_server_supported_programs] 0-testvol-client-10:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-08-25 14:58:41.907926] I [client-handshake.c:1200:client_setvolume_cbk]
0-testvol-client-10: Connected to testvol-client-10, attached to remote volume
'/bricks/brick5/testvol'.
[2015-08-25 14:58:41.907969] I [client-handshake.c:1210:client_setvolume_cbk]
0-testvol-client-10: Server and Client lk-version numbers are not same,
reopening the fds
[2015-08-25 14:58:41.908230] I
[client-handshake.c:187:client_set_lk_version_cbk] 0-testvol-client-10: Server
lk version = 1
[2015-08-25 14:58:41.912970] I
[client-handshake.c:1412:select_server_supported_programs] 0-testvol-client-11:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-08-25 14:58:41.914090] I [client-handshake.c:1200:client_setvolume_cbk]
0-testvol-client-11: Connected to testvol-client-11, attached to remote volume
'/bricks/brick5/testvol'.
[2015-08-25 14:58:41.914123] I [client-handshake.c:1210:client_setvolume_cbk]
0-testvol-client-11: Server and Client lk-version numbers are not same,
reopening the fds
[2015-08-25 14:58:41.921363] I [fuse-bridge.c:5042:fuse_graph_setup] 0-fuse:
switched to graph 0
[2015-08-25 14:58:41.921527] I
[client-handshake.c:187:client_set_lk_version_cbk] 0-testvol-client-11: Server
lk version = 1
[2015-08-25 14:58:41.922859] I [fuse-bridge.c:3971:fuse_init] 0-glusterfs-fuse:
FUSE inited with protocol versions: glusterfs 7.22 kernel 7.14
[2015-08-25 15:01:02.556929] E [mem-pool.c:242:__gf_free]
(-->/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)
[0x7f154e2855ed] (-->/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)
[0x7f1559f42dae] (-->/usr/lib64/libglusterfs.so.0(data_destroy+0x55)
[0x7f1559f423e5]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t
*)ptr
pending frames:
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-08-25 15:01:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.53
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f1559f47c16]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f1559f62daf]
/lib64/libc.so.6(+0x326a0)[0x7f15593706a0]
/usr/lib64/libglusterfs.so.0(__gf_free+0xf0)[0x7f1559f76810]
/usr/lib64/libglusterfs.so.0(data_destroy+0x55)[0x7f1559f423e5]
/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)[0x7f1559f42dae]
/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)[0x7f154e2855ed]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f1559d1c895]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x142)[0x7f1559d1dd22]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f1559d194f8]
/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x92fd)[0x7f154f2d82fd]
/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0xaded)[0x7f154f2d9ded]
/usr/lib64/libglusterfs.so.0(+0x79470)[0x7f1559fa0470]
/lib64/libpthread.so.0(+0x7a51)[0x7f15596d9a51]
/lib64/libc.so.6(clone+0x6d)[0x7f15594269ad]
[2015-08-25 15:01:02.557141] E [mem-pool.c:242:__gf_free]
(-->/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)
[0x7f154e2855ed] (-->/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)
[0x7f1559f42dae] (-->/usr/lib64/libglusterfs.so.0(data_destroy+0x55)
[0x7f1559f423e5]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t
*)ptr
---------
[root@dht-rhs-19 glusterfs]#
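
With glusterfs-debuginfo installed (it is, per the rpm list below), the frames
above can be re-checked with full symbols from the core dump. A sketch; the
core file name here is hypothetical and depends on the system's
kernel.core_pattern:

# /core.12345 is a placeholder for the actual core file path.
gdb -batch -ex bt /usr/sbin/glusterfs /core.12345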


The setup is as follows:
[root@dht-rhs-19 glusterfs]# gluster v info

Volume Name: testvol
Type: Distribute
Volume ID: 56f88953-cf50-43f8-8dc6-7a5d5ca644a6
Status: Started
Snap Volume: no
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: 10.70.47.98:/bricks/brick0/testvol
Brick2: 10.70.47.99:/bricks/brick0/testvol
Brick3: 10.70.47.98:/bricks/brick1/testvol
Brick4: 10.70.47.99:/bricks/brick1/testvol
Brick5: 10.70.47.98:/bricks/brick2/testvol
Brick6: 10.70.47.99:/bricks/brick2/testvol
Brick7: 10.70.47.98:/bricks/brick3/testvol
Brick8: 10.70.47.99:/bricks/brick3/testvol
Brick9: 10.70.47.98:/bricks/brick4/testvol
Brick10: 10.70.47.99:/bricks/brick4/testvol
Brick11: 10.70.47.98:/bricks/brick5/testvol
Brick12: 10.70.47.99:/bricks/brick5/testvol
Options Reconfigured:
performance.readdir-ahead: on
client.event-threads: 30
server.event-threads: 30
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dht-rhs-19 glusterfs]#

=====================================================

[root@dht-rhs-19 glusterfs]# rpm -qa | grep -i gluster
glusterfs-devel-3.6.0.53-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.53-1.el6rhs.x86_64
glusterfs-libs-3.6.0.53-1.el6rhs.x86_64
glusterfs-3.6.0.53-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.53-1.el6rhs.x86_64
glusterfs-server-3.6.0.53-1.el6rhs.x86_64
glusterfs-api-devel-3.6.0.53-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.53-1.el6rhs.x86_64
glusterfs-api-3.6.0.53-1.el6rhs.x86_64
glusterfs-cli-3.6.0.53-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.53-1.el6rhs.x86_64
[root@dht-rhs-19 glusterfs]#


=====================================================
[root@dht-rhs-19 glusterfs]# gluster v status all
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.98:/bricks/brick0/testvol    49152     0          Y       1974 
Brick 10.70.47.99:/bricks/brick0/testvol    49152     0          Y       2069 
Brick 10.70.47.98:/bricks/brick1/testvol    49153     0          Y       1971 
Brick 10.70.47.99:/bricks/brick1/testvol    49153     0          Y       2081 
Brick 10.70.47.98:/bricks/brick2/testvol    49154     0          Y       1984 
Brick 10.70.47.99:/bricks/brick2/testvol    49154     0          Y       2089 
Brick 10.70.47.98:/bricks/brick3/testvol    49155     0          Y       1990 
Brick 10.70.47.99:/bricks/brick3/testvol    49155     0          Y       2099 
Brick 10.70.47.98:/bricks/brick4/testvol    49156     0          Y       2002 
Brick 10.70.47.99:/bricks/brick4/testvol    49156     0          Y       2106 
Brick 10.70.47.98:/bricks/brick5/testvol    49157     0          Y       2003 
Brick 10.70.47.99:/bricks/brick5/testvol    49157     0          Y       2107 
NFS Server on localhost                     2049      0          Y       1964 
NFS Server on 10.70.47.99                   2049      0          Y       2067 

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dht-rhs-19 glusterfs]#


===========

On 3.1.1 (glusterfs 3.7.1-12, per the package list below):

[root@dht-rhs-23 glusterfs]# while true
> do
> for i in `ls | grep dir`; do getfattr -d -m . -e hex $i >/dev/null 2>&1 ; done
> done



The above script ran for more than half an hour without any error.
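
To make the soak self-terminating instead of stopping it by hand, the same
loop can be wrapped in coreutils timeout; a sketch, bounded to the half hour
used here:

timeout 30m bash -c 'while true; do for i in dir*; do getfattr -d -m . -e hex "$i" >/dev/null 2>&1; done; done'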

The setup:

[root@dht-rhs-23 glusterfs]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: d960bc60-17af-4794-839b-9ab6da0f9321
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.47.114:/bricks/brick0/testvol
Brick2: 10.70.47.174:/bricks/brick0/testvol
Brick3: 10.70.47.114:/bricks/brick1/testvol
Brick4: 10.70.47.174:/bricks/brick1/testvol
Brick5: 10.70.47.114:/bricks/brick2/testvol
Brick6: 10.70.47.174:/bricks/brick2/testvol
Brick7: 10.70.47.114:/bricks/brick3/testvol
Brick8: 10.70.47.174:/bricks/brick3/testvol
Brick9: 10.70.47.114:/bricks/brick4/testvol
Brick10: 10.70.47.174:/bricks/brick4/testvol
Brick11: 10.70.47.114:/bricks/brick5/testvol
Brick12: 10.70.47.174:/bricks/brick5/testvol
Brick13: 10.70.47.114:/bricks/brick6/testvol
Brick14: 10.70.47.174:/bricks/brick6/testvol
Options Reconfigured:
server.event-threads: 30
client.event-threads: 30
performance.readdir-ahead: on
[root@dht-rhs-23 glusterfs]#

===

[root@dht-rhs-23 glusterfs]# rpm -qa | grep -i gluster
glusterfs-geo-replication-3.7.1-12.el7rhgs.x86_64
glusterfs-rdma-3.7.1-12.el7rhgs.x86_64
gluster-nagios-addons-0.2.4-4.el7rhgs.x86_64
glusterfs-libs-3.7.1-12.el7rhgs.x86_64
glusterfs-cli-3.7.1-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.1-12.el7rhgs.x86_64
glusterfs-api-3.7.1-12.el7rhgs.x86_64
vdsm-gluster-4.16.20-1.2.el7rhgs.noarch
glusterfs-fuse-3.7.1-12.el7rhgs.x86_64
gluster-nagios-common-0.2.0-2.el7rhgs.noarch
glusterfs-3.7.1-12.el7rhgs.x86_64
glusterfs-server-3.7.1-12.el7rhgs.x86_64
[root@dht-rhs-23 glusterfs]#



Marking the bug verified.
