[Bugs] [Bug 1245565] Crash in dht_getxattr_cbk
bugzilla at redhat.com
Tue Aug 25 09:58:28 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1245565
Amit Chaurasia <achauras at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ON_QA |VERIFIED
--- Comment #6 from Amit Chaurasia <achauras at redhat.com> ---
Steps to reproduce this issue:
1. Create a distribute volume with 10-12 bricks.
2. Set the client and server epoll event-thread options to a higher value,
say 25 or 30.
3. FUSE-mount the volume.
4. Create around 100 directories.
5. Set a custom user attribute on every directory with setfattr, e.g.
setfattr -n user.random -v yes <dir1>.
6. Run getfattr on the directories in a loop from multiple terminals at
once.
On 3.0.4 (glusterfs 3.6.0.53), the crash was seen within 5-10 minutes:
[root at dht-rhs-19 glusterfs]# while true; do for i in `ls | grep dir`; do
getfattr -d -m . -e hex $i >/dev/null 2>&1; done; done
ls: cannot open directory .: Transport endpoint is not connected
ls: cannot open directory .: Transport endpoint is not connected
ls: cannot open directory .: Transport endpoint is not connected
[same message repeated for every further iteration]
[root at dht-rhs-19 glusterfs]# tail -100 /var/log/glusterfs/mnt-glusterfs-.log
...
+------------------------------------------------------------------------------+
[2015-08-25 14:58:41.906356] I [rpc-clnt.c:1759:rpc_clnt_reconfig]
0-testvol-client-11: changing port to 49157 (from 0)
[2015-08-25 14:58:41.907675] I
[client-handshake.c:1412:select_server_supported_programs] 0-testvol-client-10:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-08-25 14:58:41.907926] I [client-handshake.c:1200:client_setvolume_cbk]
0-testvol-client-10: Connected to testvol-client-10, attached to remote volume
'/bricks/brick5/testvol'.
[2015-08-25 14:58:41.907969] I [client-handshake.c:1210:client_setvolume_cbk]
0-testvol-client-10: Server and Client lk-version numbers are not same,
reopening the fds
[2015-08-25 14:58:41.908230] I
[client-handshake.c:187:client_set_lk_version_cbk] 0-testvol-client-10: Server
lk version = 1
[2015-08-25 14:58:41.912970] I
[client-handshake.c:1412:select_server_supported_programs] 0-testvol-client-11:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-08-25 14:58:41.914090] I [client-handshake.c:1200:client_setvolume_cbk]
0-testvol-client-11: Connected to testvol-client-11, attached to remote volume
'/bricks/brick5/testvol'.
[2015-08-25 14:58:41.914123] I [client-handshake.c:1210:client_setvolume_cbk]
0-testvol-client-11: Server and Client lk-version numbers are not same,
reopening the fds
[2015-08-25 14:58:41.921363] I [fuse-bridge.c:5042:fuse_graph_setup] 0-fuse:
switched to graph 0
[2015-08-25 14:58:41.921527] I
[client-handshake.c:187:client_set_lk_version_cbk] 0-testvol-client-11: Server
lk version = 1
[2015-08-25 14:58:41.922859] I [fuse-bridge.c:3971:fuse_init] 0-glusterfs-fuse:
FUSE inited with protocol versions: glusterfs 7.22 kernel 7.14
[2015-08-25 15:01:02.556929] E [mem-pool.c:242:__gf_free]
(-->/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)
[0x7f154e2855ed] (-->/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)
[0x7f1559f42dae] (-->/usr/lib64/libglusterfs.so.0(data_destroy+0x55)
[0x7f1559f423e5]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t
*)ptr
pending frames:
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-08-25 15:01:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.53
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f1559f47c16]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f1559f62daf]
/lib64/libc.so.6(+0x326a0)[0x7f15593706a0]
/usr/lib64/libglusterfs.so.0(__gf_free+0xf0)[0x7f1559f76810]
/usr/lib64/libglusterfs.so.0(data_destroy+0x55)[0x7f1559f423e5]
/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)[0x7f1559f42dae]
/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)[0x7f154e2855ed]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f1559d1c895]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x142)[0x7f1559d1dd22]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f1559d194f8]
/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x92fd)[0x7f154f2d82fd]
/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0xaded)[0x7f154f2d9ded]
/usr/lib64/libglusterfs.so.0(+0x79470)[0x7f1559fa0470]
/lib64/libpthread.so.0(+0x7a51)[0x7f15596d9a51]
/lib64/libc.so.6(clone+0x6d)[0x7f15594269ad]
[2015-08-25 15:01:02.557141] E [mem-pool.c:242:__gf_free]
(-->/usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so(client3_3_getxattr_cbk+0x1bd)
[0x7f154e2855ed] (-->/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)
[0x7f1559f42dae] (-->/usr/lib64/libglusterfs.so.0(data_destroy+0x55)
[0x7f1559f423e5]))) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t
*)ptr
---------
[root at dht-rhs-19 glusterfs]#
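The "Assertion failed: GF_MEM_HEADER_MAGIC == *(uint32_t *)ptr" lines come from __gf_free validating a magic word stored just in front of each allocation; if the word is gone, the block was corrupted or already freed before this free ran. A minimal sketch of that header-magic technique in C (the magic value and header layout here are made up for illustration, not glusterfs's actual mem-pool code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Magic word stamped into a small header in front of every allocation.
 * GF_MEM_HEADER_MAGIC plays this role in glusterfs's mem-pool; this
 * value and layout are illustrative only. */
#define MEM_HEADER_MAGIC 0xCAFEBABEu

static void *guarded_malloc(size_t size)
{
    uint32_t *hdr = malloc(sizeof(uint32_t) + size);
    if (!hdr)
        return NULL;
    *hdr = MEM_HEADER_MAGIC;  /* stamp the header */
    return hdr + 1;           /* caller sees only the bytes after it */
}

static void guarded_free(void *ptr)
{
    uint32_t *hdr = (uint32_t *)ptr - 1;
    /* A buffer underrun, use-after-free, or double free that let the
     * block be recycled clobbers the magic word, and this check trips
     * exactly like the assertion in the log above. */
    assert(MEM_HEADER_MAGIC == *hdr);
    *hdr = 0;                 /* poison so a second free is caught too */
    free(hdr);
}
```

A healthy allocate/free pair passes the check silently; the assertion only fires when something else has already scribbled on (or freed) the block, which is why it shows up here under concurrent GETXATTR callbacks.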
The setup is as follows:
[root at dht-rhs-19 glusterfs]# gluster v info
Volume Name: testvol
Type: Distribute
Volume ID: 56f88953-cf50-43f8-8dc6-7a5d5ca644a6
Status: Started
Snap Volume: no
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: 10.70.47.98:/bricks/brick0/testvol
Brick2: 10.70.47.99:/bricks/brick0/testvol
Brick3: 10.70.47.98:/bricks/brick1/testvol
Brick4: 10.70.47.99:/bricks/brick1/testvol
Brick5: 10.70.47.98:/bricks/brick2/testvol
Brick6: 10.70.47.99:/bricks/brick2/testvol
Brick7: 10.70.47.98:/bricks/brick3/testvol
Brick8: 10.70.47.99:/bricks/brick3/testvol
Brick9: 10.70.47.98:/bricks/brick4/testvol
Brick10: 10.70.47.99:/bricks/brick4/testvol
Brick11: 10.70.47.98:/bricks/brick5/testvol
Brick12: 10.70.47.99:/bricks/brick5/testvol
Options Reconfigured:
performance.readdir-ahead: on
client.event-threads: 30
server.event-threads: 30
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root at dht-rhs-19 glusterfs]#
=====================================================
[root at dht-rhs-19 glusterfs]# rpm -qa | grep -i gluster
glusterfs-devel-3.6.0.53-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.53-1.el6rhs.x86_64
glusterfs-libs-3.6.0.53-1.el6rhs.x86_64
glusterfs-3.6.0.53-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.53-1.el6rhs.x86_64
glusterfs-server-3.6.0.53-1.el6rhs.x86_64
glusterfs-api-devel-3.6.0.53-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.53-1.el6rhs.x86_64
glusterfs-api-3.6.0.53-1.el6rhs.x86_64
glusterfs-cli-3.6.0.53-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.53-1.el6rhs.x86_64
[root at dht-rhs-19 glusterfs]#
=====================================================
[root at dht-rhs-19 glusterfs]# gluster v status all
Status of volume: testvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.47.98:/bricks/brick0/testvol 49152 0 Y 1974
Brick 10.70.47.99:/bricks/brick0/testvol 49152 0 Y 2069
Brick 10.70.47.98:/bricks/brick1/testvol 49153 0 Y 1971
Brick 10.70.47.99:/bricks/brick1/testvol 49153 0 Y 2081
Brick 10.70.47.98:/bricks/brick2/testvol 49154 0 Y 1984
Brick 10.70.47.99:/bricks/brick2/testvol 49154 0 Y 2089
Brick 10.70.47.98:/bricks/brick3/testvol 49155 0 Y 1990
Brick 10.70.47.99:/bricks/brick3/testvol 49155 0 Y 2099
Brick 10.70.47.98:/bricks/brick4/testvol 49156 0 Y 2002
Brick 10.70.47.99:/bricks/brick4/testvol 49156 0 Y 2106
Brick 10.70.47.98:/bricks/brick5/testvol 49157 0 Y 2003
Brick 10.70.47.99:/bricks/brick5/testvol 49157 0 Y 2107
NFS Server on localhost 2049 0 Y 1964
NFS Server on 10.70.47.99 2049 0 Y 2067
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
[root at dht-rhs-19 glusterfs]#
===========
On 3.1.1 (glusterfs 3.7.1):
[root at dht-rhs-23 glusterfs]# while true
> do
> for i in `ls | grep dir`; do getfattr -d -m . -e hex $i >/dev/null 2>&1 ; done
> done
The above loop ran for more than half an hour without any error.
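A common cause of this class of __gf_free magic-check failure is two event threads releasing the same shared dictionary, so one free recycles the block before the other runs; with 30 client and server event threads the window is wide. The usual defence is an atomic reference count on the shared object, sketched below in C (illustrative only, not glusterfs's actual dict_t code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy refcounted object: whichever thread drops the LAST reference
 * frees it, so concurrent releases can never free it twice. The
 * atomic counter is what keeps this safe across many epoll threads. */
typedef struct {
    atomic_int refcount;
    char payload[32];
} obj_t;

static obj_t *obj_new(void)
{
    obj_t *o = calloc(1, sizeof(*o));
    atomic_init(&o->refcount, 1);  /* creator holds one reference */
    return o;
}

static void obj_ref(obj_t *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Returns 1 if this call released the final reference and freed o. */
static int obj_unref(obj_t *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Each callback takes its own reference before use and drops it when done; exactly one unref observes the count reaching zero and performs the free.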
The setup:
[root at dht-rhs-23 glusterfs]# gluster v info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: d960bc60-17af-4794-839b-9ab6da0f9321
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.47.114:/bricks/brick0/testvol
Brick2: 10.70.47.174:/bricks/brick0/testvol
Brick3: 10.70.47.114:/bricks/brick1/testvol
Brick4: 10.70.47.174:/bricks/brick1/testvol
Brick5: 10.70.47.114:/bricks/brick2/testvol
Brick6: 10.70.47.174:/bricks/brick2/testvol
Brick7: 10.70.47.114:/bricks/brick3/testvol
Brick8: 10.70.47.174:/bricks/brick3/testvol
Brick9: 10.70.47.114:/bricks/brick4/testvol
Brick10: 10.70.47.174:/bricks/brick4/testvol
Brick11: 10.70.47.114:/bricks/brick5/testvol
Brick12: 10.70.47.174:/bricks/brick5/testvol
Brick13: 10.70.47.114:/bricks/brick6/testvol
Brick14: 10.70.47.174:/bricks/brick6/testvol
Options Reconfigured:
server.event-threads: 30
client.event-threads: 30
performance.readdir-ahead: on
[root at dht-rhs-23 glusterfs]#
===
[root at dht-rhs-23 glusterfs]# rpm -qa | grep -i gluster
glusterfs-geo-replication-3.7.1-12.el7rhgs.x86_64
glusterfs-rdma-3.7.1-12.el7rhgs.x86_64
gluster-nagios-addons-0.2.4-4.el7rhgs.x86_64
glusterfs-libs-3.7.1-12.el7rhgs.x86_64
glusterfs-cli-3.7.1-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.1-12.el7rhgs.x86_64
glusterfs-api-3.7.1-12.el7rhgs.x86_64
vdsm-gluster-4.16.20-1.2.el7rhgs.noarch
glusterfs-fuse-3.7.1-12.el7rhgs.x86_64
gluster-nagios-common-0.2.0-2.el7rhgs.noarch
glusterfs-3.7.1-12.el7rhgs.x86_64
glusterfs-server-3.7.1-12.el7rhgs.x86_64
[root at dht-rhs-23 glusterfs]#
Marking the bug verified.