[Bugs] [Bug 1802947] list about 550 files in replicated volume will causes glfs_iotwr thread crash

Wed Feb 19 03:17:26 UTC 2020

https://bugzilla.redhat.com/show_bug.cgi?id=1802947

--- Comment #2 from Liguang Li <liguang_li at 126.com> ---
This issue can be reproduced easily on v6.4 by following your steps.

root@128:/# gluster --version
glusterfs 6.4

root@128:/# gdb /usr/sbin/glusterfsd ./core.638
...
Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.141 --volfile-id
gv0.128.224.95.141.tmp-bric'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00003fff9f5201c4 in _int_free (av=0x3fff88000020, p=0x3fff880092f0,
have_lock=0) at malloc.c:3846
3846    {
[Current thread is 1 (Thread 0x3fff99390440 (LWP 648))]
(gdb) bt
#0  0x00003fff9f5201c4 in _int_free (av=0x3fff88000020, p=0x3fff880092f0,
have_lock=0) at malloc.c:3846
#1  0x00003fff9f5dfc74 in x_inline (xdrs=<optimized out>, len=<optimized out>)
at xdr_sizeof.c:88
#2  0x00003fff9f6bd4e8 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0
#3  0x00003fff9f6bdee4 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
#4  0x00003fff9f5df8d8 in __GI_xdr_reference (xdrs=0x3fff9938e040,
pp=0x3fff880eacf0, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
#5  0x00003fff9f5dfab4 in __GI_xdr_pointer (xdrs=0x3fff9938e040,
objpp=0x3fff880eacf0, obj_size=<optimized out>,
...
#1642 0x00003fff9f79a3d4 in .call_resume () from /usr/lib64/libglusterfs.so.0
#1643 0x00003fff9a07e948 in ?? () from
/usr/lib64/glusterfs/6.4/xlator/performance/io-threads.so
#1644 0x00003fff9f654b30 in start_thread (arg=0x3fff99390440) at
pthread_create.c:462
(gdb) frame 1644
#1644 0x00003fff9f654b30 in start_thread (arg=0x3fff99390440) at
pthread_create.c:462
462           THREAD_SETMEM (pd, result, pd->start_routine (pd->arg));
(gdb) p/x $r1
$1 = 0x3fff9938fa20
(gdb) frame 0
#0  0x00003fff9f5201c4 in _int_free (av=0x3fff88000020, p=0x3fff880092f0,
have_lock=0) at malloc.c:3846
3846    {
(gdb) p/x $r1
$2 = 0x3fff99353080
(gdb) p $1 - $2
$3 = 248224
(gdb) disassemble
Dump of assembler code for function _int_free:
   0x00003fff903f0160 <+0>:     mflr    r0
   0x00003fff903f0164 <+4>:     std     r30,-16(r1)
   0x00003fff903f0168 <+8>:     std     r0,16(r1)
   0x00003fff903f016c <+12>:    mfcr    r12
   0x00003fff903f0170 <+16>:    std     r29,-24(r1)
   0x00003fff903f0174 <+20>:    mr      r29,r3
   0x00003fff903f0178 <+24>:    std     r31,-8(r1)
   0x00003fff903f017c <+28>:    mr      r31,r4
   0x00003fff903f0180 <+32>:    ld      r10,8(r4)
   0x00003fff903f0184 <+36>:    std     r17,-120(r1)
   0x00003fff903f0188 <+40>:    std     r18,-112(r1)
   0x00003fff903f018c <+44>:    rldicr  r30,r10,0,60
   0x00003fff903f0190 <+48>:    std     r19,-104(r1)
   0x00003fff903f0194 <+52>:    neg     r9,r30
   0x00003fff903f0198 <+56>:    std     r20,-96(r1)
   0x00003fff903f019c <+60>:    cmpld   cr7,r4,r9
   0x00003fff903f01a0 <+64>:    std     r21,-88(r1)
   0x00003fff903f01a4 <+68>:    std     r22,-80(r1)
   0x00003fff903f01a8 <+72>:    std     r23,-72(r1)
   0x00003fff903f01ac <+76>:    std     r24,-64(r1)
   0x00003fff903f01b0 <+80>:    std     r25,-56(r1)
   0x00003fff903f01b4 <+84>:    std     r26,-48(r1)
   0x00003fff903f01b8 <+88>:    std     r27,-40(r1)
   0x00003fff903f01bc <+92>:    std     r28,-32(r1)
   0x00003fff903f01c0 <+96>:    stw     r12,8(r1)
=> 0x00003fff903f01c4 <+100>:   stdu    r1,-256(r1)

Please note that we are using a PowerPC machine. Comparing the stack pointer
register (r1) in frames 1644 and 0 shows that 248224 bytes of the thread's
stack are already in use.
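
To double-check that arithmetic, here is a minimal C sketch. The helper names
are hypothetical (they are not glusterfs code), and it assumes a stack that
grows toward lower addresses, as it does here. The two r1 values are the ones
printed by gdb above, and 256K is the stack size mentioned below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of glusterfs: bytes of stack consumed
 * between an outer frame's stack pointer and an inner frame's stack
 * pointer, assuming the stack grows toward lower addresses. */
static inline uint64_t stack_bytes_used(uint64_t sp_outer, uint64_t sp_inner)
{
    assert(sp_outer >= sp_inner);
    return sp_outer - sp_inner;
}

/* Hypothetical helper: space left in a stack of the given nominal size,
 * or 0 if the usage already meets or exceeds it. */
static inline uint64_t stack_headroom(uint64_t stack_size, uint64_t used)
{
    return used < stack_size ? stack_size - used : 0;
}
```

Calling stack_bytes_used(0x3fff9938fa20, 0x3fff99353080) gives 248224, and
stack_headroom(256 * 1024, 248224) gives 13920. That is only the nominal
margin: the measurement starts at the start_thread frame rather than at the
top of the stack mapping, and as far as I know glibc also keeps the thread
descriptor and static TLS in that mapping, so the real margin is smaller.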

From the disassembly, the crash happens at the "stdu r1,-256(r1)" instruction,
which allocates a new 256-byte stack frame, so I suspect a stack overflow.

According to the source code, the stack size of this thread is 256K. Can I fix
this crash by increasing the stack size?
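
For reference, this is roughly what requesting a larger thread stack looks
like with plain POSIX threads. It is a generic sketch, not the io-threads
code: run_with_stack_size and big_frame_worker are hypothetical names, and the
512 KiB buffer is only there to show a frame that would not fit in 256K.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative worker: writes one byte per 4 KiB across a 512 KiB local
 * buffer, so the thread needs more than 256 KiB of stack.
 * Sets *(int *)arg to 1 when the last written byte reads back correctly. */
static void *big_frame_worker(void *arg)
{
    volatile char buf[512 * 1024];
    for (size_t i = 0; i < sizeof(buf); i += 4096)
        buf[i] = 0x5a;
    *(int *)arg = (buf[sizeof(buf) - 4096] == 0x5a);
    return NULL;
}

/* Hypothetical helper: run fn(arg) on a new thread with an explicit stack
 * size and wait for it. Returns 0 on success or a pthread error code. */
static int run_with_stack_size(size_t stack_size, void *(*fn)(void *),
                               void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_attr_setstacksize(&attr, stack_size);
    if (ret == 0)
        ret = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (ret != 0)
        return ret;
    return pthread_join(tid, NULL);
}
```

For example, run_with_stack_size(1024 * 1024, big_frame_worker, &ok) runs the
worker on a 1 MiB stack; that size is an arbitrary choice for illustration.
The backtrace is more than 1600 frames deep, mostly in the XDR routines, so if
that depth grows with the number of directory entries, a larger stack would
only move the limit rather than remove it.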
