[Bugs] [Bug 1761365] New: libgfapi: glfs_init() gets stuck in an infinite loop in pthread_spin_lock()

Mon Oct 14 09:19:54 UTC 2019

https://bugzilla.redhat.com/show_bug.cgi?id=1761365

            Bug ID: 1761365
           Summary: libgfapi: glfs_init() gets stuck in an infinite loop in
                    pthread_spin_lock()
           Product: GlusterFS
           Version: 7
            Status: NEW
         Component: libgfapi
          Assignee: bugs at gluster.org
          Reporter: xiubli at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community

Description of problem:

While testing gfapi through gluster-block/tcmu-runner, I hit a problem: when
creating a gluster-block device, the tcmu-runner process gets stuck and runs at
almost 100% CPU.

The gluster-block command is:
[root at localhost tcmu-runner]# gluster-block create repvol/block ha 2 prealloc full 10.70.39.238,10.70.39.231 1G

[root at localhost tcmu-runner]# top
top - 14:14:50 up  1:07,  2 users,  load average: 2.06, 1.89, 1.17
Tasks: 116 total,   2 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s): 50.0 us,  3.1 sy,  0.0 ni, 46.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1990.4 total,    853.9 free,    270.9 used,    865.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1560.8 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
 8020 root      20   0 2412664  35916  21792 R  93.8   1.8  12:39.00 tcmu-runner
    1 root      20   0  108892  15540   9472 S   0.0   0.8   0:02.44 systemd    
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd   
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp     

[root at localhost tcmu-runner]# perf top -p 8020
Samples: 7K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 1838750000 lost: 0/0 drop: 0/0
Overhead  Shared Object       Symbol                                            
  99.95%  libpthread-2.29.so  [.] pthread_spin_lock
   0.01%  [kernel]            [k] __ip_queue_xmit
   0.01%  [kernel]            [k] __softirqentry_text_start
   0.01%  [kernel]            [k] _raw_spin_unlock_irqrestore
   0.01%  [kernel]            [k] run_rebalance_domains

[root at localhost tcmu-runner]# pstack 8020
Thread 17 (Thread 0x7f709a7fc700 (LWP 11351)):
...
Thread 1 (Thread 0x7f7128527880 (LWP 8020)):
#0  0x00007f7128d4f2b5 in pthread_spin_lock () at /lib64/libpthread.so.0
#1  0x00007f7126ba6eba in mem_get () at /lib64/libglusterfs.so.0
#2  0x00007f7126ba6fdd in mem_get0 () at /lib64/libglusterfs.so.0
#3  0x00007f7126b6e004 in get_new_dict_full () at /lib64/libglusterfs.so.0
#4  0x00007f7126b6f9f0 in dict_new () at /lib64/libglusterfs.so.0
#5  0x00007f7126ce9e38 in glfs_init_common () at /lib64/libgfapi.so.0
#6  0x00007f7126cea030 in glfs_init () at /lib64/libgfapi.so.0
#7  0x00007f7126d20332 in tcmu_glfs_unlock () at /usr/lib64/tcmu-runner/handler_glfs.so
[root at localhost tcmu-runner]# 

The main thread (Thread 1) is spinning indefinitely in pthread_spin_lock(),
called from mem_get() in libglusterfs.so.
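
To pick out the spinning thread from a longer pstack dump, the output can be
scanned for threads whose innermost frame is pthread_spin_lock. The following
is only a sketch: the helper names (parse_pstack, threads_in) are made up for
illustration, the regular expressions assume the line layout shown in the
output above, and the elided part of the dump ("...") is kept as-is.

```python
import re

# Thread header, e.g. "Thread 1 (Thread 0x7f7128527880 (LWP 8020)):"
THREAD_RE = re.compile(r"^Thread\s+(\d+)\s+\(Thread\s+0x[0-9a-f]+\s+\(LWP\s+(\d+)\)\):")
# Frame line, e.g. "#0  0x00007f7128d4f2b5 in pthread_spin_lock () at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+in\s+(\S+)\s+\(")


def parse_pstack(text):
    """Return {thread_number: {"lwp": int, "frames": [symbol, ...]}}."""
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = {"lwp": int(m.group(2)), "frames": []}
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            # Frames are appended in the order printed, so index 0 is frame #0.
            threads[current]["frames"].append(m.group(2))
    return threads


def threads_in(threads, symbol):
    """Thread numbers whose innermost frame (#0) is `symbol`."""
    return sorted(n for n, t in threads.items()
                  if t["frames"] and t["frames"][0] == symbol)


# Sample copied from the pstack output above; "..." is the original elision.
SAMPLE = """\
Thread 17 (Thread 0x7f709a7fc700 (LWP 11351)):
...
Thread 1 (Thread 0x7f7128527880 (LWP 8020)):
#0  0x00007f7128d4f2b5 in pthread_spin_lock () at /lib64/libpthread.so.0
#1  0x00007f7126ba6eba in mem_get () at /lib64/libglusterfs.so.0
#2  0x00007f7126ba6fdd in mem_get0 () at /lib64/libglusterfs.so.0
#3  0x00007f7126b6e004 in get_new_dict_full () at /lib64/libglusterfs.so.0
#4  0x00007f7126b6f9f0 in dict_new () at /lib64/libglusterfs.so.0
#5  0x00007f7126ce9e38 in glfs_init_common () at /lib64/libgfapi.so.0
#6  0x00007f7126cea030 in glfs_init () at /lib64/libgfapi.so.0
#7  0x00007f7126d20332 in tcmu_glfs_unlock () at /usr/lib64/tcmu-runner/handler_glfs.so
"""
```

Calling threads_in(parse_pstack(SAMPLE), "pthread_spin_lock") on this sample
reports Thread 1 only, since the frames of Thread 17 are elided. This only
locates the spinning thread; it says nothing about why the lock is never
acquired.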

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install the glusterfs-7.0 packages from
https://download.gluster.org/pub/gluster/glusterfs/qa-releases/7.0rc3/Fedora/fedora-30/x86_64/
and build tcmu-runner and gluster-block from source.

2. Enable and start the glusterd, tcmu-runner, and gluster-blockd services.

3. Create a replicated volume:
# gluster vol create repvol replica 2 10.70.39.238:/data/repvol 10.70.39.231:/data/repvol force

# gluster vol set repvol group gluster-block

# gluster vol start repvol

# gluster volume set repvol locks.mandatory-locking forced

# gluster volume set repvol enforce-mandatory-lock on

# gluster volume set repvol performance.client-io-threads off

4. Then create the gluster-block device:

# gluster-block create repvol/block ha 2 prealloc full 10.70.39.238,10.70.39.231 1G

5. The command in step 4 hangs.
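
Steps 3 and 4 could be scripted roughly as below, so that the hang in step 4
is reported after a timeout instead of blocking the terminal. This is a sketch
under assumptions: the function names and the 120-second default are
hypothetical, the commands are copied from the steps above, and confirmation
prompts from the gluster CLI are not handled. Killing the client command on
timeout does not unstick the tcmu-runner process itself.

```python
import shlex
import subprocess

# Commands copied from steps 3 and 4 above.
SETUP = [
    "gluster vol create repvol replica 2 10.70.39.238:/data/repvol 10.70.39.231:/data/repvol force",
    "gluster vol set repvol group gluster-block",
    "gluster vol start repvol",
    "gluster volume set repvol locks.mandatory-locking forced",
    "gluster volume set repvol enforce-mandatory-lock on",
    "gluster volume set repvol performance.client-io-threads off",
]
CREATE = "gluster-block create repvol/block ha 2 prealloc full 10.70.39.238,10.70.39.231 1G"


def run_with_timeout(argv, timeout_s):
    """Run argv; return ("ok", returncode), or ("hung", None) on timeout."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("ok", done.returncode)


def reproduce(timeout_s=120):
    """Run the volume setup, then the block create with a timeout."""
    for cmd in SETUP:
        # check=True stops at the first setup command that fails.
        subprocess.run(shlex.split(cmd), check=True)
    return run_with_timeout(shlex.split(CREATE), timeout_s)
```

On an affected setup, reproduce() would be expected to return ("hung", None);
top, perf top, and pstack can then be run against the tcmu-runner PID as shown
in the description.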

Actual results:

Expected results:

Additional info:
