[Bugs] [Bug 1626085] New: "glusterfs --process-name fuse" crashes and leads to "Transport endpoint is not connected"

Thu Sep 6 14:38:38 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1626085

            Bug ID: 1626085
           Summary: "glusterfs --process-name fuse" crashes and leads to
                    "Transport endpoint is not connected"
           Product: GlusterFS
           Version: 4.1
         Component: fuse
          Assignee: bugs at gluster.org
          Reporter: omar.kohl at iternity.com
                CC: bugs at gluster.org

Gluster version: 4.1.2
OS: SUSE Linux Enterprise 15
Number of nodes: 4 (node1, node2, node3, node4)
Replica of volume: 4
Name of volume: testvol
Affected node: node1

Description of problem:
The "glusterfs fuse" process on node1 died after executing "df -B1
/gluster/volumes/testvol" on node1, where /gluster/volumes/testvol is the
mountpoint of the volume.

The exact command is probably not relevant because it is part of an automated
test that is executed very frequently (usually with success).

Trying to access the mount leads to the following error:

$ ls /gluster/volumes/testvol
ls: cannot access '/gluster/volumes/testvol': Transport endpoint is not connected
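
A check along these lines could let an automated test detect this state. It is only a minimal sketch and not part of the original test: check_mount is a hypothetical helper name, and it assumes that stat on the dead mountpoint fails with the same "Transport endpoint is not connected" message that ls printed above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): classify a mountpoint.
# Prints "ok", "disconnected" (the ENOTCONN symptom above) or "error".
check_mount() {
    mnt="$1"
    # LC_ALL=C keeps the error text in English so it can be matched.
    # Only stderr is captured; the normal stat output is discarded.
    if err=$(LC_ALL=C stat "$mnt" 2>&1 >/dev/null); then
        echo "ok"
    else
        case "$err" in
            *"Transport endpoint is not connected"*) echo "disconnected" ;;
            *) echo "error" ;;
        esac
    fi
}
```

For example, "check_mount /gluster/volumes/testvol" would be expected to print "disconnected" on node1 in the state described here, assuming stat fails the same way ls did.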

/etc/fstab:

localhost:/testvol /gluster/volumes/testvol glusterfs defaults,_netdev,noauto,x-systemd.automount,aux-gfid-mount 0 0

The "glusterfs" process for this volume on node1 is missing. On all other nodes
of the cluster it is present:

/usr/sbin/glusterfs --aux-gfid-mount --process-name fuse --volfile-server=localhost --volfile-id=/testvol /gluster/volumes/testvol

The log file gluster-volumes-testvol.log contains the following as the last
entry. The time stamp matches the time the test failed:

============ START of log ==================

pending frames:
frame : type(1) op(STATFS)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2018-09-06 03:46:50
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.2
/usr/lib64/libglusterfs.so.0(+0x26ecc)[0x7fce55895ecc]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fce5589fef6]
/lib64/libc.so.6(+0x36160)[0x7fce54c90160]
/lib64/libc.so.6(gsignal+0x110)[0x7fce54c900e0]
/lib64/libc.so.6(abort+0x151)[0x7fce54c916c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fce54c886fa]
/lib64/libc.so.6(+0x2e772)[0x7fce54c88772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fce5501e0b8]
/usr/lib64/glusterfs/4.1.2/xlator/cluster/replicate.so(+0x642c5)[0x7fce4f1b32c5]
/usr/lib64/glusterfs/4.1.2/xlator/protocol/client.so(+0x6657f)[0x7fce4f44d57f]
/usr/lib64/libgfrpc.so.0(+0xe840)[0x7fce55661840]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fce55661b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fce5565dff3]
/usr/lib64/glusterfs/4.1.2/rpc-transport/socket.so(+0x73ca)[0x7fce509253ca]
/usr/lib64/glusterfs/4.1.2/rpc-transport/socket.so(+0x9d02)[0x7fce50927d02]
/usr/lib64/libglusterfs.so.0(+0x845c7)[0x7fce558f35c7]
/lib64/libpthread.so.0(+0x7559)[0x7fce5501b559]
/lib64/libc.so.6(clone+0x3f)[0x7fce54d5282f]
---------

============== End of log ==============
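
Several frames in the backtrace show only a module path and an offset, with no symbol name. With debug symbols for exactly this build installed, such frames can usually be resolved with addr2line from binutils. The sketch below only builds the addr2line command line from one frame; frame_to_addr2line is a hypothetical helper name, and whether the offsets actually resolve depends on the debuginfo being available.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): turn a backtrace frame
# of the form "/path/to/module.so(+0xOFFSET)[0xADDRESS]" into an addr2line
# command. Frames that already carry a symbol name produce no output.
frame_to_addr2line() {
    printf '%s\n' "$1" |
        sed -n 's/^\(\/[^(]*\)(+\(0x[0-9a-f]*\))\[.*$/addr2line -f -C -e \1 \2/p'
}

# Usage with the replicate.so frame from the log above:
frame_to_addr2line '/usr/lib64/glusterfs/4.1.2/xlator/cluster/replicate.so(+0x642c5)[0x7fce4f1b32c5]'
```

The printed command can then be run on the affected node once the matching debuginfo package is installed.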

All other log files contain no relevant information as far as I can see. Let me
know if you would like any of them anyway.

Unmounting and then mounting the volume again does NOT work.

Executing "umount /gluster/volumes/testvol" and then
"/usr/sbin/glusterfs --aux-gfid-mount --process-name fuse
--volfile-server=localhost --volfile-id=/testvol /gluster/volumes/testvol" did
work. The mount is up and running again.
