[GEDI] Unexpected libgfapi behaviour

Mon Oct 30 13:14:38 UTC 2017

On Mon, Oct 30, 2017 at 11:37:24AM +0100, Denis Chaplygin wrote:
> Hello!
> 
> I observe a weird behavior of glfs_init call. I'm using it in the following
> way:
> 
>         fs = self._glfs_new(path)
>         ret = self._glfs_set_volfile_server(fs, "tcp",
>                                             address,
>                                             self.GLUSTER_DEFAULT_PORT)
>         if ret == -1:
>             err = ctypes.get_errno()
>             raise IOError(
>                 "glfs_set_volfile_server failed: %s" % os.strerror(err))
> 
>         ret = self._glfs_init(fs)
>         if ret == -1:
>             err = ctypes.get_errno()
>             raise IOError(
>                 "glfs_init failed: %s" % os.strerror(err))
>         self.fs = fs
> 
> 
> The code above is in Python, and libgfapi is used via ctypes with the
> following definitions:
> 
>     # C function prototypes for using the library gfapi
>     _lib = ctypes.CDLL("libgfapi.so.0", use_errno=True)
> 
>     _glfs_new = ctypes.CFUNCTYPE(
>         ctypes.c_void_p, ctypes.c_char_p)(('glfs_new', _lib))
> 
>     _glfs_set_volfile_server = ctypes.CFUNCTYPE(
>         ctypes.c_int,
>         ctypes.c_void_p,
>         ctypes.c_char_p,
>         ctypes.c_char_p,
>         ctypes.c_int)(('glfs_set_volfile_server', _lib))
> 
>     _glfs_init = ctypes.CFUNCTYPE(
>         ctypes.c_int, ctypes.c_void_p)(('glfs_init', _lib))
> 
> Address is 'brq-gluster01.rhev.lab.eng.brq.redhat.com' and path is
> '/testiso'. Volume is up and running:
> 
> Status of volume: testiso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick brq-gluster01.rhev.lab.eng.brq.redhat
> .com:/data/testiso/brick1/brick             49152     0          Y       13641
> Brick brq-gluster02.rhev.lab.eng.brq.redhat
> .com:/data/testiso/brick1/brick             49152     0          Y       13519
> Brick brq-gluster03.rhev.lab.eng.brq.redhat
> .com:/data/testiso/brick1/brick             49152     0          Y       23145
> Self-heal Daemon on localhost               N/A       N/A        Y       24975
> Self-heal Daemon on brq-gluster02.rhev.lab.
> eng.brq.redhat.com                          N/A       N/A        Y       23572
> Self-heal Daemon on brq-gluster03.rhev.lab.
> eng.brq.redhat.com                          N/A       N/A        Y       24603
> 
> Task Status of Volume testiso
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> 
> The problem is that the glfs_init call fails with a return value of '-1' and errno
> set to 0. There is nothing in logs on the gluster side. I also tried
> running it with strace, nothing suspicious - connection to glusterd is
> established successfully, there is some data transfer followed by load of
> several translators.
> 
> The question is: what can be wrong, and how can I debug it?
> 
> Gluster version is 3.12.1-2.el7 on both client and server side.

The best logging you can get is when you use the glfs_set_logging()
call:
  https://github.com/gluster/glusterfs/blob/release-3.12/api/src/glfs.h#L199

Make sure that the process can create the logfile. By default the
log-level is set to INFO(7); setting it to DEBUG(8) or TRACE(9) might be
helpful.
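
As a rough sketch, in the same ctypes style as the code quoted above,
enabling the log could look like the block below. The helper name
enable_gfapi_logging and the logfile path are made up for illustration;
the glfs_set_logging(fs, logfile, loglevel) prototype is the one declared
in glfs.h. The errno handling simply mirrors the quoted code and assumes
the library sets errno on failure. Call it before glfs_init() so that the
init phase itself ends up in the log.

```python
import ctypes
import os

# Log levels as mentioned above: INFO(7), DEBUG(8), TRACE(9).
GF_LOG_INFO = 7
GF_LOG_DEBUG = 8
GF_LOG_TRACE = 9


def enable_gfapi_logging(lib, fs, logfile, level=GF_LOG_DEBUG):
    """Hypothetical helper: call glfs_set_logging(fs, logfile, level).

    `lib` is the already loaded ctypes.CDLL("libgfapi.so.0", use_errno=True)
    handle and `fs` is the pointer returned by glfs_new().
    """
    set_logging = lib.glfs_set_logging
    set_logging.restype = ctypes.c_int
    set_logging.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int]

    # c_char_p needs bytes on Python 3, so encode the path.
    ret = set_logging(fs, logfile.encode(), level)
    if ret == -1:
        err = ctypes.get_errno()
        raise IOError("glfs_set_logging failed: %s" % os.strerror(err))
    return ret


# Possible usage, between glfs_new() and glfs_init() (path is an example):
#   enable_gfapi_logging(_lib, fs, "/tmp/gfapi-debug.log", GF_LOG_TRACE)
```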

The connection to glusterd is only for the management part. Once
glfs_init() is called, the volume layout is fetched from glusterd, and
based on those details, the connections to the bricks are made. You can
check the logs from the bricks to see if the connections get established.
Capturing a tcpdump on the system where the gfapi application is running
and loading it in wireshark can show more of what is happening on the
network side (wireshark knows most of the Gluster protocol).
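
Before digging into a packet capture, a quick check that the client can
reach the brick ports at all may save some time. The snippet below is
only a sketch using plain TCP connects; the function names are invented
for this mail, and the port 49152 is taken from the volume status output
quoted above. A successful connect says nothing about the Gluster
handshake itself, only that nothing (a firewall, for example) blocks the
port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to the result of can_connect()."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in endpoints}


# Possible usage with the bricks from the status output above:
#   check_endpoints([
#       ("brq-gluster01.rhev.lab.eng.brq.redhat.com", 49152),
#       ("brq-gluster02.rhev.lab.eng.brq.redhat.com", 49152),
#       ("brq-gluster03.rhev.lab.eng.brq.redhat.com", 49152),
#   ])
```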

Good luck!
Niels