[Bugs] [Bug 1509340] New: glusterd does not write pidfile correctly when forking

bugzilla at redhat.com bugzilla at redhat.com
Fri Nov 3 15:07:27 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1509340

            Bug ID: 1509340
           Summary: glusterd does not write pidfile correctly when forking
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: nh2-redhatbugzilla at deditus.de
                CC: bugs at gluster.org



Description of problem:

glusterd writes its pidfile from the forked child process. It should write it
from the parent process, before the parent exits.

This can lead to random glusterd startup failures when using systemd.

Services that fork are usually expected to write their pidfile from the
*parent* process, not the child process. This is so that whoever starts the
parent process can determine whether the startup was successful (they
inspect the pidfile after the parent exits: if it's empty, startup failed;
if it's nonempty, it succeeded).
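
For illustration, this is the conventional ordering in C (a minimal sketch of
the general pattern, not glusterd's actual code; the pidfile path is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* The parent writes the child's pid to the pidfile and only then
     * exits, so the pidfile is guaranteed to be populated by the time
     * the supervisor observes the parent's exit. */
    int main(void)
    {
        pid_t child = fork();
        if (child < 0) {
            perror("fork");
            return EXIT_FAILURE;
        }
        if (child > 0) {
            /* parent: record the daemon's pid *before* exiting */
            FILE *f = fopen("/run/example.pid", "w");  /* illustrative path */
            if (!f) {
                perror("fopen pidfile");
                return EXIT_FAILURE;
            }
            fprintf(f, "%d\n", (int)child);
            fclose(f);
            _exit(EXIT_SUCCESS);
        }
        /* child: detach and do the actual daemon work */
        setsid();
        for (;;)
            pause();
    }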

This is also systemd's expectation. The docs in "Example 4. Traditional forking
services" say:

  https://www.freedesktop.org/software/systemd/man/systemd.service.html

  "Note that the daemon should write that file before finishing with its
initialization. Otherwise, systemd might try to read the file before it
exists."
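
For context, the relevant mechanics in a `Type=forking` unit look roughly like
this (an illustrative reconstruction, not the exact unit file shipped with
glusterfs): systemd reads `PIDFile` as soon as the process it started exits.

    [Unit]
    Description=GlusterFS, a clustered file-system server

    [Service]
    Type=forking
    PIDFile=/run/glusterd.pid
    ExecStart=/usr/sbin/glusterd -p /run/glusterd.pid

    [Install]
    WantedBy=multi-user.target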

This is exactly what happens to glusterd, which is probably why glusterd
sometimes mysteriously fails to start on distributions using systemd (and the
systemd unit provided in the glusterfs source code, which uses `PIDFile`).
When that happens, all you see in `journalctl -o short-precise` is:

    Nov 03 14:14:47.947284 node1 systemd[1]: Starting GlusterFS, a clustered file-system server...
    Nov 03 14:14:48.053037 node1 systemd[1]: glusterd.service: Control process exited, code=exited status=1
    Nov 03 14:14:48.055438 node1 systemd[1]: Failed to start GlusterFS, a clustered file-system server.

Here glusterd seems to exit with code 1 for no apparent reason.
`/var/log/glusterfs/glusterd.log` contains the following output:

    [2017-11-03 14:14:48.023049] I [MSGID: 100030] [glusterfsd.c:2496:main] 0-/nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/sbin/glusterd: Started running /nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/sbin/glusterd version 3.12.1 (args: /nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/sbin/glusterd -p /run/glusterd.pid --log-level=DEBUG)
    [2017-11-03 14:14:48.025480] D [MSGID: 0] [glusterfsd.c:2107:glusterfs_pidfile_update] 0-glusterfsd: pidfile /run/glusterd.pid updated with pid 1041
    [2017-11-03 14:14:48.025523] D [logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
    [2017-11-03 14:14:48.027786] D [MSGID: 0] [glusterfsd.c:660:get_volfp] 0-glusterfsd: loading volume file /nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/etc/glusterfs/glusterd.vol
    [2017-11-03 14:14:48.046019] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
    [2017-11-03 14:14:48.046052] I [MSGID: 106479] [glusterd.c:1480:init] 0-management: Using /var/lib/glusterd as working directory
    [2017-11-03 14:14:48.046059] I [MSGID: 106479] [glusterd.c:1485:init] 0-management: Using /var/run/gluster as pid file working directory
    [2017-11-03 14:14:48.046157] D [MSGID: 0] [glusterd.c:475:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10
    [2017-11-03 14:14:48.047404] D [rpcsvc.c:2424:rpcsvc_init] 0-rpc-service: RPC service inited.
    [2017-11-03 14:14:48.047415] D [rpcsvc.c:1974:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
    [2017-11-03 14:14:48.047432] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/lib/glusterfs/3.12.1/rpc-transport/socket.so
    [2017-11-03 14:14:48.051255] D [socket.c:4202:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
    [2017-11-03 14:14:48.051273] D [socket.c:4220:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
    [2017-11-03 14:14:48.051281] I [socket.c:4305:socket_init] 0-socket.management: SSL support on the I/O path is ENABLED
    [2017-11-03 14:14:48.051285] I [socket.c:4308:socket_init] 0-socket.management: SSL support for glusterd is ENABLED
    [2017-11-03 14:14:48.051289] I [socket.c:4325:socket_init] 0-socket.management: using private polling thread
    [2017-11-03 14:14:48.052501] E [socket.c:4452:socket_init] 0-socket.management: could not load private key
    [2017-11-03 14:14:48.052513] D [socket.c:4569:init] 0-socket.management: socket_init() failed
    [2017-11-03 14:14:48.052518] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'socket' initialization failed
    [2017-11-03 14:14:48.052620] W [rpcsvc.c:1682:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
    [2017-11-03 14:14:48.052629] E [MSGID: 106243] [glusterd.c:1761:init] 0-management: creation of listener failed
    [2017-11-03 14:14:48.052638] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
    [2017-11-03 14:14:48.052665] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
    [2017-11-03 14:14:48.052672] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
    [2017-11-03 14:14:48.053702] D [logging.c:1791:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
    [2017-11-03 14:14:48.053720] D [logging.c:1794:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
    [2017-11-03 14:14:48.054826] W [glusterfsd.c:1347:cleanup_and_exit] (-->/nix/store/qka18inm6zxz0yyxhq17dq7k6bck9anv-glibc-2.25-49/lib/libpthread.so.0(+0x7234) [0x7fdb853d8234] -->/nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x408f75] -->/nix/store/527752abk5cmf2yrpri9bgyb46g6i1as-glusterfs-3.12.1/sbin/glusterd(cleanup_and_exit+0x57) [0x408d97] ) 0-: received signum (15), shutting down
    [2017-11-03 14:14:48.054843] D [glusterfsd-mgmt.c:2690:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given

Here everything starts normally, but then we suddenly get `received signum
(15), shutting down`.

Who sent signal 15 (SIGTERM) to glusterd? I suspect the following race
condition:

* glusterd is started
* glusterd fork()s and the parent exits
* systemd sees the process it started exit
* systemd checks whether the PIDFile contains a pid; it doesn't yet
* systemd sends SIGTERM to the entire remaining glusterd.service cgroup
* [never happens:] the fork()ed child would write the pidfile only now

I have confirmed the above with

  strace -f -ttt glusterd -p /run/glusterd.pid --log-level=DEBUG

where I can see:

  1509720256.948021 open("/run/glusterd.pid", O_RDWR|O_CREAT|O_APPEND, 0666) = 6
  ...
  1509720256.949193 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe34ddcb9d0) = 1324
  ...
  strace: Process 1324 attached
  ...
  [pid  1324] 1509720256.950559 write(6, "1324\n", 5) = 5

This shows that the pidfile contents are written by the child (pid 1324), not
by the parent.

Version-Release number of selected component (if applicable): 3.12

How reproducible: Always


Steps to Reproduce:
1. Run strace as I described above

Actual results:

The child process writes to the pidfile.

Expected results:

The parent process writes to the pidfile.

Additional info:

I found this issue when maintaining glusterfs on NixOS.
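
For completeness, one way to implement the expected ordering while also
satisfying the "write that file before finishing with its initialization"
requirement quoted above is a readiness pipe: the child reports back once it
has initialized, and only then does the parent write the pidfile and exit.
This is just a hedged sketch of the general technique (names and the pidfile
path are illustrative; it is not a glusterd patch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int pipefd[2];
        if (pipe(pipefd) < 0) {
            perror("pipe");
            return EXIT_FAILURE;
        }

        pid_t child = fork();
        if (child < 0) {
            perror("fork");
            return EXIT_FAILURE;
        }

        if (child > 0) {
            /* parent: block until the child reports successful init */
            char ok = 0;
            close(pipefd[1]);
            if (read(pipefd[0], &ok, 1) != 1 || ok != '+')
                return EXIT_FAILURE;  /* child failed to initialize */

            FILE *f = fopen("/run/example.pid", "w");  /* illustrative path */
            if (!f)
                return EXIT_FAILURE;
            fprintf(f, "%d\n", (int)child);
            fclose(f);
            _exit(EXIT_SUCCESS);  /* systemd may now safely read the pidfile */
        }

        /* child: initialize, then signal readiness to the parent */
        close(pipefd[0]);
        setsid();
        /* ... real initialization would happen here ... */
        if (write(pipefd[1], "+", 1) != 1)
            _exit(EXIT_FAILURE);
        close(pipefd[1]);
        for (;;)
            pause();
    }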
