[Gluster-devel] glustershd status

Harshavardhana harsha at harshavardhana.net
Thu Jul 17 04:54:49 UTC 2014


So here is what I found - long email, please bear with me.

It looks like the management daemon and the other daemons

eg: brick, nfs-server, gluster self-heal daemon

work in a non-blocking manner, i.e. each daemon notifies the Gluster
management daemon when it becomes available and when it goes away. This
is done through a notify() callback mechanism.

A registered notify() handler is supposed to call setter() functions
which update the state of the notified instance within the Gluster
management daemon.
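To make the pattern concrete, here is a toy Python model (not the
actual C code - all names besides the ones quoted in this email are
illustrative) of how an RPC event drives a notify() handler, which in
turn calls a setter that records the child daemon's state inside the
management daemon:

```python
# Toy model of the notify()/setter mechanism described above.
# NodeSvc stands in for a per-daemon struct such as conf->shd;
# set_online_status() plays the role of the setter
# (cf. glusterd_nodesvc_set_online_status()).

RPC_CLNT_CONNECT = "RPC_CLNT_CONNECT"
RPC_CLNT_DISCONNECT = "RPC_CLNT_DISCONNECT"

class NodeSvc:
    """Illustrative stand-in for a node-service struct like conf->shd."""
    def __init__(self, name):
        self.name = name
        self.online = False  # like conf->shd->online

def set_online_status(svc, status):
    """Setter: updates the cached state inside the management daemon."""
    svc.online = status

def notify(svc, event):
    """Registered notify() handler: translates RPC events into state."""
    if event == RPC_CLNT_CONNECT:
        set_online_status(svc, True)
    elif event == RPC_CLNT_DISCONNECT:
        set_online_status(svc, False)

shd = NodeSvc("glustershd")
notify(shd, RPC_CLNT_CONNECT)
print(shd.name, "online:", shd.online)  # -> glustershd online: True
```

The point of the sketch is that the management daemon never probes the
child daemon directly - its view of "online" is only as fresh as the
last notify() it received.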

Taking self-heal daemon as an example:

"conf->shd->online" ---> is the primary value which should be set
during this notify call back where self-heal-daemon informs of its
availability to Gluster management daemon - this happens during a
RPCCLNT_CONNECT event()

During this event glusterd_nodesvc_set_online_status() sets all the
necessary online/offline state.

What happens on FreeBSD/NetBSD is that this notify event doesn't occur
at all, for some odd reason - there is in fact a first notify() event,
but that one sets the value as "offline", i.e. status == 0
(gf_boolean_t == _gf_false).

In fact this is true on Linux as well - there is a small time window.
Observe the output below, from running 'volume status' immediately
after a 'volume start':

# gluster volume status
Status of volume: repl
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 127.0.1.1:/d/backends/brick1                      49152   Y       29082
Brick 127.0.1.1:/d/backends/brick2                      49153   Y       29093
NFS Server on localhost                                 N/A     N       N/A
Self-heal Daemon on localhost                           N/A     N       N/A

Task Status of Volume repl
------------------------------------------------------------------------------
There are no active volume tasks

These two commands were run 1 sec apart:

# gluster volume status
Status of volume: repl
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 127.0.1.1:/d/backends/brick1                      49152   Y       29082
Brick 127.0.1.1:/d/backends/brick2                      49153   Y       29093
NFS Server on localhost                                 2049    Y       29115
Self-heal Daemon on localhost                           N/A     Y       29110

Task Status of Volume repl
------------------------------------------------------------------------------
There are no active volume tasks

So the change does happen, but sadly it never happens on the non-Linux
platforms. My general speculation is that this is related to
poll()/epoll() - I have to debug this further.
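The window in the two status outputs above can be modelled in a few
lines (again a toy Python sketch, not Gluster code): 'volume status' is
answered from the management daemon's cached flag, so a query issued
before the daemon's RPC_CLNT_CONNECT notify() lands reports offline,
and the same query a moment later reports online.

```python
# Toy model of the 'volume status' race window: glusterd answers from
# its cached per-daemon flag rather than probing the daemon, so the
# answer depends on whether notify(RPC_CLNT_CONNECT) has landed yet.
# All names are illustrative.

class NodeSvc:
    def __init__(self, name):
        self.name = name
        self.online = False  # cached flag, like conf->shd->online

def volume_status(svc):
    """Answer from the cached flag only, as glusterd does."""
    return "Y" if svc.online else "N"

shd = NodeSvc("Self-heal Daemon")
first = volume_status(shd)   # queried before the notify() arrives
shd.online = True            # notify(RPC_CLNT_CONNECT) lands
second = volume_status(shd)  # queried again a moment later
print(first, second)  # -> N Y
```

On Linux the second query flips to "Y" within about a second; the bug
on FreeBSD/NetBSD is that the connect-side notify() apparently never
arrives, so the flag stays "N" forever.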

In fact, restarting the Gluster management daemon fixes these issues,
which is understandable :-)

On Wed, Jul 16, 2014 at 9:41 AM, Emmanuel Dreyfus <manu at netbsd.org> wrote:
> Harshavardhana <harsha at harshavardhana.net> wrote:
>
>> Its pretty much the same on FreeBSD, i didn't spend much time debugging
>> it. Let me do it right away and let you know what i find.
>
> Right. Once you will have this one, I have Linux-specific truncate and
> md5csum replacements to contribute. I do not send them now since I
> cannot test them.
>
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> manu at netbsd.org



-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes

