[Bugs] [Bug 1784375] New: 'gluster volume set <VOLUME> disable.nfs' accidentally killed unexpected process, and forced a data brick offline.

bugzilla at redhat.com bugzilla at redhat.com
Tue Dec 17 10:47:26 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1784375

            Bug ID: 1784375
           Summary: 'gluster volume set <VOLUME> disable.nfs' accidentally
                    killed unexpected process, and forced a data brick
                    offline.
           Product: GlusterFS
           Version: mainline
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Keywords: ZStream
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: srakonde at redhat.com
                CC: amukherj at redhat.com, bmekala at redhat.com,
                    bugs at gluster.org, kkagoshi at redhat.com,
                    rhs-bugs at redhat.com, storage-qa-internal at redhat.com,
                    vbellur at redhat.com
        Depends On: 1784211
  Target Milestone: ---
    Classification: Community



Description of problem:
////////////////////////////////////////////////////////////////////////////

- 'gluster volume set vol01 nfs.disable off' accidentally killed an unrelated
process and forced a data brick offline.
- The command was executed while no nfs daemon spawned by glusterd was running.
- The killed pid 22821 had been reused; it originally belonged to an nfs daemon
that had been running earlier.

- 4-node configuration
- The command that triggered the problem was run on node kiotloglgst04.

[root at kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

[ log:
/cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025
]
[2019-10-24 05:31:04.248179]  : volume set vol01 nfs.disable off : SUCCESS 

- The command accidentally killed process pid 22821, which was not an nfs
process spawned by glusterd but another, unrelated process.

[ log:
/cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20191025
]
[2019-10-24 05:31:02.211907] I [MSGID: 106568]
[glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon
running in pid: 22821    <----- killed
[2019-10-24 05:31:03.212479] I [MSGID: 106568]
[glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is
stopped
..... snip .....
[2019-10-24 05:31:03.217608] I [MSGID: 106567]
[glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service
[2019-10-24 05:31:04.241212] I [MSGID: 106132]
[glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped

- The data brick kiotloglgst03:/bricks/brick01/1 was forced offline as a result
of the command.

[ log:
/cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/bricks/bricks-brick01-1.log-20191025
]
[2019-10-24 05:25:42.670252] I [MSGID: 101055] [client_t.c:443:gf_client_unref]
0-vol01-server: Shutting down connection
kiotloglgst03-14626-2019/07/05-02:27:23:117770-vol01-client-4-0-0
[2019-10-24 05:29:06.609832] I [addr.c:55:compare_addr_and_update]
0-/bricks/brick01/1: allowed = "*", received addr = "100.65.182.132"
[2019-10-24 05:29:06.609889] I [login.c:111:gf_auth] 0-auth/login: allowed user
names: 42126d88-8cf4-4c37-83a9-5d0bf62e1d2e
[2019-10-24 05:29:06.609926] I [MSGID: 115029]
[server-handshake.c:564:server_setvolume] 0-vol01-server: accepted client from
kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0 (version:
3.12.2) with subvol /bricks/brick01/1
[2019-10-24 05:29:06.675247] I [MSGID: 115036] [server.c:571:server_rpc_notify]
0-vol01-server: disconnecting connection from
kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:29:06.675559] I [MSGID: 101055] [client_t.c:443:gf_client_unref]
0-vol01-server: Shutting down connection
kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:31:02.195748] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt:
Volume file changed
..... Customer restarted brick .....
[2019-10-24 07:50:25.949355] I [MSGID: 100030] [glusterfsd.c:2504:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.2
(args: /usr/sbin/glusterfsd -s kiotloglgst03 --volfile-id
vol01.kiotloglgst03.bricks-brick01-1 -p
/var/run/gluster/vols/vol01/kiotloglgst03-bricks-brick01-1.pid -S
/var/run/gluster/f0a387bbc58a292e.socket --brick-name /bricks/brick01/1 -l
/var/log/glusterfs/bricks/bricks-brick01-1.log --xlator-option
*-posix.glusterd-uuid=0621266b-41ea-401e-bfcb-839d2b16ca82 --brick-port 49153
--xlator-option vol01-server.listen-port=49153)
[2019-10-24 07:50:25.955180] W [MSGID: 101002] [options.c:995:xl_opt_validate]
0-glusterfs: option 'address-family' is deprecated, preferred is
'transport.address-family', continuing with correction
[2019-10-24 07:50:25.958438] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1

- Where did pid 22821 come from?

[log:
/cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20190717]
..... snip .....
[2019-07-16 07:03:46.632909] I [MSGID: 106568]
[glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon
running in pid: 22821    <----- pid
[2019-07-16 07:03:47.633263] I [MSGID: 106568]
[glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is
stopped

[ log:
/cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20190717.gz]
.... snip .....
[2019-07-16 06:49:38.747879]  : volume set vol01 nfs.disable off : SUCCESS
[2019-07-16 07:03:47.652222]  : volume set vol01 nfs.disable on : SUCCESS

- Node kiotloglgst03 was booted on Fri Jul  5, so pid 22821 was apparently
reused by another process.
[log:
/cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/last]
..... snip .....
reboot   system boot  3.10.0-957.12.2. Fri Jul  5 11:27 - 16:48 (111+05:21)

- The customer confirmed that there was no nfs daemon running before the
command. They have the output of these commands, shown below.

[ log:
/cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025
]

[2019-10-24 05:28:55.097546]  : volume status vol01 : SUCCESS
[2019-10-24 05:30:45.862941]  : volume status vol01 : SUCCESS    <----- (*1)
[2019-10-24 05:31:04.248179]  : volume set vol01 nfs.disable off : SUCCESS    <----- (*2)
[2019-10-24 05:31:08.796906]  : volume status vol01 : SUCCESS    <----- (*3)

  (*1)
[root at kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       49153     0          Y       14913    <----- Brick will be killed
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

(*2)
[root at kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

(*3)
[root at kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       N/A       N/A        N       N/A      <----- Brick killed
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
NFS Server on localhost                     2049      0          Y       8716 
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
NFS Server on kiotloglgst01                 2049      0          Y       348  
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
NFS Server on kiotloglgst03                 2049      0          Y       17671
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874
NFS Server on kiotloglgst02                 2049      0          Y       19331
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

==============================================================================
gluster volume info
==============================================================================
Volume Name: vol01
Type: Distributed-Replicate
Volume ID: be62535e-9fbd-4bd8-b1cd-90eca0b25ed0
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: kiotloglgst01:/bricks/brick01/1
Brick2: kiotloglgst02:/bricks/brick01/1
Brick3: kiotloglgst03:/bricks/brick03/1 (arbiter)
Brick4: kiotloglgst02:/bricks/brick02/1
Brick5: kiotloglgst03:/bricks/brick01/1
Brick6: kiotloglgst04:/bricks/brick03/1 (arbiter)
Brick7: kiotloglgst03:/bricks/brick02/1
Brick8: kiotloglgst04:/bricks/brick01/1
Brick9: kiotloglgst01:/bricks/brick03/1 (arbiter)
Brick10: kiotloglgst04:/bricks/brick02/1
Brick11: kiotloglgst01:/bricks/brick02/1
Brick12: kiotloglgst02:/bricks/brick03/1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.shard: off
features.shard-block-size: 16MB
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.default-soft-limit: 95
network.ping-timeout: 10

Version-Release number of selected component (if applicable):
////////////////////////////////////////////////////////////////////////////

==============================================================================
grep "gluster" installed-rpms
==============================================================================
glusterfs-3.12.2-25.el7rhgs.x86_64                          Tue Jun 11 10:58:20
2019
glusterfs-api-3.12.2-25.el7rhgs.x86_64                      Tue Jun 11 10:58:21
2019
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64           Tue Jun 11 10:57:35
2019
glusterfs-cli-3.12.2-25.el7rhgs.x86_64                      Tue Jun 11 10:57:35
2019
glusterfs-events-3.12.2-25.el7rhgs.x86_64                   Tue Jun 11 10:58:54
2019
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:58:23
2019
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64          Tue Jun 11 10:58:54
2019
glusterfs-libs-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:57:31
2019
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:58:36
2019
glusterfs-server-3.12.2-25.el7rhgs.x86_64                   Tue Jun 11 10:58:53
2019
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64               Tue Jun 11 10:58:43
2019
gluster-nagios-common-0.2.4-1.el7rhgs.noarch                Tue Jun 11 10:57:46
2019
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.10.x86_64 Tue Jun 11
16:04:57 2019
python2-gluster-3.12.2-25.el7rhgs.x86_64                    Tue Jun 11 10:57:36
2019
tendrl-gluster-integration-1.6.3-13.el7rhgs.noarch          Tue Jun 11 16:04:59
2019
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch                     Tue Jun 11 10:59:13
2019

==============================================================================
cat /etc/redhat-storage-release
==============================================================================
Red Hat Gluster Storage Server 3.4

How reproducible:

- Always reproducible when the nfs daemon's pid has been reused by another
process.


Problem easily reproduced:
////////////////////////////////////////////////////////////////////////////

- Create a 2 + 1 arbiter volume.
- Make sure no nfs daemon is running.

[root at rhel-a glusterfs]# gluster vol info avol

Volume Name: avol
Type: Replicate
Volume ID: 1e1e1870-b21d-4b0c-9cd0-b95d67725e1b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rhel-a:/vol1/brick/b1
Brick2: rhel-b:/vol1/brick/b1
Brick3: rhel-c:/vol1/brick/b1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on                             <----- No nfs 
transport.address-family: inet

///// No nfs daemon shown in status below.
[root at rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
Self-heal Daemon on localhost               N/A       N/A        Y       4285
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- Start a dummy process:

[root at rhel-a nfs]# strace sleep 100000; date

- Find the sleep pid and overwrite the /var/run/gluster/nfs/nfs.pid file with
sleep's pid.

[root at rhel-a glusterfs]# ps aux | grep sleep
root      8545  0.0  0.0 107952   360 ?        S    13:08   0:00 sleep 60
root      8586  0.0  0.0   7188   816 pts/1    S+   13:09   0:00 strace sleep
100000
root      8588  0.0  0.0 107952   584 pts/1    S+   13:09   0:00 sleep 100000    <------ Dummy pid 8588
root      8590  0.0  0.0 112680   696 pts/0    R+   13:09   0:00 grep
--color=auto sleep

[root at rhel-a glusterfs]# ls -l /var/run/gluster/nfs/nfs.pid
-rw-r--r--. 1 root root 5 Dec 12 13:07 /var/run/gluster/nfs/nfs.pid

[root at rhel-a glusterfs]# echo 8588 > /var/run/gluster/nfs/nfs.pid

- Ensure no nfs daemon is running:

[root at rhel-a glusterfs]# ps aux | grep glusterfs
root      4285  0.0  0.3 677720  7296 ?        Ssl  10:55   0:00
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/827bf70aee5e557b.socket --log-level=INFO --xlator-option
*replicate*.node-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55
root      4386  0.0  1.0 929228 19580 ?        Ssl  10:55   0:00
/usr/sbin/glusterfsd -s rhel-a --volfile-id avol.rhel-a.vol1-brick-b1 -p
/var/run/gluster/vols/avol/rhel-a-vol1-brick-b1.pid -S
/var/run/gluster/a12b523ebfcadaec.socket --brick-name /vol1/brick/b1 -l
/var/log/glusterfs/bricks/vol1-brick-b1.log --xlator-option
*-posix.glusterd-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55 --brick-port 49152
--xlator-option avol-server.listen-port=49152
root      8619  0.0  0.0 112684   956 pts/0    S+   13:10   0:00 grep
--color=auto glusterfs

- Set 'nfs.disable off':

[root at rhel-a glusterfs]# gluster vol set avol nfs.disable off; date
volume set: success
Thu Dec 12 13:10:43 JST 2019

[root at rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
NFS Server on localhost                     2049      0          Y       8644    <----- nfs daemon started
Self-heal Daemon on localhost               N/A       N/A        Y       4285
NFS Server on rhel-c                        2049      0          Y       8235
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
NFS Server on rhel-b                        2049      0          Y       8181
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- The dummy sleep process was killed:
.... snip .....
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7ff9e8162000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7ff9e8162000, 4096)            = 0
open("/usr/lib/locale/En/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(No such file or directory)
nanosleep({100000, 0},  <unfinished ...>
+++ killed by SIGKILL +++
強制終了   <----- Japanese locale message for "Killed"
Thu Dec 12 13:10:39 JST 2019
[root at rhel-a nfs]#

- /var/log/glusterfs/glusterd.log shows that the sleep process was killed:

[2019-12-12 04:10:39.809093] I [MSGID: 106568]
[glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon
running in pid: 8588  <----- Dummy sleep killed
[2019-12-12 04:10:40.810417] I [MSGID: 106568]
[glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is
stopped
[2019-12-12 04:10:40.811184] I [MSGID: 106540]
[glusterd-utils.c:5115:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
MOUNTV3 successfully
[2019-12-12 04:10:40.811327] I [MSGID: 106540]
[glusterd-utils.c:5124:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
MOUNTV1 successfully
[2019-12-12 04:10:40.811441] I [MSGID: 106540]
[glusterd-utils.c:5133:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
NFSV3 successfully
[2019-12-12 04:10:40.811588] I [MSGID: 106540]
[glusterd-utils.c:5142:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
NLM v4 successfully
[2019-12-12 04:10:40.811701] I [MSGID: 106540]
[glusterd-utils.c:5151:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
NLM v1 successfully
[2019-12-12 04:10:40.811810] I [MSGID: 106540]
[glusterd-utils.c:5160:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered
ACL v3 successfully
[2019-12-12 04:10:40.813519] I [MSGID: 106567]
[glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service


Code path for starting nfs daemon
////////////////////////////////////////////////////////////////////////////

glusterd_nfssvc_manager() {
    ....
   ret = svc->stop (svc, SIGKILL);     <*** calls glusterd_nfssvc_stop()
    ....
    ret = svc->start (svc, flags);
    ....
}

Whenever the nfs daemon is started, glusterd first kills any existing nfs
daemon by calling glusterd_nfssvc_stop(). The glusterd_nfssvc_stop() code path
looks like this:

glusterd_nfssvc_stop() {
    glusterd_proc_is_running()
    ....
    glusterd_svc_stop (svc, sig) {
        glusterd_proc_stop() {
            gf_is_service_running() {
                fopen (pidfile, "r+");      <**1** does "/var/run/gluster/nfs/nfs.pid" exist?
                fscanf (file, "%d", pid);   <**2** read the PID from it
                gf_is_pid_running (*pid) {
                    snprintf (fname, sizeof (fname), "/proc/%d/cmdline", pid);
                    if (sys_access (fname, R_OK) != 0) {   <**3** is "/proc/PID/cmdline" accessible?
                        ....
                    }
                    ....
                }
                ....
            }
            /* If conditions **1**-**3** above are met, the pid is killed below. */
            ret = kill (pid, sig);
            ....
        }
        ....
    }
}
///// Apparently we have a bug. The running-pid check should be more accurate,
or the "/var/run/gluster/nfs/nfs.pid" file should be removed when the nfs
daemon is killed by "nfs.disable on".

--- Additional comment from Sanju on 2019-12-17 09:05:21 IST ---

Thanks for the detailed bug report. I will read the code and send out a fix if
there is a bug.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1784211
[Bug 1784211] [GSS] - 'gluster volume set <VOLUME> disable.nfs' accidentally
killed unexpected process, and forced a data brick offline.
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

