[Gluster-users] [Bugs] Bricks are going offline unable to recover with heal/start force commands

Thu Jan 24 10:29:02 UTC 2019

  Hi Sanju,

Please find the requested information (these are the latest logs :) ).

The only error messages I can see related to brick 
"brick_e15c12cceae12c8ab7782dd57cf5b6c1" (in the second node's log) are:

[2019-01-23 11:50:20.322902] I 
[glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered 
already-running brick 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
[2019-01-23 11:50:20.322925] I [MSGID: 106142] 
[glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 
on port 49165  >> showing running on port but not
[2019-01-23 11:50:20.327557] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already 
stopped
[2019-01-23 11:50:20.327586] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is 
stopped
[2019-01-23 11:50:20.327604] I [MSGID: 106599] 
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: 
nfs/server.so xlator is not installed
[2019-01-23 11:50:20.337735] I [MSGID: 106568] 
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping 
glustershd daemon running in pid: 69525
[2019-01-23 11:50:21.338058] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd 
service is stopped
[2019-01-23 11:50:21.338180] I [MSGID: 106567] 
[glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting 
glustershd service
[2019-01-23 11:50:21.348234] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already 
stopped
[2019-01-23 11:50:21.348285] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is 
stopped
[2019-01-23 11:50:21.348866] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already 
stopped
[2019-01-23 11:50:21.348883] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is 
stopped
[2019-01-23 11:50:22.356502] I [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd
[2019-01-23 11:50:22.368845] E [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) 
[0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       109550
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
There are no active volume tasks

BR
Salam

From:   "Sanju Rakonde" <srakonde at redhat.com>
To:     "Shaik Salam" <shaik.salam at tcs.com>
Cc:     "Amar Tumballi Suryanarayan" <atumball at redhat.com>, 
"gluster-users at gluster.org List" <gluster-users at gluster.org>, "Murali 
Kottakota" <murali.kottakota at tcs.com>
Date:   01/24/2019 02:32 PM
Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline unable 
to recover with heal/start force commands

"External email. Open with Caution"
Shaik,

Sorry to ask this again. What errors are you seeing in glusterd logs? Can 
you share the latest logs?

On Thu, Jan 24, 2019 at 2:05 PM Shaik Salam <shaik.salam at tcs.com> wrote:
Hi Sanju, 

Please find the requested information. 

Are you still seeing the error "Unable to read pidfile:" in glusterd log? 
>>>> No 
Are you seeing "brick is deemed not to be a part of the volume" error in 
glusterd log? 
>>>> No 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae1^C8ab7782dd57cf5b6c1/brick 

sh-4.2# pwd 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ 

sh-4.2# getfattr -m -d -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ 

sh-4.2# getfattr -d -m . -e hex 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ 

getfattr: Removing leading '/' from absolute path names 
# file: var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ 
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 
trusted.afr.dirty=0x000000000000000000000000 
trusted.afr.vol_3442e86b6d994a14de73f1b8c82cf0b8-client-0=0x000000000000000000000000 
trusted.gfid=0x00000000000000000000000000000001 
trusted.glusterfs.dht=0x000000010000000000000000ffffffff 
trusted.glusterfs.volume-id=0x15477f3622e84757a0ce9000b63fa849 

sh-4.2# ls -la |wc -l 
86 
sh-4.2# pwd 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 

sh-4.2# 
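
One way to read the getfattr output above: the trusted.glusterfs.volume-id 
xattr on the brick root should carry the same UUID that "gluster volume info" 
prints as "Volume ID", only without the dashes. Note also that getfattr's -m 
option takes a pattern argument, which is why only the 
"getfattr -d -m . -e hex <brickpath>" form produced output here. The sketch 
below is just an illustration of the comparison; the helper names 
xattr_to_uuid and same_volume_id are made up for this note and are not part 
of gluster.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gluster): compare a brick's
# trusted.glusterfs.volume-id xattr with the Volume ID from "gluster volume info".

# xattr_to_uuid HEX: convert a 0x-prefixed 32-digit hex value into
# lowercase 8-4-4-4-12 UUID form. Fails on anything else.
xattr_to_uuid() {
    x2u_hex=${1#0x}
    case "$x2u_hex" in
        ''|*[!0-9a-fA-F]*) return 1 ;;      # reject empty or non-hex input
    esac
    [ "${#x2u_hex}" -eq 32 ] || return 1    # a UUID is exactly 32 hex digits
    printf '%s\n' "$x2u_hex" | tr 'A-F' 'a-f' |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
}

# same_volume_id HEX UUID: succeed only if both name the same volume.
same_volume_id() {
    svi_uuid=$(xattr_to_uuid "$1") || return 1
    [ "$svi_uuid" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# Values copied from this thread:
# same_volume_id 0x15477f3622e84757a0ce9000b63fa849 \
#     15477f36-22e8-4757-a0ce-9000b63fa849 && echo "volume-id matches"
```

For the values posted in this thread the two agree, which is consistent with 
the "brick is deemed not to be a part of the volume" error not showing up.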

From:        "Sanju Rakonde" <srakonde at redhat.com> 
To:        "Shaik Salam" <shaik.salam at tcs.com> 
Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>, "
gluster-users at gluster.org List" <gluster-users at gluster.org>, "Murali 
Kottakota" <murali.kottakota at tcs.com> 
Date:        01/24/2019 01:38 PM 
Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline unable 
to recover with heal/start force commands 

Shaik, 

Previously I suspected that the brick pid file was missing, but I see 
it is present. 

From the second node (this brick is in offline state): 
/var/run/gluster/vols/vol_3442e86b6d994a14de73f1b8c82cf0b8/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid 
271 

Are you still seeing the error "Unable to read pidfile:" in glusterd log? 

I also suspect that the brick may be missing its extended attributes. Are 
you seeing the "brick is deemed not to be a part of the volume" error in the 
glusterd log? If not, can you please provide the output of 
"getfattr -d -m . -e hex <brickpath>"? 

On Thu, Jan 24, 2019 at 12:18 PM Shaik Salam <shaik.salam at tcs.com> wrote: 
Hi Sanju, 

Could you please have a look at my issue if you have time (or at least 
provide a workaround)? 

BR 
Salam 

From:        Shaik Salam/HYD/TCS 
To:        "Sanju Rakonde" <srakonde at redhat.com> 
Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>, "
gluster-users at gluster.org List" <gluster-users at gluster.org>, "Murali 
Kottakota" <murali.kottakota at tcs.com> 
Date:        01/23/2019 05:50 PM 
Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline unable 
to recover with heal/start force commands 

Hi Sanju, 

Please find requested information. 

Sorry to repeat myself. I am trying the start force command after setting 
the brick log level to DEBUG, taking one volume as an example. 
Please correct me if I am doing something wrong. 

[root@master ~]# oc rsh glusterfs-storage-vll7x 
sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8 

Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8 
Type: Replicate 
Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 1 x 3 = 3 
Transport-type: tcp 
Bricks: 
Brick1: 192.168.3.6:/var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick 
Brick2: 192.168.3.5:/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 
Brick3: 192.168.3.15:/var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick 

Options Reconfigured: 
diagnostics.brick-log-level: INFO 
performance.client-io-threads: off 
nfs.disable: on 
transport.address-family: inet 
sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       108434
Self-heal Daemon on matrix1.matrix.orange.l
ab                                          N/A       N/A        Y       69525
Self-heal Daemon on matrix2.matrix.orange.l
ab                                          N/A       N/A        Y       18569

gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG 
volume set: success 
sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep log 
cluster.entry-change-log                on 
cluster.data-change-log                 on 
cluster.metadata-change-log             on 
diagnostics.brick-log-level             DEBUG 

sh-4.2# cd /var/log/glusterfs/bricks/ 
sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log 
 >>> Nothing in log 

-rw-------. 1 root root  189057 Jan 18 09:20 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120 

[2019-01-23 11:49:32.475956] I [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/set/post/S30samba-set.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o 
diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd 
[2019-01-23 11:49:32.483191] I [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o 
diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd 
[2019-01-23 11:48:59.111292] W [MSGID: 106036] 
[glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: 
Snapshot list failed 
[2019-01-23 11:50:14.112271] E [MSGID: 106026] 
[glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management: 
Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid 
argument] 
[2019-01-23 11:50:14.112305] W [MSGID: 106036] 
[glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: 
Snapshot list failed 
[2019-01-23 11:50:20.322902] I 
[glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered 
already-running brick 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 

[2019-01-23 11:50:20.322925] I [MSGID: 106142] 
[glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick 
/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick 
on port 49165 
[2019-01-23 11:50:20.327557] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already 
stopped 
[2019-01-23 11:50:20.327586] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is 
stopped 
[2019-01-23 11:50:20.327604] I [MSGID: 106599] 
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: 
nfs/server.so xlator is not installed 
[2019-01-23 11:50:20.337735] I [MSGID: 106568] 
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping 
glustershd daemon running in pid: 69525 
[2019-01-23 11:50:21.338058] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd 
service is stopped 
[2019-01-23 11:50:21.338180] I [MSGID: 106567] 
[glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting 
glustershd service 
[2019-01-23 11:50:21.348234] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already 
stopped 
[2019-01-23 11:50:21.348285] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is 
stopped 
[2019-01-23 11:50:21.348866] I [MSGID: 106131] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already 
stopped 
[2019-01-23 11:50:21.348883] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is 
stopped 
[2019-01-23 11:50:22.356502] I [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 
[2019-01-23 11:50:22.368845] E [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) 
[0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       109550
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------
There are no active volume tasks

From:        "Sanju Rakonde" <srakonde at redhat.com> 
To:        "Shaik Salam" <shaik.salam at tcs.com> 
Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>, "
gluster-users at gluster.org List" <gluster-users at gluster.org>, "Murali 
Kottakota" <murali.kottakota at tcs.com> 
Date:        01/23/2019 02:15 PM 
Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline unable 
to recover with heal/start force commands 

Hi Shaik, 

I can see below errors in glusterd logs. 

[2019-01-22 09:20:17.540196] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid 

[2019-01-22 09:20:17.546408] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid 

[2019-01-22 09:20:17.552575] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid 

[2019-01-22 09:20:17.558888] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid 

[2019-01-22 09:20:17.565266] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid 

[2019-01-22 09:20:17.585926] E [MSGID: 106028] 
[glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get 
pid of brick process 
[2019-01-22 09:20:17.617806] E [MSGID: 106028] 
[glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get 
pid of brick process 
[2019-01-22 09:20:17.649628] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/glustershd/glustershd.pid 
[2019-01-22 09:20:17.649700] E [MSGID: 101012] 
[common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: 
/var/run/gluster/glustershd/glustershd.pid 

So it looks like neither gf_is_service_running() nor 
glusterd_brick_signal() is able to read the pid file. That suggests the 
pidfiles may be empty. 

Can you please paste the contents of the brick pidfiles? You can find them 
in /var/run/gluster/vols/<volname>/, or you can just run this command: 
"for i in `ls /var/run/gluster/vols/*/*.pid`;do echo $i;cat $i;done" 
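
The one-liner above prints each pidfile and its contents. A slightly longer 
sketch could also say whether the file is empty and whether the recorded pid 
is still alive. This is only an illustration: the function name 
check_pidfiles and the EMPTY/INVALID/ALIVE/DEAD labels are invented here, and 
it assumes it is run as root in the same pid namespace as the brick processes 
(for example inside the gluster pod), since "kill -0" also fails for 
processes the caller is not allowed to signal.

```shell
#!/bin/sh
# check_pidfiles [DIR]: inspect every DIR/<volname>/*.pid and print one line
# per file: EMPTY, INVALID, ALIVE <pid> or DEAD <pid>, followed by the path.
# Hypothetical helper, not part of gluster.
check_pidfiles() {
    cp_dir=${1:-/var/run/gluster/vols}      # default path taken from the thread
    for cp_file in "$cp_dir"/*/*.pid; do
        [ -e "$cp_file" ] || continue       # glob matched nothing
        if [ ! -s "$cp_file" ]; then
            echo "EMPTY $cp_file"           # zero-byte pidfile: nothing to read
            continue
        fi
        cp_pid=$(tr -d '[:space:]' < "$cp_file")
        case "$cp_pid" in
            ''|*[!0-9]*) echo "INVALID $cp_file"; continue ;;
        esac
        # kill -0 sends no signal; it only tests whether the pid can be signalled.
        if kill -0 "$cp_pid" 2>/dev/null; then
            echo "ALIVE $cp_pid $cp_file"
        else
            echo "DEAD $cp_pid $cp_file"
        fi
    done
}

# Usage: check_pidfiles            (uses /var/run/gluster/vols)
#        check_pidfiles /some/dir  (any directory with the same layout)
```

An EMPTY or DEAD line for a brick that "gluster volume status" reports as 
offline would point at a stale or unwritten pidfile rather than at the brick 
data itself.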

On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <shaik.salam at tcs.com> wrote: 
Hi Sanju, 

Please find the requested information in the attached logs. 

The brick below is offline. I tried the start force and heal commands, but 
it does not come up. 

sh-4.2# 
sh-4.2# gluster --version 
glusterfs 4.1.5 

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------

We can see the following events when we run start force on the volume: 

/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) 
[0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 
[2019-01-21 08:22:53.389049] I [MSGID: 106499] 
[glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8 

[2019-01-21 08:23:25.346839] I [MSGID: 106487] 
[glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: 
Received cli list req 

We can see the following events when we heal the volume: 

[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 
0-cli: Received resp to heal volume 
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1 

[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:22:30.463648] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:34.581555] I 
[cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start 
volume 
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0 
[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:22:53.387992] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0 
[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:23:25.346319] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 

I enabled DEBUG mode at the brick level, but nothing is being written to 
the brick log. 

gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG 

sh-4.2# pwd 
/var/log/glusterfs/bricks 

sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log 

From:        Sanju Rakonde <srakonde at redhat.com> 
To:        Shaik Salam <shaik.salam at tcs.com> 
Cc:        Amar Tumballi Suryanarayan <atumball at redhat.com>, "
gluster-users at gluster.org List" <gluster-users at gluster.org> 
Date:        01/22/2019 02:21 PM 
Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline unable 
to recover with heal/start force commands 

Hi Shaik, 

Can you please provide us complete glusterd and cmd_history logs from all 
the nodes in the cluster? Also please paste output of the following 
commands (from all nodes): 
1. gluster --version 
2. gluster volume info 
3. gluster volume status 
4. gluster peer status 
5. ps -ax | grep glusterfsd 

On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <shaik.salam at tcs.com> wrote: 
Hi Surya, 

This is already a customer setup, so we can't redeploy it. 
I enabled debug for the brick-level log, but nothing is being written to it. 
Can you tell me whether there are any other ways to troubleshoot, or other 
logs to look at? 

From:        Shaik Salam/HYD/TCS 
To:        "Amar Tumballi Suryanarayan" <atumball at redhat.com> 
Cc:        "gluster-users at gluster.org List" <gluster-users at gluster.org> 
Date:        01/22/2019 12:06 PM 
Subject:        Re: [Bugs] Bricks are going offline unable to recover with 
heal/start force commands 

Hi Surya, 

I have enabled DEBUG mode at the brick level, but nothing is being written 
to the brick log. 

gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG 

sh-4.2# pwd 
/var/log/glusterfs/bricks 

sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 
-rw-------. 1 root root       0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log 

BR 
Salam 

From:        "Amar Tumballi Suryanarayan" <atumball at redhat.com> 
To:        "Shaik Salam" <shaik.salam at tcs.com> 
Cc:        "gluster-users at gluster.org List" <gluster-users at gluster.org> 
Date:        01/22/2019 11:38 AM 
Subject:        Re: [Bugs] Bricks are going offline unable to recover with 
heal/start force commands 

Hi Shaik, 

Can you check what is in the brick logs? They are located in 
/var/log/glusterfs/bricks/. 
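
When a log is long, it can help to pull out only the error entries first. 
The sketch below is a hypothetical helper (gluster_log_errors is a name made 
up here, not a gluster command). It assumes that on disk each entry sits on 
a single line in the form "[date time] LEVEL [...] message", as the excerpts 
in this thread suggest once the mail wrapping is undone, so the level letter 
is the third whitespace-separated field.

```shell
#!/bin/sh
# gluster_log_errors FILE...: print entries whose level is E (error) or
# C (critical). Assumes one entry per line: "[date time] LEVEL [...] message".
gluster_log_errors() {
    awk '$3 == "E" || $3 == "C"' "$@"
}

# Usage (paths as mentioned in this thread):
# gluster_log_errors /var/log/glusterfs/bricks/*.log
```

A brick log of zero bytes, as in this thread, yields no output at all, which 
is itself a hint that the brick process never got far enough to log.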

Looks like the samba hooks script failed, but that shouldn't matter in 
this use case. 

Also, I see that you are trying to set up heketi to provision volumes, 
which means you may be using gluster in container use cases. If you are 
still in the 'PoC' phase, can you give https://github.com/gluster/gcs a try? 
That makes the deployment and the stack a little simpler. 

-Amar 

On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam at tcs.com> wrote: 
Can anyone advise how to recover the bricks, other than with heal/start 
force, given the events from the logs below? 
Please let me know if any other logs are required. 
Thanks in advance. 

BR 
Salam 

From:        Shaik Salam/HYD/TCS 
To:        bugs at gluster.org, gluster-users at gluster.org 
Date:        01/21/2019 10:03 PM 
Subject:        Bricks are going offline unable to recover with heal/start 
force commands 

Hi, 

Bricks are offline and cannot be recovered with the following commands: 

gluster volume heal <vol-name> 

gluster volume start <vol-name> force 

The bricks are still offline afterwards. 

sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 
Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.3.6:/var/lib/heketi/mounts/vg
_ca57f326195c243be2380ce4e42a4191/brick_952
d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
Brick 192.168.3.5:/var/lib/heketi/mounts/vg
_d5f17487744584e3652d3ca943b0b91b/brick_e15
c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
Brick 192.168.3.15:/var/lib/heketi/mounts/v
g_462ea199185376b03e4b0317363bb88c/brick_17
36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
Self-heal Daemon on localhost               N/A       N/A        Y       45826
Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915

Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
------------------------------------------------------------------------------

We can see the following events when we run start force on the volume: 

/mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) 
[0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Ran script: 
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 
[2019-01-21 08:22:34.555068] E [run.c:241:runner_log] 
(-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) 
[0x7fca9e139b3a] 
-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) 
[0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
[0x7fcaa346f0e5] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh 
--volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 
--volume-op=start --gd-workdir=/var/lib/glusterd 
[2019-01-21 08:22:53.389049] I [MSGID: 106499] 
[glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8 

[2019-01-21 08:23:25.346839] I [MSGID: 106487] 
[glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: 
Received cli list req 

We can see the following events when we heal the volume: 

[2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] 
0-cli: Received resp to heal volume 
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1 

[2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:22:30.463648] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:34.581555] I 
[cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start 
volume 
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0 
[2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:22:53.387992] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
[2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0 
[2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running 
gluster with version 4.1.5 
[2019-01-21 08:23:25.346319] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1 
[2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now 
[2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-glusterfs: error returned while attempting to connect to host:(null), 
port:0 
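
A small sketch, assuming the CLI log is available as mail-wrapped text like the excerpt above, that pairs each "Exiting with" status with the most recent "Received resp to ..." operation, so that failed invocations (non-zero exit) stand out. The helper names are hypothetical; the pairing rule is an assumption based only on the ordering of lines seen in this excerpt.

```python
import re

# Start of a log entry: "[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL>"
ENTRY_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] [A-Z](?:\s|$)"
)
RESP_RE = re.compile(r"Received resp to (.+)$")
EXIT_RE = re.compile(r"Exiting with: (-?\d+)")

def join_wrapped_entries(lines):
    """Rejoin log entries that were wrapped across several lines."""
    entries, current = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line):
            if current is not None:
                entries.append(current)
            current = line
        elif current is not None:
            current += " " + line
    if current is not None:
        entries.append(current)
    return entries

def cli_exit_codes(lines):
    """Return (timestamp, operation or None, exit_code) per CLI exit entry."""
    results, last_op = [], None
    for entry in join_wrapped_entries(lines):
        timestamp = ENTRY_START.match(entry).group(1)
        resp = RESP_RE.search(entry)
        if resp:
            # Remember which operation the CLI just got a response for.
            last_op = resp.group(1)
            continue
        done = EXIT_RE.search(entry)
        if done:
            results.append((timestamp, last_op, int(done.group(1))))
            last_op = None
    return results

# Lines copied from the log excerpt above.
sample = """\
[2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk]
0-cli: Received resp to heal volume
[2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
[2019-01-21 08:22:34.581555] I
[cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start
volume
[2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
""".splitlines()

codes = cli_exit_codes(sample)
failed = [c for c in codes if c[2] != 0]
for timestamp, operation, code in codes:
    print(timestamp, operation, code)
```

On the lines quoted here, the heal request is the one that exits with -1, while the forced start exits with 0.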

Please let us know the steps to recover the bricks.

BR 
Salam 
_______________________________________________
Bugs mailing list
Bugs at gluster.org
https://lists.gluster.org/mailman/listinfo/bugs 

-- 
Amar Tumballi (amarts) 
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users 

-- 
Thanks, 
Sanju 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: firstnode.log
Type: application/octet-stream
Size: 294510 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190124/ab45072e/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: secondnode.log
Type: application/octet-stream
Size: 1260140 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190124/ab45072e/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Thirdnode.log
Type: application/octet-stream
Size: 295999 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190124/ab45072e/attachment-0005.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: volume-level-info.log.txt
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190124/ab45072e/attachment-0001.txt>