[Gluster-users] [Bugs] Bricks are going offline unable to recover with heal/start force commands
Sanju Rakonde
srakonde at redhat.com
Thu Jan 24 10:41:56 UTC 2019
Mohit,
Have we come across this kind of issue before? This user is running Gluster
4.1. Did we fix any related bug afterwards?
It looks like the setup has some issues, but I'm not sure.
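
The glusterd log quoted below reports "discovered already-running brick" for
a brick that volume status shows as offline, so it may be worth checking
whether the PID stored in each brick pidfile really belongs to a glusterfsd
process. A minimal sketch, assuming a Linux /proc filesystem; the helper name
check_pidfile and the PIDDIR override are made up for illustration, and the
substring match on "glusterfsd" is only a heuristic:

```shell
#!/bin/sh
# Sketch: report whether the PID recorded in a brick pidfile belongs to a
# live glusterfsd process. check_pidfile is a hypothetical helper name.

check_pidfile() {
    pidfile=$1
    pid=$(tr -d '[:space:]' < "$pidfile")
    if [ -z "$pid" ]; then
        echo "$pidfile: empty"
        return 1
    fi
    if [ ! -r "/proc/$pid/cmdline" ]; then
        echo "$pidfile: pid $pid is not running"
        return 1
    fi
    # /proc/<pid>/cmdline is NUL-separated; turn NULs into spaces to read it.
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    case $cmd in
        *glusterfsd*) echo "$pidfile: pid $pid is glusterfsd"; return 0 ;;
        *) echo "$pidfile: pid $pid is NOT glusterfsd: $cmd"; return 1 ;;
    esac
}

# Default directory is the pidfile location mentioned in this thread.
for f in "${PIDDIR:-/var/run/gluster/vols}"/*/*.pid; do
    [ -e "$f" ] || continue
    check_pidfile "$f" || :
done
```

If a pidfile points at a PID that is alive but is not glusterfsd, that would
be one possible explanation for glusterd treating the brick as already
running; this is only a guess until we see the output.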
On Thu, Jan 24, 2019 at 4:01 PM Shaik Salam <shaik.salam at tcs.com> wrote:
>
>
> Hi Sanju,
>
> Please find the requested information (these are the latest logs :) ).
>
> I can see only the following error messages related to brick
> "brick_e15c12cceae12c8ab7782dd57cf5b6c1" (in the second node's log):
>
> [2019-01-23 11:50:20.322902] I
> [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered
> already-running brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> [2019-01-23 11:50:20.322925] I [MSGID: 106142]
> [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> on port 49165 >> showing running on port but not
> [2019-01-23 11:50:20.327557] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> [2019-01-23 11:50:20.327586] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is
> stopped
> [2019-01-23 11:50:20.327604] I [MSGID: 106599]
> [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
> xlator is not installed
> [2019-01-23 11:50:20.337735] I [MSGID: 106568]
> [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
> glustershd daemon running in pid: 69525
> [2019-01-23 11:50:21.338058] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd
> service is stopped
> [2019-01-23 11:50:21.338180] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting
> glustershd service
> [2019-01-23 11:50:21.348234] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
> stopped
> [2019-01-23 11:50:21.348285] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is
> stopped
> [2019-01-23 11:50:21.348866] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
> stopped
> [2019-01-23 11:50:21.348883] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is
> stopped
> [2019-01-23 11:50:22.356502] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:50:22.368845] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
> Self-heal Daemon on localhost               N/A       N/A        Y       109550
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> BR
> Salam
>
>
>
> From: "Sanju Rakonde" <srakonde at redhat.com>
> To: "Shaik Salam" <shaik.salam at tcs.com>
> Cc: "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date: 01/24/2019 02:32 PM
> Subject: Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> *"External email. Open with Caution"*
> Shaik,
>
> Sorry to ask this again. What errors are you seeing in glusterd logs? Can
> you share the latest logs?
>
> On Thu, Jan 24, 2019 at 2:05 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Sanju,
>
> Please find the requested information.
>
> Are you still seeing the error "Unable to read pidfile:" in glusterd log?
> >>>> No
> Are you seeing "brick is deemed not to be a part of the volume" error in
> glusterd log?>>>> No
>
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae1^C8ab7782dd57cf5b6c1/brick
> sh-4.2# pwd
>
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
> sh-4.2# getfattr -d -m . -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
> getfattr: Removing leading '/' from absolute path names
> # file:
> var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
>
> trusted.afr.vol_3442e86b6d994a14de73f1b8c82cf0b8-client-0=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x15477f3622e84757a0ce9000b63fa849
>
> sh-4.2# ls -la |wc -l
> 86
> sh-4.2# pwd
>
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2#
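
The trusted.glusterfs.volume-id xattr above can be compared against the
Volume ID reported by "gluster volume info" (quoted further down in this
thread). A small sketch; same_volume_id is a hypothetical helper name, not a
gluster command, and the two values are copied from this thread:

```shell
#!/bin/sh
# Sketch: compare a brick's trusted.glusterfs.volume-id xattr (hex, as printed
# by "getfattr -e hex") with the Volume ID shown by "gluster volume info".

same_volume_id() {
    xattr_hex=$1   # e.g. 0x15477f36...
    vol_uuid=$2    # e.g. 15477f36-22e8-...
    # Strip the 0x prefix and the dashes, then compare case-insensitively.
    a=$(printf '%s' "${xattr_hex#0x}" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$vol_uuid" | tr -d '-' | tr 'A-F' 'a-f')
    [ "$a" = "$b" ]
}

if same_volume_id 0x15477f3622e84757a0ce9000b63fa849 \
                  15477f36-22e8-4757-a0ce-9000b63fa849; then
    echo "volume-id matches"
else
    echo "volume-id MISMATCH"
fi
```

With the two values from this thread the check prints "volume-id matches",
which is consistent with glusterd not logging "brick is deemed not to be a
part of the volume".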
>
>
>
> From: "Sanju Rakonde" <srakonde at redhat.com>
> To: "Shaik Salam" <shaik.salam at tcs.com>
> Cc: "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date: 01/24/2019 01:38 PM
> Subject: Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> Shaik,
>
> Previously I suspected that the brick pid file was missing, but I see
> it is present.
>
> From second node (this brick is in offline state):
> /var/run/gluster/vols/vol_3442e86b6d994a14de73f1b8c82cf0b8/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid
>
> 271
> Are you still seeing the error "Unable to read pidfile:" in glusterd log?
>
> I also suspect that the brick may be missing its extended attributes. Are
> you seeing a "brick is deemed not to be a part of the volume" error in the
> glusterd log? If not, can you please provide us the output of
> "getfattr -d -m . -e hex <brickpath>"?
>
> On Thu, Jan 24, 2019 at 12:18 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Sanju,
>
> Could you please have a look at my issue if you have time (or at least
> provide a workaround)?
>
> BR
> Salam
>
>
>
> From: Shaik Salam/HYD/TCS
> To: "Sanju Rakonde" <srakonde at redhat.com>
> Cc: "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date: 01/23/2019 05:50 PM
> Subject: Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
>
> Hi Sanju,
>
> Please find requested information.
>
> Sorry to repeat myself. I ran the start force command again after setting
> the brick log level to DEBUG, using one volume as an example.
> Please correct me if I am doing something wrong.
>
>
> [root at master ~]# oc rsh glusterfs-storage-vll7x
> sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Type: Replicate
> Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.3.6:/var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick
> Brick2: 192.168.3.5:/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> Brick3: 192.168.3.15:/var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick
> Options Reconfigured:
> diagnostics.brick-log-level: INFO
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
> Self-heal Daemon on localhost               N/A       N/A        Y       108434
> Self-heal Daemon on matrix1.matrix.orange.l
> ab                                          N/A       N/A        Y       69525
> Self-heal Daemon on matrix2.matrix.orange.l
> ab                                          N/A       N/A        Y       18569
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
> volume set: success
> sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep log
> cluster.entry-change-log on
> cluster.data-change-log on
> cluster.metadata-change-log on
> diagnostics.brick-log-level DEBUG
>
> sh-4.2# cd /var/log/glusterfs/bricks/
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
> >>> Nothing in log
>
> -rw-------. 1 root root 189057 Jan 18 09:20 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120
>
> [2019-01-23 11:49:32.475956] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o
> diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:49:32.483191] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o
> diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:48:59.111292] W [MSGID: 106036]
> [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management:
> Snapshot list failed
> [2019-01-23 11:50:14.112271] E [MSGID: 106026]
> [glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management:
> Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid
> argument]
> [2019-01-23 11:50:14.112305] W [MSGID: 106036]
> [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management:
> Snapshot list failed
> [2019-01-23 11:50:20.322902] I
> [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered
> already-running brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> [2019-01-23 11:50:20.322925] I [MSGID: 106142]
> [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> on port 49165
> [2019-01-23 11:50:20.327557] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> [2019-01-23 11:50:20.327586] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is
> stopped
> [2019-01-23 11:50:20.327604] I [MSGID: 106599]
> [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
> xlator is not installed
> [2019-01-23 11:50:20.337735] I [MSGID: 106568]
> [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
> glustershd daemon running in pid: 69525
> [2019-01-23 11:50:21.338058] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd
> service is stopped
> [2019-01-23 11:50:21.338180] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting
> glustershd service
> [2019-01-23 11:50:21.348234] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
> stopped
> [2019-01-23 11:50:21.348285] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is
> stopped
> [2019-01-23 11:50:21.348866] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
> stopped
> [2019-01-23 11:50:21.348883] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is
> stopped
> [2019-01-23 11:50:22.356502] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:50:22.368845] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y       250
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
> Self-heal Daemon on localhost               N/A       N/A        Y       109550
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       52557
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       16946
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
>
> From: "Sanju Rakonde" <srakonde at redhat.com>
> To: "Shaik Salam" <shaik.salam at tcs.com>
> Cc: "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date: 01/23/2019 02:15 PM
> Subject: Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> Hi Shaik,
>
> I can see the errors below in the glusterd logs.
>
> [2019-01-22 09:20:17.540196] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid
>
> [2019-01-22 09:20:17.546408] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid
>
> [2019-01-22 09:20:17.552575] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid
>
> [2019-01-22 09:20:17.558888] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid
>
> [2019-01-22 09:20:17.565266] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid
>
> [2019-01-22 09:20:17.585926] E [MSGID: 106028]
> [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid
> of brick process
> [2019-01-22 09:20:17.617806] E [MSGID: 106028]
> [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid
> of brick process
> [2019-01-22 09:20:17.649628] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/glustershd/glustershd.pid
> [2019-01-22 09:20:17.649700] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/glustershd/glustershd.pid
>
> So it looks like neither gf_is_service_running() nor
> glusterd_brick_signal() is able to read the pid file. That suggests the
> pidfiles may be empty.
>
> Can you please paste the contents of the brick pidfiles? You can find them
> in /var/run/gluster/vols/<volname>/, or you can just run this command:
> for i in /var/run/gluster/vols/*/*.pid; do echo "$i"; cat "$i"; done
>
> On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Sanju,
>
> Please find the requested information in the attached logs.
>
>
>
>
> The brick below is offline. I tried the start force and heal commands, but
> it does not come up.
>
> sh-4.2#
> sh-4.2# gluster --version
> glusterfs 4.1.5
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
> Self-heal Daemon on localhost               N/A       N/A        Y       45826
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
>
>
> We can see the following events when we force-start the volume:
>
> /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:34.555068] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:53.389049] I [MSGID: 106499]
> [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
> [2019-01-21 08:23:25.346839] I [MSGID: 106487]
> [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd:
> Received cli list req
>
>
> We can see the following events when we heal the volume:
>
> [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
> [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:30.463648] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:34.581555] I
> [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start
> volume
> [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:53.387992] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:23:25.346319] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
>
>
> I enabled DEBUG mode at the brick level, but nothing is being written to
> the brick log.
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
>
> sh-4.2# pwd
> /var/log/glusterfs/bricks
>
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
>
>
>
>
>
>
> From: Sanju Rakonde <srakonde at redhat.com>
> To: Shaik Salam <shaik.salam at tcs.com>
> Cc: Amar Tumballi Suryanarayan <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date: 01/22/2019 02:21 PM
> Subject: Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> Hi Shaik,
>
> Can you please provide us the complete glusterd and cmd_history logs from
> all the nodes in the cluster? Also, please paste the output of the
> following commands (from all nodes):
> 1. gluster --version
> 2. gluster volume info
> 3. gluster volume status
> 4. gluster peer status
> 5. ps -ax | grep glusterfsd
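
The five commands above could be gathered in one pass on each node with
something like the sketch below. The helper run_and_log and the OUT path are
made up for illustration; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch: run each diagnostic command and save its output to one file.
# OUT is a hypothetical location; override it in the environment if needed.
OUT=${OUT:-/tmp/gluster-diag-$(hostname 2>/dev/null).txt}

run_and_log() {
    echo "### $*"
    # Keep going even if a command fails, but record its exit status.
    "$@" 2>&1 || echo "(command exited with status $?)"
    echo
}

{
    run_and_log gluster --version
    run_and_log gluster volume info
    run_and_log gluster volume status
    run_and_log gluster peer status
    # The [g] keeps grep from matching its own process entry.
    run_and_log sh -c 'ps -ax | grep "[g]lusterfsd"'
} > "$OUT"

echo "wrote $OUT"
```

Run it once per node (inside each gluster pod in a containerized setup) and
attach the resulting files.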
>
> On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Surya,
>
> This is an existing customer setup, so we can't redeploy it.
> I enabled debug logging at the brick level, but nothing is being written to
> the log. Can you tell me whether there are other ways to troubleshoot, or
> other logs to look at?
>
>
> From: Shaik Salam/HYD/TCS
> To: "Amar Tumballi Suryanarayan" <atumball at redhat.com>
> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date: 01/22/2019 12:06 PM
> Subject: Re: [Bugs] Bricks are going offline unable to recover
> with heal/start force commands
> ------------------------------
>
>
> Hi Surya,
>
> I have enabled DEBUG mode at the brick level, but nothing is being written
> to the brick log.
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 diagnostics.brick-log-level DEBUG
>
> sh-4.2# pwd
> /var/log/glusterfs/bricks
>
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root 0 Jan 20 02:46 var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
>
> BR
> Salam
>
>
>
>
> From: "Amar Tumballi Suryanarayan" <atumball at redhat.com>
> To: "Shaik Salam" <shaik.salam at tcs.com>
> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date: 01/22/2019 11:38 AM
> Subject: Re: [Bugs] Bricks are going offline unable to recover
> with heal/start force commands
> ------------------------------
>
>
>
> Hi Shaik,
>
> Can you check what is in the brick logs? They are located in
> /var/log/glusterfs/bricks/.
>
> It looks like the samba hook script failed, but that shouldn't matter in
> this use case.
>
> Also, I see that you are trying to set up heketi to provision volumes,
> which suggests you are using gluster in a container use case. If you are
> still in the 'PoC' phase, can you give https://github.com/gluster/gcs
> a try? That makes the deployment and the stack a little simpler.
>
> -Amar
>
>
>
>
> On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam at tcs.com> wrote:
> Can anyone advise how to recover the bricks, other than with heal/start
> force, based on the log events below?
> Please let me know if any other logs are required.
> Thanks in advance.
>
> BR
> Salam
>
>
>
> From: Shaik Salam/HYD/TCS
> To: bugs at gluster.org, gluster-users at gluster.org
> Date: 01/21/2019 10:03 PM
> Subject: Bricks are going offline unable to recover with
> heal/start force commands
> ------------------------------
>
>
> Hi,
>
> Bricks are offline and cannot be recovered with the following commands:
>
> gluster volume heal <vol-name>
>
> gluster volume start <vol-name> force
>
> The bricks are still offline.
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y       269
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N       N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y       225
> Self-heal Daemon on localhost               N/A       N/A        Y       45826
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y       65196
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y       52915
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
>
>
> We can see the following events when we force-start the volume:
>
> /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:34.555068] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:53.389049] I [MSGID: 106499]
> [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
> [2019-01-21 08:23:25.346839] I [MSGID: 106487]
> [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd:
> Received cli list req
>
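Because each log entry above is wrapped over several lines, it can help to rejoin the entries and filter by severity before reading them. The sketch below is an illustration only, not something from the original report. It assumes the entry layout visible in these excerpts, a bracketed timestamp followed by a one-letter level such as `I`, `W` or `E`. The names `parse_entries` and `problems` are hypothetical.

```python
import re

# An entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL>", as in the excerpts.
ENTRY_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+([A-Z])\b\s*(.*)$"
)


def parse_entries(log_text):
    """Rejoin wrapped log lines into one dict per entry."""
    entries = []
    for raw in log_text.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quote prefixes
        match = ENTRY_START.match(line)
        if match:
            entries.append(
                {"time": match.group(1), "level": match.group(2), "text": match.group(3)}
            )
        elif entries and line:
            # Continuation line: append to the most recent entry.
            entries[-1]["text"] = (entries[-1]["text"] + " " + line).strip()
    return entries


def problems(log_text, levels=("E", "W")):
    """Return only the entries logged at the given levels."""
    return [entry for entry in parse_entries(log_text) if entry["level"] in levels]
```

Run over the excerpt above, the only `E` entry is the failed `S30samba-start.sh` hook script. Whether that failure is related to the offline brick is not established by these logs.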
>
> We can see the following events when we heal the volume:
>
> [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
> [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:30.463648] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:34.581555] I
> [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start
> volume
> [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:53.387992] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:23:25.346319] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
>
>
>
> Please let us know the steps to recover the bricks.
>
>
> BR
> Salam
>
> --
> Amar Tumballi (amarts)
>
--
Thanks,
Sanju