[Gluster-users] [Bugs] Bricks are going offline unable to recover with heal/start force commands

Sanju Rakonde srakonde at redhat.com
Thu Jan 24 09:02:26 UTC 2019


Shaik,

Sorry to ask this again: what errors are you seeing in the glusterd logs? Can
you share the latest logs?
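
For reference, something like this should capture them (a rough sketch; the
pod name is just an example from earlier in the thread, and the paths assume
the default /var/log/glusterfs layout):

# open a shell in the gluster pod on the affected node (pod name is an example)
oc rsh glusterfs-storage-vll7x
# then, inside the pod, bundle the glusterd and cmd_history logs
tar czf /tmp/glusterd-logs.tgz /var/log/glusterfs/glusterd.log /var/log/glusterfs/cmd_history.log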

On Thu, Jan 24, 2019 at 2:05 PM Shaik Salam <shaik.salam at tcs.com> wrote:

> Hi Sanju,
>
> Please find the requested information.
>
> Are you still seeing the error "Unable to read pidfile:" in the glusterd
> log?  >>>> No
> Are you seeing the "brick is deemed not to be a part of the volume" error
> in the glusterd log?  >>>> No
>
> sh-4.2# getfattr -m -d -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> (no output; several repeated attempts with the same options also returned nothing)
> sh-4.2# pwd
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2# getfattr -d -m . -e hex
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
> getfattr: Removing leading '/' from absolute path names
> # file:
> var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
>
> trusted.afr.vol_3442e86b6d994a14de73f1b8c82cf0b8-client-0=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x15477f3622e84757a0ce9000b63fa849
>
> sh-4.2# ls -la |wc -l
> 86
> sh-4.2# pwd
>
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> sh-4.2#
>
>
>
> From:        "Sanju Rakonde" <srakonde at redhat.com>
> To:        "Shaik Salam" <shaik.salam at tcs.com>
> Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>, "
> gluster-users at gluster.org List" <gluster-users at gluster.org>, "Murali
> Kottakota" <murali.kottakota at tcs.com>
> Date:        01/24/2019 01:38 PM
> Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> *"External email. Open with Caution"*
> Shaik,
>
> Previously I suspected that the brick pid file was missing, but I see it is
> present.
>
> From the second node (where this brick is offline):
>
> /var/run/gluster/vols/vol_3442e86b6d994a14de73f1b8c82cf0b8/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid
> 271
>  Are you still seeing the error "Unable to read pidfile:" in glusterd log?
>
> I also suspect that the brick might be missing its extended attributes. Are
> you seeing the "brick is deemed not to be a part of the volume" error in the
> glusterd log? If not, can you please provide us the output of "getfattr -d
> -m . -e hex <brickpath>"?
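>
> For reference, a minimal sketch of the check (the brick path is the one for
> this volume on 192.168.3.5; as far as I know, the attribute glusterd compares
> is trusted.glusterfs.volume-id against the Volume ID reported by "gluster
> volume info"):
>
> # dump all extended attributes of the brick root in hex
> getfattr -d -m . -e hex \
>   /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> # trusted.glusterfs.volume-id should decode to 15477f36-22e8-4757-a0ce-9000b63fa849
> # for vol_3442e86b6d994a14de73f1b8c82cf0b8; a missing or different value would
> # explain a "not a part of the volume" error.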
>
> On Thu, Jan 24, 2019 at 12:18 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Sanju,
>
> Could you please have a look at my issue if you have time (or at least
> suggest a workaround)?
>
> BR
> Salam
>
>
>
> From:        Shaik Salam/HYD/TCS
> To:        "Sanju Rakonde" <srakonde at redhat.com>
> Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date:        01/23/2019 05:50 PM
> Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
>
> Hi Sanju,
>
> Please find requested information.
>
> Sorry to repeat this again: I am trying the start force command after
> enabling debug logging for the brick, taking one volume as an example.
> Please correct me if I am doing anything wrong.
>
>
> [root at master ~]# oc rsh glusterfs-storage-vll7x
> sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Type: Replicate
> Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.3.6:
> /var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick
> Brick2: 192.168.3.5:
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/
> brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> Brick3: 192.168.3.15:
> /var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick
> Options Reconfigured:
> diagnostics.brick-log-level: INFO
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online
>  Pid
>
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y
> 250
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N
> N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y
> 225
> Self-heal Daemon on localhost               N/A       N/A        Y
> 108434
> Self-heal Daemon on matrix1.matrix.orange.l
> ab                                          N/A       N/A        Y
> 69525
> Self-heal Daemon on matrix2.matrix.orange.l
> ab                                          N/A       N/A        Y
> 18569
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8
> diagnostics.brick-log-level DEBUG
> volume set: success
> sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep
> log
> cluster.entry-change-log                on
> cluster.data-change-log                 on
> cluster.metadata-change-log             on
> diagnostics.brick-log-level             DEBUG
>
> sh-4.2# cd /var/log/glusterfs/bricks/
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root       0 Jan 20 02:46
>  var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
>  >>> Nothing in the log
>
> -rw-------. 1 root root  189057 Jan 18 09:20
> var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120
>
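> Since the brick log stays empty even with DEBUG enabled, would it also help
> to check whether the brick process is being spawned at all? A rough sketch
> of what I have in mind (the grep pattern is just the brick directory name,
> and the glusterd.log path assumes the default layout):
>
> # is a glusterfsd process running for this brick?
> ps -ax | grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 | grep -v grep
> # what does glusterd say about this brick around the start attempt?
> grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 /var/log/glusterfs/glusterd.log | tail -n 20
>
> What I see in the glusterd log in the same time window is below:
>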
> [2019-01-23 11:49:32.475956] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o
> diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:49:32.483191] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o
> diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:48:59.111292] W [MSGID: 106036]
> [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management:
> Snapshot list failed
> [2019-01-23 11:50:14.112271] E [MSGID: 106026]
> [glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management:
> Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid
> argument]
> [2019-01-23 11:50:14.112305] W [MSGID: 106036]
> [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management:
> Snapshot list failed
> [2019-01-23 11:50:20.322902] I
> [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered
> already-running brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> [2019-01-23 11:50:20.322925] I [MSGID: 106142]
> [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick
> /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
> on port 49165
> [2019-01-23 11:50:20.327557] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> [2019-01-23 11:50:20.327586] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is
> stopped
> [2019-01-23 11:50:20.327604] I [MSGID: 106599]
> [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so
> xlator is not installed
> [2019-01-23 11:50:20.337735] I [MSGID: 106568]
> [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping
> glustershd daemon running in pid: 69525
> [2019-01-23 11:50:21.338058] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd
> service is stopped
> [2019-01-23 11:50:21.338180] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting
> glustershd service
> [2019-01-23 11:50:21.348234] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
> stopped
> [2019-01-23 11:50:21.348285] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is
> stopped
> [2019-01-23 11:50:21.348866] I [MSGID: 106131]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
> stopped
> [2019-01-23 11:50:21.348883] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is
> stopped
> [2019-01-23 11:50:22.356502] I [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-23 11:50:22.368845] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
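>
> One thing I notice above: glusterd logs "discovered already-running brick"
> and registers the brick on port 49165, yet the volume status below still
> shows it offline (port N/A). Should I also check what, if anything, is
> actually listening on that port and whether a glusterfsd process exists for
> this brick? For example (a rough sketch; assumes ss or netstat is available
> in the pod):
>
> ss -ltnp | grep 49165          # or: netstat -ltnp | grep 49165
> ps -ax | grep glusterfsd | grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 | grep -v grep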
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online
>  Pid
>
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49157     0          Y
> 250
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N
> N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y
> 225
> Self-heal Daemon on localhost               N/A       N/A        Y
> 109550
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y
> 52557
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y
> 16946
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
>
> From:        "Sanju Rakonde" <srakonde at redhat.com>
> To:        "Shaik Salam" <shaik.salam at tcs.com>
> Cc:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>,
> "Murali Kottakota" <murali.kottakota at tcs.com>
> Date:        01/23/2019 02:15 PM
> Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> * "External email. Open with Caution"*
> Hi Shaik,
>
> I can see the following errors in the glusterd logs.
>
> [2019-01-22 09:20:17.540196] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid
>
> [2019-01-22 09:20:17.546408] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid
>
> [2019-01-22 09:20:17.552575] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid
>
> [2019-01-22 09:20:17.558888] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid
>
> [2019-01-22 09:20:17.565266] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid
>
> [2019-01-22 09:20:17.585926] E [MSGID: 106028]
> [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid
> of brick process
> [2019-01-22 09:20:17.617806] E [MSGID: 106028]
> [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid
> of brick process
> [2019-01-22 09:20:17.649628] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/glustershd/glustershd.pid
> [2019-01-22 09:20:17.649700] E [MSGID: 101012]
> [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile:
> /var/run/gluster/glustershd/glustershd.pid
>
> So it looks like neither gf_is_service_running()
> nor glusterd_brick_signal() is able to read the pid file. That means the
> pidfiles might be empty, with nothing to read.
>
> Can you please paste the contents of the brick pidfiles? You can find the
> brick pidfiles in /var/run/gluster/vols/<volname>/, or you can just run this
> command: "for i in `ls /var/run/gluster/vols/*/*.pid`; do echo $i; cat
> $i; done"
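>
> Or, to spot empty pidfiles and stale pids in one pass, something like this
> (a rough sketch, assuming the standard /var/run/gluster layout):
>
> for i in /var/run/gluster/vols/*/*.pid; do
>     echo "== $i"
>     if [ ! -s "$i" ]; then
>         echo "   pidfile is empty or missing"
>     elif kill -0 "$(cat "$i")" 2>/dev/null; then
>         echo "   pid $(cat "$i") is running"
>     else
>         echo "   pid $(cat "$i") is NOT running (stale pidfile)"
>     fi
> done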
>
> On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Sanju,
>
> Please find the requested information in the attached logs.
>
>
>
>
> The below brick is offline; we have tried the start force and heal commands,
> but it does not come up.
>
> sh-4.2#
> sh-4.2# gluster --version
> glusterfs 4.1.5
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online
>  Pid
>
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y
> 269
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N
> N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y
> 225
> Self-heal Daemon on localhost               N/A       N/A        Y
> 45826
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y
> 65196
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y
> 52915
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
>
>
> We can see the following events when we run start force on the volume:
>
> /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:34.555068] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:53.389049] I [MSGID: 106499]
> [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
> [2019-01-21 08:23:25.346839] I [MSGID: 106487]
> [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd:
> Received cli list req
>
>
> We can see the following events when we heal the volume:
>
> [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
> [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:30.463648] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:34.581555] I
> [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start
> volume
> [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:53.387992] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:23:25.346319] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
>
>
> We enabled DEBUG mode at the brick level, but nothing is being written to the brick log.
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8
> diagnostics.brick-log-level DEBUG
>
> sh-4.2# pwd
> /var/log/glusterfs/bricks
>
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root       0 Jan 20 02:46
> var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
>
>
>
>
>
>
> From:        Sanju Rakonde <srakonde at redhat.com>
> To:        Shaik Salam <shaik.salam at tcs.com>
> Cc:        Amar Tumballi Suryanarayan <atumball at redhat.com>,
> "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date:        01/22/2019 02:21 PM
> Subject:        Re: [Gluster-users] [Bugs] Bricks are going offline
> unable to recover with heal/start force commands
> ------------------------------
>
>
>
> * "External email. Open with Caution"*
> Hi Shaik,
>
> Can you please provide us the complete glusterd and cmd_history logs from
> all the nodes in the cluster? Also, please paste the output of the following
> commands from all nodes (a quick way to collect them is sketched after the list):
> 1. gluster --version
> 2. gluster volume info
> 3. gluster volume status
> 4. gluster peer status
> 5. ps -ax | grep glusterfsd
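>
> For example, a one-shot collection (a rough sketch; run it on each node and
> adjust the output path as needed):
>
> {
>   echo "== gluster --version";     gluster --version
>   echo "== gluster volume info";   gluster volume info
>   echo "== gluster volume status"; gluster volume status
>   echo "== gluster peer status";   gluster peer status
>   echo "== glusterfsd processes";  ps -ax | grep '[g]lusterfsd'
> } > /tmp/gluster-diag-$(hostname).txt 2>&1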
>
> On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <shaik.salam at tcs.com> wrote:
> Hi Surya,
>
> This is already a customer setup and we cannot redeploy it.
> We enabled debug logging for the brick, but nothing is being written to it.
> Can you tell me any other ways to troubleshoot, or other logs to look at?
>
>
> From:        Shaik Salam/HYD/TCS
> To:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>
> Cc:        "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date:        01/22/2019 12:06 PM
> Subject:        Re: [Bugs] Bricks are going offline unable to recover
> with heal/start force commands
> ------------------------------
>
>
> Hi Surya,
>
> I have enabled DEBUG mode at the brick level, but nothing is being written
> to the brick log.
>
> gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8
> diagnostics.brick-log-level DEBUG
>
> sh-4.2# pwd
> /var/log/glusterfs/bricks
>
> sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1
> -rw-------. 1 root root       0 Jan 20 02:46
> var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log
>
> BR
> Salam
>
>
>
>
> From:        "Amar Tumballi Suryanarayan" <atumball at redhat.com>
> To:        "Shaik Salam" <shaik.salam at tcs.com>
> Cc:        "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Date:        01/22/2019 11:38 AM
> Subject:        Re: [Bugs] Bricks are going offline unable to recover
> with heal/start force commands
> ------------------------------
>
>
>
> * "External email. Open with Caution"*
> Hi Shaik,
>
> Can you check what is in the brick logs? They are located in
> /var/log/glusterfs/bricks/.
>
> It looks like the Samba hook script failed, but that shouldn't matter in
> this use case.
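>
> For example (illustrative; each brick's log file name mirrors its mount path):
>
> ls -la /var/log/glusterfs/bricks/
> tail -n 100 /var/log/glusterfs/bricks/*.log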
>
> Also, I see that you are trying to set up heketi to provision volumes,
> which means you may be using Gluster in a container use case. If you are
> still in the 'PoC' phase, can you give https://github.com/gluster/gcs a try?
> That makes the deployment and the stack a little simpler.
>
> -Amar
>
>
>
>
> On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <shaik.salam at tcs.com> wrote:
> Can anyone advise how to recover the bricks, apart from heal/start force,
> based on the events from the logs below?
> Please let me know if any other logs are required.
> Thanks in advance.
>
> BR
> Salam
>
>
>
> From:        Shaik Salam/HYD/TCS
> To:        bugs at gluster.org, gluster-users at gluster.org
> Date:        01/21/2019 10:03 PM
> Subject:        Bricks are going offline unable to recover with
> heal/start force commands
> ------------------------------
>
>
> Hi,
>
> Bricks are offline and we are unable to recover them with the following commands:
>
> gluster volume heal <vol-name>
>
> gluster volume start <vol-name> force
>
> But the bricks are still offline.
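>
> (For example, with the affected volume name substituted in, the sequence was:
>
> gluster volume heal vol_3442e86b6d994a14de73f1b8c82cf0b8
> gluster volume start vol_3442e86b6d994a14de73f1b8c82cf0b8 force
>
> and the status afterwards is shown below.)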
>
>
> sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8
> Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8
> Gluster process                             TCP Port  RDMA Port  Online
>  Pid
>
> ------------------------------------------------------------------------------
> Brick 192.168.3.6:/var/lib/heketi/mounts/vg
> _ca57f326195c243be2380ce4e42a4191/brick_952
> d75fd193c7209c9a81acbc23a3747/brick         49166     0          Y
> 269
> Brick 192.168.3.5:/var/lib/heketi/mounts/vg
> _d5f17487744584e3652d3ca943b0b91b/brick_e15
> c12cceae12c8ab7782dd57cf5b6c1/brick         N/A       N/A        N
> N/A
> Brick 192.168.3.15:/var/lib/heketi/mounts/v
> g_462ea199185376b03e4b0317363bb88c/brick_17
> 36459d19e8aaa1dcb5a87f48747d04/brick        49173     0          Y
> 225
> Self-heal Daemon on localhost               N/A       N/A        Y
> 45826
> Self-heal Daemon on 192.168.3.6             N/A       N/A        Y
> 65196
> Self-heal Daemon on 192.168.3.15            N/A       N/A        Y
> 52915
>
> Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8
>
> ------------------------------------------------------------------------------
>
>
> We can see the following events when we run start force on the volume:
>
> /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605)
> [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Ran script:
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:34.555068] E [run.c:241:runner_log]
> (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a)
> [0x7fca9e139b3a]
> -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563)
> [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7fcaa346f0e5] ) 0-management: Failed to execute script:
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1
> --volume-op=start --gd-workdir=/var/lib/glusterd
> [2019-01-21 08:22:53.389049] I [MSGID: 106499]
> [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8
> [2019-01-21 08:23:25.346839] I [MSGID: 106487]
> [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd:
> Received cli list req
>
>
> We can see the following events when we heal the volume:
>
> [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk]
> 0-cli: Received resp to heal volume
> [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1
> [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:30.463648] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:34.581555] I
> [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start
> volume
> [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:22:53.387992] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
> [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0
> [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running
> gluster with version 4.1.5
> [2019-01-21 08:23:25.346319] I [MSGID: 101190]
> [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler]
> 0-transport: EPOLLERR - disconnecting now
> [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glusterfs: error returned while attempting to connect to host:(null),
> port:0
>
>
>
> Please let us know the steps to recover the bricks.
>
>
> BR
> Salam
> _______________________________________________
> Bugs mailing list
> Bugs at gluster.org
> https://lists.gluster.org/mailman/listinfo/bugs
>
>
> --
> Amar Tumballi (amarts)
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Thanks,
> Sanju
>
>
> --
> Thanks,
> Sanju
>
>
> --
> Thanks,
> Sanju
>


-- 
Thanks,
Sanju