<div dir="ltr" data-setdir="false">Hi Strahil,</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Thanks for sharing the steps. I have tried all the steps mentioned except 6 and 9.</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Let me try them as well and see how it is responding.</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Thanks,</div><div dir="ltr" data-setdir="false">Ahemad</div><div><br></div>
</div><div id="yahoo_quoted_2621931960" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Wednesday, 17 June, 2020, 02:51:48 pm IST, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
<div><div dir="ltr">Hi Ahemad,<br clear="none"><br clear="none">most probably the reason of the unexpected downtime lies somewhere else and you just observe symptoms.<br clear="none"><br clear="none">So, you have replica 3 volume on 3 separate hosts , right ?<br clear="none"><br clear="none">Here is what I think you should do on a TEST cluster (could be VMs even kn your laptop):<br clear="none">1. Create 1 brick on each VM<br clear="none">2. Create the TSP<br clear="none">3. Create the replica 3 volume<br clear="none">4. Enable & start glusterfsd.service on all VMs<br clear="none">5. Connect another VM via fuse and use the mount like this one:<br clear="none"><br clear="none">mount -t glusterfs -o backup-volfile-servers=vm2:vm3 vm1:/volume1 /mnt<br clear="none"><br clear="none">6. Now test hardware failure - power off VM1 ungracefully. The fuse client should recover in less than a minute - this is defined by the volume timeout<br clear="none">7. Power up vm1 and check the heal status<br clear="none">8. Once the heal is over you can proceed<br clear="none">9. Test planned maintenance - use the gluster script to kill all gluster processes on vm2. FUSE client should not hang and should not notice anything.<br clear="none">10. Start glusterd.service and then forcefully start the brick:<br clear="none">gluster volume start volume1 force<br clear="none">Check status:<br clear="none">gluster volume status golume1 <br clear="none"><br clear="none">Wait for the heals to complete.<br clear="none">All bricks should be online.<br clear="none"><br clear="none">11. Now shutdown vm3 gracefully. The glusterfsd.service should kill all gluster processses and the FUSE client should never experience any issues.<br clear="none"><br clear="none">The only case where you can observe partial downtime with replica 3 is when there were pending heals and one of the 2 good sources has failed/powerdown before the heal has completed.<br clear="none"><br clear="none">Usually there are 2 healing mechanisms:<br clear="none">1) When a FUSE client access a file that has some differences (2 bricks same file, 1 brick older version), it will try to correct the issues.<br clear="none">2) There is a daemon that crawls over the pending heals every 15min (I'm not sure, maybe even 10) and heals any 'blame's.<br clear="none"><br clear="none">You can use 'gluster golume heal volume1 full' to initiate a full heal, but on large bricks ot could take a long time and is usually used after brick replacement/reset.<br clear="none"><br clear="none">Best Regards,<br clear="none">Strahil Nikolov<br clear="none"><div class="yqt1396103586" id="yqtfd25831"><br clear="none">На 17 юни 2020 г. 10:45:27 GMT+03:00, ahemad shaik <<a shape="rect" ymailto="mailto:ahemad_shaik@yahoo.com" href="mailto:ahemad_shaik@yahoo.com">ahemad_shaik@yahoo.com</a>> написа:<br clear="none">> Thanks Karthik for the information. Let me try.<br clear="none">>Thanks,Ahemad<br clear="none">>On Wednesday, 17 June, 2020, 12:43:29 pm IST, Karthik Subrahmanya<br clear="none">><<a shape="rect" ymailto="mailto:ksubrahm@redhat.com" href="mailto:ksubrahm@redhat.com">ksubrahm@redhat.com</a>> wrote: <br clear="none">> <br clear="none">> Hi Ahemad,<br clear="none">>Glad to hear that your problem is resolved. 

On 17 June 2020 at 10:45:27 GMT+03:00, ahemad shaik <ahemad_shaik@yahoo.com> wrote:
> Thanks Karthik for the information. Let me try.
> Thanks,
> Ahemad
>
> On Wednesday, 17 June, 2020, 12:43:29 pm IST, Karthik Subrahmanya <ksubrahm@redhat.com> wrote:
>
> Hi Ahemad,
> Glad to hear that your problem is resolved. Thanks Strahil and Hubert for your suggestions.
>
> On Wed, Jun 17, 2020 at 12:29 PM ahemad shaik <ahemad_shaik@yahoo.com> wrote:
>
> Hi,
> I tried starting and enabling the glusterfsd service suggested by Hubert and Strahil, and I see that it works: when one of the gluster nodes is not available, the client is still able to access the mount point.
> Thanks so much Strahil, Hubert and Karthik for your suggestions and for your time.
> Can you please help with making the data consistent on all nodes when one of the servers has some 5 hours of downtime? How do I achieve data consistency on all 3 nodes?
>
> When the node/brick which was down comes back up, the gluster self-heal daemon (glustershd) will automatically sync the data to the brick that was down and make it consistent with the good copies. You can alternatively run the index heal command "gluster volume heal <vol-name>" to trigger the heal manually, and you can see the entries needing heal and the progress of the heal by running "gluster volume heal <vol-name> info".
> HTH,
> Karthik
>
> Any documentation on that end will be helpful.
> Thanks,
> Ahemad
>
> On Wednesday, 17 June, 2020, 12:03:06 pm IST, Karthik Subrahmanya <ksubrahm@redhat.com> wrote:
>
> Hi Ahemad,
> Sorry for a lot of back and forth on this, but we might need a few more details to find the actual cause here. What version of gluster are you running on the server and client nodes? Also provide the statedump [1] of the bricks and the client process when the hang is seen.
> [1] https://docs.gluster.org/en/latest/Troubleshooting/statedump/
> Regards,
> Karthik
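
(A short sketch of how those statedumps can be taken, per the documentation linked above - the volume name "glustervol" is the one from the original mail quoted further down, and the pgrep pattern is only illustrative:)

# brick statedumps, written to /var/run/gluster/ on each brick node by default
gluster volume statedump glustervol

# FUSE client statedump: send SIGUSR1 to the glusterfs client process
kill -USR1 $(pgrep -f "glusterfs.*glustervol")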
>
> On Wed, Jun 17, 2020 at 9:25 AM ahemad_shaik@yahoo.com <ahemad_shaik@yahoo.com> wrote:
>
> I have a 3 replica gluster volume created on 3 nodes, and when one node went down due to some issue, the clients were not able to access the volume. This was the issue. I have fixed the server and it is back, but there was downtime at the client. I just want to avoid that downtime, since it is a 3 replica. That is the reason we went for a replica volume.
> I am testing the high availability now by rebooting or shutting down one of the brick servers manually. I just want to keep the volume accessible to the client at all times.
> So I would just like to know how to make the volume highly available for the client even when some VM or node hosting a gluster brick goes down unexpectedly or has a downtime of 10 hours.
>
> The glusterfsd service (which is used only for stopping bricks) is disabled in my cluster, and I see one more service running, glusterd.
> Will starting the glusterfsd service on all 3 replica nodes help in achieving what I am trying to do?
> Hope I am clear.
> Thanks,
> Ahemad
>
> On Tue, Jun 16, 2020 at 23:12, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
> In my cluster, the service is enabled and running.
>
> What actually is your problem?
> When a gluster brick process dies unexpectedly, all FUSE clients will be waiting for the timeout.
> The glusterfsd service ensures that during system shutdown the brick processes are shut down in such a way that native clients won't 'hang' and wait for the timeout, but will directly choose another brick.
>
> The same happens when you manually run the kill script - all gluster processes shut down and all clients are redirected to another brick.
>
> Keep in mind that FUSE mounts will also be killed, both by the script and by the glusterfsd service.
>
> Best Regards,
> Strahil Nikolov
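
(The timeout the clients wait for is, in a default setup, most likely the volume's network.ping-timeout option, 42 seconds unless changed. A quick way to inspect or tune it, sketched here with the volume name used elsewhere in this thread:)

gluster volume get glustervol network.ping-timeout
gluster volume set glustervol network.ping-timeout 42   # default value, in seconds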
>
> On 16 June 2020 at 19:48:32 GMT+03:00, ahemad shaik <ahemad_shaik@yahoo.com> wrote:
>> Hi Strahil,
>> I have the gluster setup on a CentOS 7 cluster. I see the glusterfsd service and it is in the inactive state:
>>
>> systemctl status glusterfsd.service
>> ● glusterfsd.service - GlusterFS brick processes (stopping only)
>>    Loaded: loaded (/usr/lib/systemd/system/glusterfsd.service; disabled; vendor preset: disabled)
>>    Active: inactive (dead)
>>
>> So you mean starting this service on all the nodes where the gluster volumes are created will solve the issue?
>>
>> Thanks,
>> Ahemad
>>
>> On Tuesday, 16 June, 2020, 10:12:22 pm IST, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>>
>> Hi ahemad,
>>
>> the script kills all gluster processes, so the clients won't wait for the timeout before switching to another node in the TSP.
>>
>> In CentOS/RHEL there is a systemd service called 'glusterfsd.service' that takes care of killing all processes on shutdown, so clients won't hang.
>>
>> systemctl cat glusterfsd.service --no-pager
>> # /usr/lib/systemd/system/glusterfsd.service
>> [Unit]
>> Description=GlusterFS brick processes (stopping only)
>> After=network.target glusterd.service
>>
>> [Service]
>> Type=oneshot
>> # glusterd starts the glusterfsd processes on-demand
>> # /bin/true will mark this service as started, RemainAfterExit keeps it active
>> ExecStart=/bin/true
>> RemainAfterExit=yes
>> # if there are no glusterfsd processes, a stop/reload should not give an error
>> ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
>> ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> Best Regards,
>> Strahil Nikolov
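
(Acting on this is just a matter of enabling and starting that unit on every brick node - a short sketch, assuming stock CentOS 7 packages:)

# run on each of the 3 brick nodes
systemctl enable glusterfsd.service
systemctl start glusterfsd.service
systemctl is-active glusterfsd.service   # should now report "active"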
>>
>> On 16 June 2020 at 18:41:59 GMT+03:00, ahemad shaik <ahemad_shaik@yahoo.com> wrote:
>>> Hi,
>>> I see there is a script file in the below mentioned path on all the nodes where the gluster volume was created:
>>> /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
>>> Do I need to create a systemd service and call this script whenever a server goes down, or do I need to have it running always so that it takes care of things when some node is down, making sure the client will not have any issues accessing the mount point?
>>> Can you please share any documentation on how to use this? That would be a great help.
>>> Thanks,
>>> Ahemad
>>>
>>> On Tuesday, 16 June, 2020, 08:59:31 pm IST, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>>>
>>> Hi Ahemad,
>>>
>>> You can simplify it by creating a systemd service that will call the script.
>>>
>>> It was already mentioned in a previous thread (with an example), so you can just use it.
>>>
>>> Best Regards,
>>> Strahil Nikolov
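
(The example from that previous thread is not reproduced here; below is only an illustrative unit, modeled on the glusterfsd.service shown earlier in this thread, which stays "active" after boot so the script runs from ExecStop at shutdown. The unit file name is made up:)

# /etc/systemd/system/gluster-kill-on-shutdown.service  (illustrative name)
[Unit]
Description=Run stop-all-gluster-processes.sh at shutdown (example)
After=network.target glusterd.service

[Service]
Type=oneshot
# /bin/true marks the service as started; RemainAfterExit keeps it active
ExecStart=/bin/true
RemainAfterExit=yes
# on shutdown, kill all gluster processes so clients fail over immediately
ExecStop=/bin/bash /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh

[Install]
WantedBy=multi-user.target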
>>>
>>> On 16 June 2020 at 16:02:07 GMT+03:00, Hu Bert <revirii@googlemail.com> wrote:
>>>> Hi,
>>>>
>>>> if you simply reboot or shut down one of the gluster nodes, there might be a (short or medium) unavailability of the volume on the clients. To avoid this there is a script:
>>>>
>>>> /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh (path may be different depending on the distribution)
>>>>
>>>> If I remember correctly, this notifies the clients that this node is going to be unavailable (please correct me if the details are wrong). When I reboot one gluster node, I always call this script first and have never seen unavailability issues on the clients.
>>>>
>>>> Regards,
>>>> Hubert
>>>>
>>>> On Mon, 15 June 2020 at 19:36, ahemad shaik <ahemad_shaik@yahoo.com> wrote:
>>>>>
>>>>> Hi There,
>>>>>
>>>>> I have created a 3 replica gluster volume with 3 bricks from 3 nodes:
>>>>>
>>>>> "gluster volume create glustervol replica 3 transport tcp node1:/data node2:/data node3:/data force"
>>>>>
>>>>> mounted on the client node using the below command:
>>>>>
>>>>> "mount -t glusterfs node4:/glustervol /mnt/"
>>>>>
>>>>> When any of the nodes (either node1, node2 or node3) goes down, the gluster mount/volume (/mnt) is not accessible at the client (node4).
>>>>>
>>>>> The purpose of a replicated volume is high availability, but I am not able to achieve it.
>>>>>
>>>>> Is it a bug, or am I missing anything?
>>>>>
>>>>> Any suggestions will be a great help!!!
>>>>>
>>>>> Kindly suggest.
>>>>>
>>>>> Thanks,
>>>>> Ahemad
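
(For what it's worth, the mount in this original setup can also list backup volfile servers, as in step 5 of the test plan above, so the client can still fetch the volume configuration when the server it mounted from is down. A sketch using the node names from this mail - this affects only volfile fetching, not the brick fail-over behaviour discussed above:)

mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/glustervol /mnt/

# or as an /etc/fstab entry:
node1:/glustervol  /mnt  glusterfs  defaults,_netdev,backup-volfile-servers=node2:node3  0 0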
href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br clear="none">><a shape="rect" href="https://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br clear="none">><br clear="none">> <br clear="none">> </div></div></div>
</div>
</div></body></html>