Hi,

Have you checked for any file system errors on the brick mount point?

I once ran into weird I/O errors, and xfs_repair fixed the issue.

What about the heal? Does it report any pending heals?

On Feb 15, 2018 14:20, "Dave Sherohman" <dave@sherohman.org> wrote:

Well, it looks like I've stumped the list, so I did a bit of additional
digging myself:

azathoth replicates with yog-sothoth, so I compared their brick
directories. `ls -R /var/local/brick0/data | md5sum` gives the same
result on both servers, so the filenames are identical in both bricks.
However, `du -s /var/local/brick0/data` shows that azathoth has about 3G
more data (445G vs 442G) than yog.
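
To narrow down where that ~3G lives, one quick sketch (assuming GNU find
on both servers; /tmp/brick-sizes.txt is just an arbitrary output path)
is to dump per-file sizes on each brick and then diff the two lists:

    find /var/local/brick0/data -type f -printf '%s %p\n' | sort -k2 > /tmp/brick-sizes.txt

Keep in mind that `du` counts allocated blocks, so sparse VM images can
show different usage on the two bricks even when their contents match.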

This seems consistent with my assumption that the problem is on
yog-sothoth (everything is fine with only azathoth; there are problems
with only yog-sothoth) and I am reminded that a few weeks ago,
yog-sothoth was offline for 4-5 days, although it should have been
brought back up-to-date once it came back online.

So, assuming that the issue is stale/missing data on yog-sothoth, is
there a way to force gluster to do a full refresh of the data from
azathoth's brick to yog-sothoth's brick? I would have expected running
heal and/or rebalance to do that sort of thing, but I've run them both
(with and without fix-layout on the rebalance) and the problem persists.
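
If it wasn't the full variant that was run, it may be worth trying the
explicit full self-heal, which crawls everything rather than only the
entries already marked as needing heal (volume name taken from your
status output):

    gluster volume heal palantir full
    gluster volume heal palantir info

The second command should then show whether any entries are still
pending per brick.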

If there isn't a way to force a refresh, how risky would it be to kill
gluster on yog-sothoth, wipe everything from /var/local/brick0, and then
re-add it to the cluster as if I were replacing a physically failed
disk? Seems like that should work in principle, but it feels dangerous
to wipe the partition and rebuild, regardless.
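
For what it's worth, if it does come to rebuilding that brick, the usual
route is replace-brick onto a fresh, empty directory rather than wiping
the existing path in place. A sketch only (the new directory name here
is arbitrary); on yog-sothoth, something like:

    mkdir -p /var/local/brick0/data-new
    gluster volume replace-brick palantir \
        yog-sothoth:/var/local/brick0/data \
        yog-sothoth:/var/local/brick0/data-new commit force
    gluster volume heal palantir full

Self-heal should then repopulate the new brick from azathoth.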

On Tue, Feb 13, 2018 at 07:33:44AM -0600, Dave Sherohman wrote:
> I'm using gluster for a virt-store with 3x2 distributed/replicated
> servers for 16 qemu/kvm/libvirt virtual machines using image files
> stored in gluster and accessed via libgfapi. Eight of these disk images
> are standalone, while the other eight are qcow2 images which all share a
> single backing file.
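
Since all eight of the shared-backing-image VMs fail together, it could
be worth checking which backing file the overlays point at and which
replica pair actually holds it. A sketch, with placeholder paths on the
fuse mount:

    qemu-img info --backing-chain /mnt/palantir/images/some-overlay.qcow2
    getfattr -n trusted.glusterfs.pathinfo /mnt/palantir/images/the-backing-file.qcow2

The pathinfo xattr lists the bricks that hold a given file, which would
show whether the shared backing file sits on the azathoth/yog-sothoth
pair.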
>
> For the most part, this is all working very well. However, one of the
> gluster servers (azathoth) causes three of the standalone VMs and all 8
> of the shared-backing-image VMs to fail if it goes down. Any of the
> other gluster servers can go down with no problems; only azathoth causes
> issues.
>
> In addition, the kvm hosts have the gluster volume fuse mounted and one
> of them (out of five) detects an error on the gluster volume and puts
> the fuse mount into read-only mode if azathoth goes down. libgfapi
> connections to the VM images continue to work normally from this host
> despite this, and the other four kvm hosts are unaffected.
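
As far as I know, the fuse client only uses the server named in the
mount source to fetch the volume layout at mount time and then talks to
all of the bricks directly, so the mount source itself shouldn't create
a hard dependency on azathoth. It can also be given fallback volfile
servers; a sketch, with a placeholder mount point:

    mount -t glusterfs -o backup-volfile-servers=yog-sothoth:cthulhu \
        azathoth:/palantir /mnt/palantir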
>
> It initially seemed relevant that I have the libgfapi URIs specified as
> gluster://azathoth/..., but I've tried changing them to make the initial
> connection via other gluster hosts and it had no effect on the problem.
> Losing azathoth still took them out.
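
One way to sanity-check access through a different server, independently
of libvirt, is to point qemu-img at the same image over gluster
(assuming qemu-img on the KVM hosts is built with gluster support; the
path after the volume name is a placeholder):

    qemu-img info gluster://yog-sothoth/palantir/images/some-image.qcow2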
>
> In addition to changing the mount URI, I've also manually run a heal and
> rebalance on the volume, enabled the bitrot daemons (then turned them
> back off a week later, since they reported no activity in that time),
> and copied one of the standalone images to a new file in case it was a
> problem with the file itself. As far as I can tell, none of these
> attempts changed anything.
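
Two more heal views that are sometimes useful here, in case anything is
stuck in split-brain on one replica pair:

    gluster volume heal palantir info split-brain
    gluster volume heal palantir statistics heal-count

The first lists entries gluster considers split-brain, and the second
shows how many entries are still waiting to be healed on each brick.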
>
> So I'm at a loss. Is this a known type of problem? If so, how do I fix
> it? If not, what's the next step to troubleshoot it?
>
>
> # gluster --version
> glusterfs 3.8.8 built on Jan 11 2017 14:07:11
> Repository revision: git://git.gluster.com/glusterfs.git
>
> # gluster volume status
> Status of volume: palantir
> Gluster process                              TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick saruman:/var/local/brick0/data         49154     0          Y       10690
> Brick gandalf:/var/local/brick0/data         49155     0          Y       18732
> Brick azathoth:/var/local/brick0/data        49155     0          Y       9507
> Brick yog-sothoth:/var/local/brick0/data     49153     0          Y       39559
> Brick cthulhu:/var/local/brick0/data         49152     0          Y       2682
> Brick mordiggian:/var/local/brick0/data      49152     0          Y       39479
> Self-heal Daemon on localhost                N/A       N/A        Y       9614
> Self-heal Daemon on saruman.lub.lu.se        N/A       N/A        Y       15016
> Self-heal Daemon on cthulhu.lub.lu.se        N/A       N/A        Y       9756
> Self-heal Daemon on gandalf.lub.lu.se        N/A       N/A        Y       5962
> Self-heal Daemon on mordiggian.lub.lu.se     N/A       N/A        Y       8295
> Self-heal Daemon on yog-sothoth.lub.lu.se    N/A       N/A        Y       7588
>
> Task Status of Volume palantir
> ------------------------------------------------------------------------------
> Task                 : Rebalance
> ID                   : c38e11fe-fe1b-464d-b9f5-1398441cc229
> Status               : completed
>
>
> --
> Dave Sherohman

--
Dave Sherohman
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users