[Bugs] [Bug 1389947] Upgrading to 3.7.16-1 breaks fuse mounts

Wed Nov 9 05:30:42 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1389947

--- Comment #9 from Kaushal <kaushal at redhat.com> ---
Jules,

Thanks for sharing the logs (Jules shared them privately with me).

I did go through the logs, but I cannot find any specific issue with Gluster.

As I mentioned in the previous comment, I see a lot of 'no route to host' errors
in the client logs. Along with these, I don't see any entries in the client
logs about changing ports, which should be present after a successful mount.

To make this clearer, this is what happens on a client when you mount:

1. First the volfile is fetched from the address you give in the mount command.
The client connects to GlusterD's port (24007) at the address and gets the
volfile for the volume.
2. The client then builds and loads the graph, and dumps the volfile in the
logs.
3. After this the graph is activated. This is when the client connects to the
bricks. This connection happens in 2 steps.
3.a. For each brick, the client first connects to GlusterD (24007) on the brick's
host and gets the port for the brick. This is known as the portmap query.
3.b. With the ports obtained, the client reconnects to the actual bricks with
the new ports. This gives a 'changing port to N (from M)' log entry.
4. The client mount point is now ready to serve requests.
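As a rough illustration of the ordering in step 3, here is a toy model in plain Python (not GlusterFS code) showing why a failed portmap query in 3.a means step 3.b never happens for that brick. The function names, the `reachable` map, the brick names, and the brick ports (49152 and up) are all hypothetical, chosen only for the example.

```python
# Toy model of step 3 (graph activation). Hypothetical: real clients do this
# over RPC; only the ordering 3.a -> 3.b is taken from the steps above.
from typing import Dict, List, Optional, Tuple


def portmap_query(reachable: Dict[str, bool], brick_ports: Dict[str, int],
                  host: str, brick: str) -> Optional[int]:
    """Step 3.a: ask GlusterD on host:24007 for the brick's port."""
    if not reachable.get(host, False):
        return None  # e.g. 'no route to host' when contacting host:24007
    return brick_ports.get(brick)


def activate_graph(bricks: List[Tuple[str, str]], reachable: Dict[str, bool],
                   brick_ports: Dict[str, int]) -> List[tuple]:
    """Run 3.a then 3.b for each (host, brick) pair; return simple events."""
    events: List[tuple] = []
    for host, brick in bricks:
        port = portmap_query(reachable, brick_ports, host, brick)
        if port is None:
            # 3.a failed, so 3.b (reconnecting to the brick) never happens.
            events.append(("portmap_failed", host, brick))
        else:
            # 3.b: reconnect to the brick on the port obtained in 3.a.
            events.append(("connected", host, brick, port))
    return events


# Hypothetical volume with one brick per server; 10.1.20.1 is unreachable.
events = activate_graph(
    [("10.1.20.1", "brick1"), ("10.1.20.2", "brick2")],
    reachable={"10.1.20.1": False, "10.1.20.2": True},
    brick_ports={"brick1": 49152, "brick2": 49153},
)
print(events)
```

In this model only brick2 ends up connected; brick1 stops at the portmap step, which is the same shape as the pattern described below.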

In the logs, I don't see successful occurrences of 3.a and 3.b.

There are multiple 'no route to host' error logs from the client translators for
addresses 10.1.20.1:24007 and 10.1.20.2:24007. This means that the portmap
queries couldn't succeed, which the presence of 'failed to get port number'
logs also supports.
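To check the same thing on other clients, one option is to count these messages in the client log. The helper below is a sketch, not a GlusterFS tool: it only matches the message fragments quoted in this comment ('no route to host', 'failed to get port number', 'changing port to N') and assumes nothing else about the log line format. The log path in the usage comment is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Only the fragments quoted in the comment are matched; the rest of the line
# format is not assumed. Matching is case-insensitive.
MARKERS = {
    "no_route": re.compile(r"no route to host", re.IGNORECASE),
    "portmap_failed": re.compile(r"failed to get port number", re.IGNORECASE),
    "port_changed": re.compile(r"changing port to \d+", re.IGNORECASE),
}


def scan_client_log(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each marker."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def portmap_looks_failed(counts: Counter) -> bool:
    """Heuristic: errors were logged but no successful port change was."""
    errors = counts["no_route"] + counts["portmap_failed"]
    return errors > 0 and counts["port_changed"] == 0


# Usage (hypothetical path; point it at the client's mount log):
# with open("/var/log/glusterfs/mnt-gluster.log", errors="replace") as f:
#     counts = scan_client_log(f)
#     print(dict(counts), portmap_looks_failed(counts))
```

This is only a heuristic: a log covering several mount attempts could contain both errors and a later successful port change.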

With this in mind, I can think of two possible causes:

1. There was an issue with the network at this point in time.
2. GlusterD on the servers had an issue.

Cause 1 can mostly be discarded, because the clients were able to fetch volfiles
from the servers. Although possible, it is highly improbable that the network
developed an issue between fetching the volfiles and doing the portmap queries,
and the fact that this happened again and again also argues against it.
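One way to gather more evidence on an affected client is a plain TCP connect to GlusterD's port (24007) on each server, classifying any failure by errno; 'no route to host' corresponds to `EHOSTUNREACH`. This is a sketch using only the Python standard library, and the `probe` and `classify_errno` names are made up for it. Its limit matters: a successful connect only shows that the network path works and something is listening on 24007. Because the kernel completes TCP handshakes for a listening socket, a deadlocked glusterd may still look reachable, so this can point towards or away from a network problem at the moment it runs but cannot prove glusterd is healthy.

```python
import errno
import socket

GLUSTERD_PORT = 24007


def classify_errno(code) -> str:
    """Map a connect() errno to a coarse label."""
    if code == errno.EHOSTUNREACH:
        return "no_route_to_host"  # the error seen in the client logs
    if code == errno.ENETUNREACH:
        return "network_unreachable"
    if code == errno.ECONNREFUSED:
        return "refused"  # host reachable, nothing listening on the port
    if code == errno.ETIMEDOUT:
        return "timeout"
    return "other"


def probe(host: str, port: int = GLUSTERD_PORT, timeout: float = 3.0) -> str:
    """Try a TCP connect to host:port and return a coarse result label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Reachable and listening; not proof that glusterd is responsive.
            return "connected"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return classify_errno(exc.errno)


# Usage, with the server addresses from the client logs:
# for server in ("10.1.20.1", "10.1.20.2"):
#     print(server, probe(server))
```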

This leaves glusterd. GlusterD can sometimes enter into deadlocks, preventing
it from accepting new connections or serving new requests. If glusterd is
deadlocked, further mount attempts would also not succeed (unless glusterd was
restarted in between).

So, to investigate this further, could you share the glusterd logs from the
servers? These are also in /var/log/glusterfs, named 'etc*-glusterd.log'. If
these don't help figure out what went wrong, then this might have been a
one-off event.
