[Bugs] [Bug 1672318] "failed to fetch volume file" when trying to activate host in DC with glusterfs 3.12 domains

Thu Feb 6 13:47:28 UTC 2020

https://bugzilla.redhat.com/show_bug.cgi?id=1672318

Netbulae <info at netbulae.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CLOSED
         Resolution|---                         |NOTABUG
        Last Closed|                            |2020-02-06 13:47:28

--- Comment #37 from Netbulae <info at netbulae.com> ---
I started a full audit and finally found the problem.

We use jumbo frames, but the MTU was set to 1500.

I think someone forgot to save the config of the storage switches, and it
reverted when we did some failure testing after the install.

So I increased the MTU to 9000 and the gluster domains connect fine now.
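
A minimal sketch of how the jumbo-frame path could be checked after such a
change, assuming Linux hosts with iputils ping. The function names and the
peer argument are hypothetical, not part of the original report. It sends
pings that may not be fragmented and that are sized to fill a 9000-byte frame,
so they only get through if every hop accepts the larger MTU.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one frame of the given MTU:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings with "don't fragment" set (-M do, iputils ping on Linux).
# $1 is another host on the storage network (placeholder), $2 the MTU to test.
check_jumbo_path() {
    peer=$1
    mtu=${2:-9000}
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$peer"
}

# Usage (hostname is a placeholder):
#   check_jumbo_path storage-peer.example 9000
```

If a switch or NIC in between is still at 1500, the ping should fail instead of
being silently fragmented.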

To remove the rdma error, I modified "/etc/glusterfs/glusterd.vol" and removed
rdma from "option transport-type socket,rdma".
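
The same edit could be scripted roughly as below. This is only a sketch: the
function name is hypothetical, and it assumes the line appears exactly as
"option transport-type socket,rdma" in the file. It keeps a .bak copy of the
original. glusterd presumably has to be restarted afterwards to pick up the
change; the report does not say how that was done.

```shell
#!/bin/sh
# Remove ",rdma" from the transport-type line of a glusterd.vol-style file.
# $1 is the path to the file; the original is kept as "$1.bak".
remove_rdma_transport() {
    volfile=$1
    sed -i.bak \
        's/^\([[:space:]]*option transport-type socket\),rdma[[:space:]]*$/\1/' \
        "$volfile"
}

# Usage:
#   remove_rdma_transport /etc/glusterfs/glusterd.vol
```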

I also set the TCP and ping timeouts to 10:

  gluster volume set <VOLUME> network.ping-timeout 10
  gluster volume set <VOLUME> client.tcp-user-timeout 10
  gluster volume set <VOLUME> server.tcp-user-timeout 10

This was to get rid of the tcp-user-timeout warning:

[2020-01-30 13:26:06.552111] W [MSGID: 106061]
[glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout
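
To confirm that the three timeout options took effect, they could be read back
with "gluster volume get". A small sketch follows; the function name and the
GLUSTER override variable are hypothetical and only there so the commands can
be previewed without touching a cluster.

```shell
#!/bin/sh
# Print the current value of the three timeout options for one volume.
# Set GLUSTER=echo to print the commands instead of running them.
show_timeouts() {
    volume=$1
    for opt in network.ping-timeout \
               client.tcp-user-timeout \
               server.tcp-user-timeout; do
        ${GLUSTER:-gluster} volume get "$volume" "$opt"
    done
}

# Usage (volume name is a placeholder):
#   show_timeouts myvolume
```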

The only thing I still worry a bit about is:

[2020-01-30 13:26:06.622061] I [glusterd.c:1999:init] 0-management:
Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file
not being present, op_version retrieved:0, max op_version: 60000
