[Gluster-users] Transport endpoint is not connected : issue

Karthik Subrahmanya ksubrahm at redhat.com
Mon Sep 3 11:35:57 UTC 2018


On Mon, Sep 3, 2018 at 11:17 AM Karthik Subrahmanya <ksubrahm at redhat.com>
wrote:

> Hey,
>
> We need some more information to debug this.
> I think you forgot to include the output of 'gluster volume info <volname>'.
> Can you also provide the brick, shd, and glfsheal logs?
> How many peers are present in the setup? You also mentioned that "one of
> the file servers has two processes for each of the volumes instead of one
> per volume"; which process are you referring to here?
>
Also provide the "ps aux | grep gluster" output.
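
For reference, something along these lines should capture everything we
need (the paths assume the default /var/log/glusterfs layout; adjust if
your install logs elsewhere):

# volume configuration
gluster volume info <volname>
# all running gluster processes
ps aux | grep gluster
# brick, self-heal daemon (shd) and glfsheal logs
tar czf gluster-logs.tgz /var/log/glusterfs/bricks \
    /var/log/glusterfs/glustershd.log \
    /var/log/glusterfs/glfsheal-<volname>.log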

>
> Regards,
> Karthik
>
> On Sat, Sep 1, 2018 at 12:10 AM Johnson, Tim <tjj at uic.edu> wrote:
>
>> Thanks for the reply.
>>
>>
>>
>>    I have attached the gluster.log file from the host where this is
>> currently happening.
>>
>> Which host it happens on does change.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> *From: *Atin Mukherjee <amukherj at redhat.com>
>> *Date: *Friday, August 31, 2018 at 1:03 PM
>> *To: *"Johnson, Tim" <tjj at uic.edu>
>> *Cc: *Karthik Subrahmanya <ksubrahm at redhat.com>, Ravishankar N
>> <ravishankar at redhat.com>, "gluster-users at gluster.org"
>> <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] Transport endpoint is not connected :
>> issue
>>
>>
>>
>> Can you please pass along all the gluster log files from the server where
>> the "transport endpoint is not connected" error is reported? As restarting
>> glusterd didn't solve this issue, I believe this isn't a stale port problem
>> but something else. Also please provide the output of 'gluster v info
>> <volname>'.
>>
>>
>>
>> (@cc Ravi, Karthik)
>>
>>
>>
>> On Fri, 31 Aug 2018 at 23:24, Johnson, Tim <tjj at uic.edu> wrote:
>>
>> Hello all,
>>
>>
>>
>>       We have Gluster replicate (with arbiter) volumes that return
>> "Transport endpoint is not connected" errors on a rotating basis from each
>> of the two file servers and from a third host that holds the arbiter
>> bricks.
>>
>> This happens when trying to run a heal on all the volumes on the gluster
>> hosts. When I get the status of all the volumes, everything looks good.
>>
>>        This behavior seems to foreshadow the gluster volumes becoming
>> unresponsive to our VM cluster. In addition, one of the file servers has
>> two processes for each of the volumes instead of one per volume.
>> Eventually the affected file server drops off the list of peers.
>>
>> Restarting glusterd/glusterfsd on the affected file server does not fix
>> the issue; we have to bring down both file servers because the volumes are
>> no longer seen by the VM cluster once the errors start occurring. I had
>> seen bug reports about "Transport endpoint is not connected" on earlier
>> versions of Gluster, but had thought it had been addressed.
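>>
>> For reference, the restart we attempted was along these lines (the unit
>> names assume the stock CentOS 7 packaging; glusterfsd may not exist as a
>> separate unit on every install):
>>
>> # stop brick processes first, where the unit exists on this install
>> systemctl stop glusterfsd
>> # restart the management daemon; it respawns the brick processes
>> systemctl restart glusterd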
>>
>>      Dmesg did have some entries for "a possible SYN flood on port *", so
>> we raised the sysctl setting "net.ipv4.tcp_max_syn_backlog" to 2048, which
>> quieted the SYN flood messages but did not fix the underlying volume
>> issues.
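>>
>> The change was applied roughly like this (the sysctl.d file name below is
>> just our choice; any name works):
>>
>> # apply immediately
>> sysctl -w net.ipv4.tcp_max_syn_backlog=2048
>> # persist across reboots
>> echo "net.ipv4.tcp_max_syn_backlog = 2048" > /etc/sysctl.d/90-syn-backlog.conf
>> sysctl --system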
>>
>>     I have put the versions of all the installed Gluster packages below,
>> as well as "heal" and "status" output showing the state of the volumes.
>>
>>        This has just started happening, but I cannot say definitively
>> whether it began after an update.
>>
>>
>>
>>
>>
>> Thanks for any assistance.
>>
>>
>>
>>
>>
>> Running Heal  :
>>
>>
>>
>> gluster volume heal ovirt_engine info
>>
>> Brick ****1.rrc.local:/bricks/brick0/ovirt_engine
>>
>> Status: Connected
>>
>> Number of entries: 0
>>
>>
>>
>> Brick ****3.rrc.local:/bricks/brick0/ovirt_engine
>>
>> Status: Transport endpoint is not connected
>>
>> Number of entries: -
>>
>>
>>
>> Brick *****3.rrc.local:/bricks/arb-brick/ovirt_engine
>>
>> Status: Transport endpoint is not connected
>>
>> Number of entries: -
>>
>>
>>
>>
>>
>> Running status :
>>
>>
>>
>> gluster volume status ovirt_engine
>>
>> Status of volume: ovirt_engine
>>
>> Gluster process                                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick *****.rrc.local:/bricks/brick0/ovirt_engine      49152     0          Y       5521
>> Brick fs2-tier3.rrc.local:/bricks/brick0/ovirt_engine  49152     0          Y       6245
>> Brick ****.rrc.local:/bricks/arb-brick/ovirt_engine    49152     0          Y       3526
>> Self-heal Daemon on localhost                          N/A       N/A        Y       5509
>> Self-heal Daemon on ***.rrc.local                      N/A       N/A        Y       6218
>> Self-heal Daemon on ***.rrc.local                      N/A       N/A        Y       3501
>> Self-heal Daemon on ****.rrc.local                     N/A       N/A        Y       3657
>> Self-heal Daemon on *****.rrc.local                    N/A       N/A        Y       3753
>> Self-heal Daemon on ****.rrc.local                     N/A       N/A        Y       17284
>>
>>
>>
>> Task Status of Volume ovirt_engine
>>
>>
>> ------------------------------------------------------------------------------
>>
>> There are no active volume tasks
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> /etc/glusterd.vol :
>>
>>
>>
>>
>>
>> volume management
>>
>>     type mgmt/glusterd
>>
>>     option working-directory /var/lib/glusterd
>>
>>     option transport-type socket,rdma
>>
>>     option transport.socket.keepalive-time 10
>>
>>     option transport.socket.keepalive-interval 2
>>
>>     option transport.socket.read-fail-log off
>>
>>     option ping-timeout 0
>>
>>     option event-threads 1
>>
>>     option rpc-auth-allow-insecure on
>>
>> #   option transport.address-family inet6
>>
>> #   option base-port 49152
>>
>> end-volume
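>>
>> For completeness, edits to this file take effect only after restarting
>> the management daemon:
>>
>> systemctl restart glusterd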
>>
>>
>> rpm -qa |grep gluster
>>
>> glusterfs-3.12.13-1.el7.x86_64
>>
>> glusterfs-gnfs-3.12.13-1.el7.x86_64
>>
>> glusterfs-api-3.12.13-1.el7.x86_64
>>
>> glusterfs-cli-3.12.13-1.el7.x86_64
>>
>> glusterfs-client-xlators-3.12.13-1.el7.x86_64
>>
>> glusterfs-fuse-3.12.13-1.el7.x86_64
>>
>> centos-release-gluster312-1.0-2.el7.centos.noarch
>>
>> glusterfs-rdma-3.12.13-1.el7.x86_64
>>
>> glusterfs-libs-3.12.13-1.el7.x86_64
>>
>> glusterfs-server-3.12.13-1.el7.x86_64
>>
>>
>> --
>>
>> - Atin (atinm)
>>
>

