[Gluster-users] Gluster and bonding

Fri Mar 22 16:42:31 UTC 2019

Hi all,

I had the opportunity to test the setup on actual hardware, as I managed to
arrange downtime at the customer's site.

The result was that, when the cables were split between two switches, gluster
was not able to start the volumes even though the servers were able to ping
each other. The only relevant log entries I noticed were:

[2019-03-21 14:16:15.043714] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster2. Please check log file for details.
[2019-03-21 14:16:15.044034] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster2. Please check log file for details.
[2019-03-21 14:16:15.044292] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster2. Please check log file for details.
[2019-03-21 14:49:11.278724] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster2. Please check log file for details.
[2019-03-21 14:49:40.904596] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster1. Please check log file for details.

Does anyone have any idea what this staging error means?
I no longer have the hardware available for testing, so I will try to
reproduce the problem in a virtual environment.
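
Since each message only says to check the log file on the named peer, it may
help to first tally which peers the staging failures point at. The following
is a minimal sketch, not a gluster tool: it assumes log entries wrapped across
lines as pasted above, and the function name count_staging_failures is made up
for illustration.

```python
import re
from collections import Counter

# Matches the peer name in "Staging failed on <peer>." messages.
# The optional asterisks tolerate mail-client emphasis around the phrase.
STAGING_RE = re.compile(r"\*?Staging failed\*? on\s+(\S+?)\.\s")


def count_staging_failures(log_text):
    """Return a Counter mapping peer name -> number of staging failures."""
    # Entries are wrapped over several lines in the mail, so flatten first.
    flat = " ".join(log_text.split())
    return Counter(STAGING_RE.findall(flat + " "))


sample = """
[2019-03-21 14:49:11.278724] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster2. Please check log file for details.
[2019-03-21 14:49:40.904596] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on
gluster1. Please check log file for details.
"""

failures = count_staging_failures(sample)
print(failures)
```

The peer with the most failures is probably the first place to read the
glusterd log (by default under /var/log/glusterfs/ on that node).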

Thanx
Alex

On Mon, Mar 18, 2019 at 12:52 PM Alex K <rightkicktech at gmail.com> wrote:

> Performed some tests simulating the setup on OVS.
> When using mode 6 I had mixed results for both scenarios (see below):
>
> [image: image.png]
>
> There were times when the hosts were not able to reach each other (simple
> ping tests) and other times when the hosts could ping each other but the
> gluster volumes were down due to reported connectivity issues
> (endpoint is not connected). systemctl restart network usually resolved the
> gluster connectivity issue. This happened regardless of the scenario
> (interlink or not). I will need to do some more tests.
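
Because ping succeeding says nothing about TCP, one way to narrow this down
might be to test the glusterd management port directly. A minimal sketch,
assuming the default glusterd port 24007; the helper names tcp_reachable and
check_peers are hypothetical, and the peer hostnames must be adjusted to the
actual storage network names.

```python
import socket

GLUSTERD_PORT = 24007  # default glusterd management port (assumed unchanged)


def tcp_reachable(host, port=GLUSTERD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_peers(peers, port=GLUSTERD_PORT):
    """Return a dict mapping each peer name to its TCP reachability."""
    return {peer: tcp_reachable(peer, port) for peer in peers}
```

Running check_peers(["gluster1", "gluster2"]) on each node while the bond is
split across both switches would show whether TCP fails in one direction only,
which ping alone would not reveal.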
>
> On Tue, Feb 26, 2019 at 4:14 PM Alex K <rightkicktech at gmail.com> wrote:
>
>>
>> Thank you to all for your suggestions.
>>
>> I came here since only gluster was having trouble starting. Ping and other
>> networking services showed everything was fine, so I guess there is
>> something in gluster that does not like what I tried to do.
>> Unfortunately this system is in production and I cannot experiment.
>> It was a customer request to add redundancy to the switch and I went with
>> what I assumed would work.
>> I guess I need stacked switches, but the current ones do not
>> support this. They are just simple managed switches.
>>
>> Multiple IPs per peer could be a solution.
>> I will search a little more and get back if I find something.
>>
>> On Tue, Feb 26, 2019 at 6:52 AM Strahil <hunter86_bg at yahoo.com> wrote:
>>
>>> Hi Alex,
>>>
>>> As per the following (
>>> https://community.cisco.com/t5/switching/lacp-load-balancing-in-2-switches-part-of-3750-stack-switch/td-p/2268111
>>> ) your switches need to be stacked in order to support lacp with your setup.
>>> Yet, I'm not sure if balance-alb will work with 2 separate switches -
>>> maybe some special configuration is needed ?!?
>>> As far as I know gluster can have multiple IPs matched to a single peer,
>>> but I'm not sure whether 2 separate networks would be used as
>>> active-backup or active-active.
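
For reference, the mode numbers used in this thread map to Linux bonding mode
names as sketched below. The needs_switch_config flag only summarises what the
kernel bonding documentation says about switch-side configuration; it does not
answer whether a given mode behaves well across two separate interconnected
switches, which is the open question here. The helper names are made up for
illustration.

```python
# Linux bonding modes: number -> (name, needs switch-side configuration).
# Per the kernel bonding documentation, active-backup, balance-tlb and
# balance-alb need no specific switch configuration; 802.3ad needs an
# 802.3ad (LACP) aggregation; balance-rr, balance-xor and broadcast
# generally need the switch ports grouped together.
BONDING_MODES = {
    0: ("balance-rr", True),
    1: ("active-backup", False),
    2: ("balance-xor", True),
    3: ("broadcast", True),
    4: ("802.3ad", True),
    5: ("balance-tlb", False),
    6: ("balance-alb", False),
}


def mode_name(mode):
    """Return the bonding mode name for a mode number (0-6)."""
    return BONDING_MODES[mode][0]


def needs_switch_config(mode):
    """Return True if the mode expects switch-side port configuration."""
    return BONDING_MODES[mode][1]


for number in (1, 4, 6):  # the modes mentioned in this thread
    print(number, mode_name(number), needs_switch_config(number))
```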
>>>
>>> Someone more experienced should jump in.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>> On Feb 25, 2019 12:43, Alex K <rightkicktech at gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I was asking if it is possible to have the two separate cables connected
>>> to two different physical switches. When trying mode 6 or mode 1 in this
>>> setup gluster was refusing to start the volumes, giving me "transport
>>> endpoint is not connected".
>>>
>>> server1: cable1 ---------------- switch1 --------------------- server2: cable1
>>>                                     |
>>> server1: cable2 ---------------- switch2 --------------------- server2: cable2
>>>
>>> The two switches are also connected to each other. This is done to
>>> achieve switch redundancy.
>>> When I disconnected cable2 from both servers, gluster was happy.
>>> What could be the problem?
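
Before looking at gluster itself, it may be worth confirming what the bonding
driver reports on each server. A minimal sketch that summarises the text of
/proc/net/bonding/<bond>; the sample below is illustrative and not taken from
the servers in this thread, and the function name parse_bonding_status and the
interface names are assumptions.

```python
def parse_bonding_status(text):
    """Summarise the text of /proc/net/bonding/<bond> as a dict."""
    summary = {"mode": None, "active_slave": None, "slaves": {}}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Currently Active Slave":
            summary["active_slave"] = value
        elif key == "Slave Interface":
            current = value
            summary["slaves"][current] = {}
        elif key == "MII Status" and current is not None:
            summary["slaves"][current]["mii_status"] = value
        elif key == "Link Failure Count" and current is not None:
            summary["slaves"][current]["link_failures"] = int(value)
    return summary


# Illustrative sample only; real output comes from /proc/net/bonding/bond0.
SAMPLE = """Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
"""

status = parse_bonding_status(SAMPLE)
print(status)
```

Comparing this summary across all servers after a reboot could show whether
each node picked the same active interface and switch.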
>>>
>>> Thanx,
>>> Alex
>>>
>>>
>>> On Mon, Feb 25, 2019 at 11:32 AM Jorick Astrego <jorick at netbulae.eu>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We use bonding mode 6 (balance-alb) for GlusterFS traffic
>>>
>>>
>>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/network4
>>>
>>> Preferred bonding mode for Red Hat Gluster Storage client is mode 6
>>> (balance-alb), this allows client to transmit writes in parallel on
>>> separate NICs much of the time.
>>>
>>> Regards,
>>>
>>> Jorick Astrego
>>> On 2/25/19 5:41 AM, Dmitry Melekhov wrote:
>>>
>>> 23.02.2019 19:54, Alex K wrote:
>>>
>>> Hi all,
>>>
>>> I have a replica 3 setup where each server was configured with dual
>>> interfaces in mode 6 bonding. All cables were connected to one common
>>> network switch.
>>>
>>> To add redundancy to the switch, and avoid it being a single point of
>>> failure, I connected the second cable of each server to a second switch.
>>> This turned out not to work, as gluster was refusing to start the volume,
>>> logging "transport endpoint is disconnected", although all nodes were able
>>> to reach each other (ping) in the storage network. I switched the mode to
>>> mode 1 (active/passive) and initially it worked, but following a reboot of
>>> the whole cluster the same issue appeared. Gluster is not starting the volumes.
>>>
>>> Isn't active/passive supposed to work like that? Can one have such a
>>> redundant network setup or are there any other recommended approaches?
>>>
>>>
>>> Yes, we use LACP, which I guess is mode 4 (we use teamd). It is, no
>>> doubt, the best way.
>>>
>>>
>>> Thanx,
>>> Alex
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 31291 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190322/237c23ef/attachment.png>