[Gluster-users] Possible SYN flooding

Wed Apr 16 23:53:49 UTC 2014

There are no tx/rx errors, but these two drop counters are non-zero:

     dropped_link_overflow: 10046509
     dropped_link_error_or_filtered: 72353

This is of some concern, but I wouldn't be sure what really happened.
Are you using Myricom 10GbE interfaces?

https://www.myricom.com/software/myri10ge/397-could-you-explain-the-meanings-of-the-myri10ge-counters-reported-in-the-output-of-ethtool.html

=================
dropped_link_overflow

The number of received packets dropped due to lack of receive
(on-chip) buffer space. This will happen if our driver/firmware is not
consuming packets fast enough, and either:

  - flow control is off, or
  - flow control is on, so we are sending PAUSE frames, but the other
    side does not obey them.

Verify that Ethernet flow control is enabled on the 10GbE switch to
which the adapter is connected.

If the application's traffic is bursty, have you tried the load-time
option myri10ge_big_rxring=1? See the Myricom FAQ entry "Would you
explain the Myri10GE load-time option myri10ge_big_rxring?"
=================
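To put dropped_link_overflow in proportion, here is a minimal Python sketch, assuming the one "name: value" per line layout of the ethtool -S output quoted further down. It parses that text into a dict and reports overflow drops per million received packets. The function names are hypothetical helpers, not part of ethtool or the myri10ge driver.

```python
import re

# Hypothetical helper, assuming `ethtool -S` prints one "name: value" pair
# per line. Lines without an integer value (e.g. "NIC statistics:") are skipped.
STAT_LINE = re.compile(r"^\s*([^:]+?):\s*(\d+)\s*$")


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Turn `ethtool -S <iface>` text into a {counter_name: value} dict."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def overflow_drops_per_million(stats: dict[str, int]) -> float:
    """dropped_link_overflow relative to rx_packets, scaled to one million packets."""
    return 1e6 * stats["dropped_link_overflow"] / stats["rx_packets"]


if __name__ == "__main__":
    # Two counters copied from the eth2 output in this thread.
    sample = """NIC statistics:
     rx_packets: 116095907410
     dropped_link_overflow: 10046509
"""
    ratio = overflow_drops_per_million(parse_ethtool_stats(sample))
    print(f"{ratio:.1f} per million")
```

With the eth2 figures this works out to roughly 86.5 overflow drops per million received packets. These counters are cumulative, so the ratio alone cannot show whether the drops happened around the time of the SYN flooding messages; sampling the counters over time is more telling.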

=================
dropped_link_error_or_filtered

The number of received packets that are not received into the receive
buffer because they are malformed, they are PAUSE frames used for
Ethernet flow control, they are not destined for the adapter (i.e. the
packet's destination MAC address does not match the adapter's MAC
address), or their destination MAC addresses are of the form
01:80:C2:00:00:0X (reserved addresses).

If this counter keeps increasing when there is no traffic, then the
increase is likely due to BPDUs (spanning-tree bridge protocol data
units). If it only increases during a stress test (achieving close to
line rate), then the increase is likely due to PAUSE frames. The
counter also includes frames that are malformed because of CRC errors
or other causes. See also the Myricom FAQ entry "How do I check for
badcrcs when running the Myri10GE software?" for further details.
=================
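The rule of thumb above (growth while idle suggests BPDUs, growth only near line rate suggests PAUSE frames) needs two samples of the counters rather than one. A minimal Python sketch of that, assuming the same "name: value" output format; snapshot, deltas and watch are hypothetical names, and actually running watch() requires ethtool on the host.

```python
import re
import subprocess
import time

# Hypothetical helpers. Assumes `ethtool -S` prints one "name: value" pair per line.
STAT_LINE = re.compile(r"^\s*([^:]+?):\s*(\d+)\s*$")
WATCHED = ("dropped_link_overflow", "dropped_link_error_or_filtered", "dropped_pause")


def snapshot(iface: str) -> dict[str, int]:
    """Run `ethtool -S <iface>` once and keep every integer counter."""
    out = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    ).stdout
    matches = (STAT_LINE.match(line) for line in out.splitlines())
    return {m.group(1): int(m.group(2)) for m in matches if m}


def deltas(before: dict[str, int], after: dict[str, int], names=WATCHED) -> dict[str, int]:
    """Growth of each watched counter between two snapshots (missing counters count as 0)."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


def watch(iface: str, interval_s: float = 10.0) -> dict[str, int]:
    """Sample the counters twice, interval_s seconds apart, and return the growth."""
    before = snapshot(iface)
    time.sleep(interval_s)
    return deltas(before, snapshot(iface))
```

Calling watch("eth2") once while the link is idle and once under heavy load, then comparing how much dropped_link_error_or_filtered grew in each case, applies the heuristic from the FAQ.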

Maybe contacting your Myricom vendor would be the right start?
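Separately, the server's kernel message ends with "Check SNMP counters." On Linux those counters are exposed in /proc/net/netstat (netstat -s prints the same data in prose form), and the TcpExt fields SyncookiesSent, SyncookiesRecv, SyncookiesFailed, ListenOverflows and ListenDrops are the ones related to SYN cookies and listen-queue overflows. A minimal Python sketch, assuming the usual layout of that file (for each section, a line of field names followed by a line of values); the function names are hypothetical.

```python
from pathlib import Path

# TcpExt fields related to SYN cookies and listen-queue overflows.
SYN_COUNTERS = ("SyncookiesSent", "SyncookiesRecv", "SyncookiesFailed",
                "ListenOverflows", "ListenDrops")


def parse_netstat(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/netstat text into {section: {field: value}}.

    Hypothetical helper. Assumes each section is two lines:
    "TcpExt: name1 name2 ..." followed by "TcpExt: value1 value2 ...".
    """
    lines = text.splitlines()
    sections = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        sections[section] = dict(zip(names.split(), map(int, numbers.split())))
    return sections


def syn_pressure(text: str) -> dict[str, int]:
    """Pick the SYN-related TcpExt counters; missing fields are reported as 0."""
    tcpext = parse_netstat(text).get("TcpExt", {})
    return {name: tcpext.get(name, 0) for name in SYN_COUNTERS}


def read_live() -> tuple[int, dict[str, int]]:
    """Read tcp_max_syn_backlog and the SYN counters from a running Linux host."""
    backlog = int(Path("/proc/sys/net/ipv4/tcp_max_syn_backlog").read_text())
    return backlog, syn_pressure(Path("/proc/net/netstat").read_text())
```

Comparing read_live() output before and after an incident shows whether SYN cookies were actually sent and whether listen queues overflowed. If ListenOverflows grows, the accept backlog (which the kernel caps at net.core.somaxconn) may matter as much as tcp_max_syn_backlog.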

On Wed, Apr 16, 2014 at 4:15 PM, Franco Broi <franco.broi at iongeo.com> wrote:
> What should I be looking for? See below.
>
> I thought that maybe it coincided with a bunch of machines waking from
> sleep, but I don't think that is the case.
>
> [root@nas1 ~]# ethtool -S eth2
> NIC statistics:
>      rx_packets: 116095907410
>      tx_packets: 83692116889
>      rx_bytes: 141224428783450
>      tx_bytes: 1007756860391628
>      rx_errors: 0
>      tx_errors: 0
>      rx_dropped: 0
>      tx_dropped: 0
>      multicast: 0
>      collisions: 0
>      rx_length_errors: 0
>      rx_over_errors: 0
>      rx_crc_errors: 0
>      rx_frame_errors: 0
>      rx_fifo_errors: 0
>      rx_missed_errors: 0
>      tx_aborted_errors: 0
>      tx_carrier_errors: 0
>      tx_fifo_errors: 0
>      tx_heartbeat_errors: 0
>      tx_window_errors: 0
>      tx_boundary: 4096
>      WC: 1
>      irq: 134
>      MSI: 1
>      MSIX: 0
>      read_dma_bw_MBs: 1735
>      write_dma_bw_MBs: 1715
>      read_write_dma_bw_MBs: 3421
>      serial_number: 446488
>      watchdog_resets: 0
>      dca_capable_firmware: 1
>      dca_device_present: 1
>      link_changes: 2
>      link_up: 1
>      dropped_link_overflow: 10046509
>      dropped_link_error_or_filtered: 72353
>      dropped_pause: 0
>      dropped_bad_phy: 0
>      dropped_bad_crc32: 0
>      dropped_unicast_filtered: 72353
>      dropped_multicast_filtered: 24551326
>      dropped_runt: 0
>      dropped_overrun: 0
>      dropped_no_small_buffer: 0
>      dropped_no_big_buffer: 0
>      ----------- slice ---------: 0
>      tx_pkt_start: 2087737864
>      tx_pkt_done: 2087737864
>      tx_req: 2508370636
>      tx_done: 2508370636
>      rx_small_cnt: 1504058385
>      rx_big_cnt: 2957794484
>      wake_queue: 462814
>      stop_queue: 462814
>      tx_linearized: 1011916
>
>
> On Wed, 2014-04-16 at 11:38 -0700, Harshavardhana wrote:
>> Perhaps a driver bug? Have you checked the ethtool -S output?
>>
>> On Wed, Apr 16, 2014 at 2:42 AM, Franco Broi <franco.broi at iongeo.com> wrote:
>> >
>> > I've increased my tcp_max_syn_backlog to 4096 in the hope that it will
>> > prevent this from happening again, but I'm not sure what caused it in
>> > the first place.
>> >
>> > On Wed, 2014-04-16 at 17:25 +0800, Franco Broi wrote:
>> >> Has anyone seen this problem?
>> >>
>> >> server
>> >>
>> >> Apr 16 14:34:28 nas1 kernel: [7506182.154332] TCP: TCP: Possible SYN flooding on port 49156. Sending cookies.  Check SNMP counters.
>> >> Apr 16 14:34:31 nas1 kernel: [7506185.142589] TCP: TCP: Possible SYN flooding on port 49157. Sending cookies.  Check SNMP counters.
>> >> Apr 16 14:34:53 nas1 kernel: [7506207.126193] TCP: TCP: Possible SYN flooding on port 49159. Sending cookies.  Check SNMP counters.
>> >>
>> >> client
>> >>
>> >> Apr 16 14:34:21 charlie5 GlusterFS[6718]: [2014-04-16 06:34:21.710137] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-4: server 192.168.35.107:49157 has not responded in the last 42 seconds, disconnecting.
>> >> Apr 16 14:34:31 charlie5 GlusterFS[6718]: [2014-04-16 06:34:31.711605] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-2: server 192.168.35.107:49156 has not responded in the last 42 seconds, disconnecting.
>> >> Apr 16 14:35:13 charlie5 GlusterFS[6718]: [2014-04-16 06:35:13.758227] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-0: server 192.168.35.107:49159 has not responded in the last 42 seconds, disconnecting.
>> >>
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> >
>>
>>
>>
>
>

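In the original report, the ports named in the server's SYN flooding warnings (49156, 49157, 49159, presumably the brick ports) are exactly the ports the client disconnected from after the 42-second ping timeout. With more log lines to go through, a small Python sketch like the following can do that matching as a set intersection. The regular expressions assume the exact message formats quoted above (and an IPv4 server address); the function names are hypothetical.

```python
import re

# Server kernel log: "... Possible SYN flooding on port 49156. Sending cookies. ..."
SERVER_PORT = re.compile(r"SYN flooding on port (\d+)")
# GlusterFS client log: "... server 192.168.35.107:49157 has not responded ..."
# (assumes an IPv4 address, as in the quoted logs)
CLIENT_PORT = re.compile(r"server [\d.]+:(\d+) has not responded")


def ports(lines, pattern):
    """Collect the port numbers a pattern extracts from a set of log lines."""
    found = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            found.add(int(match.group(1)))
    return found


def correlate(server_lines, client_lines):
    """Ports that appear both in SYN flooding warnings and in client ping-timeout disconnects."""
    return ports(server_lines, SERVER_PORT) & ports(client_lines, CLIENT_PORT)
```

Run on the six log lines quoted above, correlate() returns all three ports, which points at the same listeners on nas1 rather than unrelated connections.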
-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes