Archive

Posts Tagged ‘tcp’

Ethernet flow control and IGMP snooping

September 23rd, 2009 Ali Abbas

It is important to note that the TCP flow control mechanism and the Ethernet flow control mechanism are two completely different mechanisms. They strive for the same goal but, when in use, are completely unaware of each other.

As a matter of fact, Ethernet flow control can cripple your network if it is not planned and used carefully :)

So what is TCP flow control?

Flow control is a mechanism implemented in the TCP stack that lets a receiving endpoint tell the sender how much data it can still accept in its buffer, down to none at all. The free space in that buffer is what is referred to as the TCP window size, and it is transmitted in every ACK. The receiver can therefore let the sender know how many bytes it is able to accept at once.

[ let's assume the receiver machine can only hold 4K in its buffer ]

(sender) <------ ACK 1022 WIN 4096 <------ (receiver)
(sender) ------> 4K | SEQ 1022 ------> (receiver)

[ assuming that the buffer of the receiver is now full with the first 4K ]

(sender) <------ ACK 5118 WIN 0 <------ (receiver)

[ the sender is now "blocked" from sending more data until the receiver frees buffer space and sends a window update ]

(sender) <------ ACK 5118 WIN 4096 <------ (receiver)
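The exchange above can be replayed with a toy model. This is only a sketch under simplifying assumptions, not a real TCP stack: the `Receiver` class and its method names are made up for illustration, the advertised window is taken to be the free buffer space, and the acknowledgment number advances by the number of bytes accepted (1022 + 4096 = 5118).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Receiver:
    """Toy receiver with a fixed-size buffer (hypothetical, for illustration)."""
    buffer_size: int            # total receive buffer in bytes
    buffered: int = 0           # bytes received but not yet read by the application
    next_expected: int = 1022   # next sequence number the receiver expects

    def window(self) -> int:
        """Advertised window = free space left in the buffer."""
        return self.buffer_size - self.buffered

    def receive(self, seq: int, length: int) -> Tuple[int, int]:
        """Accept an in-order segment that fits; return the (ACK, WIN) pair."""
        if seq == self.next_expected and length <= self.window():
            self.buffered += length
            self.next_expected += length
        # Otherwise the segment is ignored and the current state is re-advertised.
        return self.next_expected, self.window()

    def app_read(self, length: int) -> Tuple[int, int]:
        """The application drains the buffer, which reopens the window."""
        self.buffered -= min(length, self.buffered)
        return self.next_expected, self.window()


rx = Receiver(buffer_size=4096)
print((rx.next_expected, rx.window()))      # initial advertisement: ACK 1022 WIN 4096
print(rx.receive(seq=1022, length=4096))    # buffer is full: ACK 5118 WIN 0
print(rx.app_read(4096))                    # window update: ACK 5118 WIN 4096
```

While the window is 0 the sender has to hold its data; only the window update produced by `app_read` lets it continue.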

OK, so now what is Ethernet flow control?

From layer 4 (TCP flow control), we now jump down to layer 2 (Ethernet flow control).

Ethernet flow control differs from TCP flow control in that it uses a MAC control frame, the “pause frame”, to tell the device at the other end of the link to stop sending frames. Keep in mind that the sender of the pause frame sets a 2-byte pause time, counted in quanta of 512 bit times, which defines how long the peer must wait before it resumes transmitting. Also keep in mind that pause frames are not forwarded. That is to say, a MAC control frame is consumed by the device that receives it and is not passed on through a trunk port or to any other adjacent device.
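To make the frame layout concrete, here is a minimal sketch that builds the bytes of a pause frame and converts a pause time into seconds. The helper names `build_pause_frame` and `pause_seconds` and the source MAC address are made up for illustration; the destination address 01:80:c2:00:00:01, EtherType 0x8808, opcode 0x0001 and the 512-bit-time quantum are the standard 802.3x values. The frame is built without the trailing FCS and is not sent anywhere.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")   # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a pause frame (without FCS), zero-padded to the 60-byte minimum."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    frame = header + payload
    return frame + bytes(60 - len(frame))


def pause_seconds(quanta: int, link_bps: float) -> float:
    """One pause quantum lasts 512 bit times at the link speed."""
    return quanta * 512 / link_bps


# Hypothetical locally administered source address, maximum pause time.
frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), frame[:18].hex())
print(pause_seconds(0xFFFF, 1e9))   # longest possible pause on a 1 Gbps link, in seconds
```

Even the maximum pause time is only a few tens of milliseconds at gigabit speed, which is why a congested switch has to keep re-sending pause frames for as long as the congestion lasts.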

What is the problem when using Ethernet flow control?

If you have read this far, you can probably guess what may happen if you have “Ethernet flow control” enabled on your switch. Instead of dropping packets when a port is congested and its buffers fill up, the switch generates its own pause frame and sends it to the sending host. Now keep in mind that pause frames completely halt all transmission on the data link layer, not just the offending flow. That is to say, if PCX was meanwhile getting a file from PCB through that port, that transfer would be “paused” as well. Because pause frames only work on layer 2 (data link), all communication through the targeted switch port ceases completely for the duration of the pause.

But what happens meanwhile with the TCP flow control?

As said earlier, TCP flow control isn’t aware of Ethernet flow control. TCP throttles the amount of data it sends based on the receiver’s window and on packet loss. Because the switch no longer drops packets once “Ethernet flow control” is enabled, TCP does not notice that it is sending more data than the path to the endpoint can absorb, and so it keeps increasing the amount of data it sends. The result is an overloaded receiver and a switch that keeps generating pause frames until TCP finally detects congestion and readjusts its sending window.

And what happens when you have IGMP snooping off?

Imagine a multicast scenario in which a server and a workstation sit on two 1 Gbps ports and another workstation sits on a 100 Mbps port. If the server starts sending multicast packets at 1 Gbps (in theory ;-) ), Ethernet flow control immediately throttles the server down to the speed of the slowest switch port that receives the stream. Remember that we are talking multicast here: because the packets are also delivered to the 100 Mbps port, Ethernet flow control on the switch forces the server to send at only 100 Mbps. That is fine as long as the slow workstation actually wants the stream. Without IGMP snooping, however, the switch floods the multicast packets to all of its ports, including endpoints that never joined the multicast group, so Ethernet flow control is triggered on their behalf and the result is bad, slow performance.
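The scenario can be sketched as a toy model. It rests on the simplification made in the paragraph above, namely that pause frames end up throttling the server to the slowest port the stream is forwarded to; the function names and port names are made up for illustration.

```python
def receiving_ports(port_speeds_mbps, group_members, igmp_snooping):
    """Ports the switch forwards the multicast stream to."""
    if igmp_snooping:
        # With snooping, only ports that joined the group get the stream.
        return {p: s for p, s in port_speeds_mbps.items() if p in group_members}
    return dict(port_speeds_mbps)   # without snooping, the stream is flooded to every port


def sender_rate_mbps(server_rate_mbps, port_speeds_mbps, group_members, igmp_snooping):
    """Rate the server is throttled to: the slowest port that receives the stream."""
    speeds = receiving_ports(port_speeds_mbps, group_members, igmp_snooping).values()
    return min([server_rate_mbps, *speeds])


ports = {"workstation_gig": 1000, "workstation_fast": 100}   # hypothetical port names
members = {"workstation_gig"}   # only the gigabit workstation joined the group

print(sender_rate_mbps(1000, ports, members, igmp_snooping=False))   # 100
print(sender_rate_mbps(1000, ports, members, igmp_snooping=True))    # 1000
```

With snooping off, the 100 Mbps workstation drags the server down even though it never asked for the stream.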

Conclusion

IGMP snooping has always been a problem in VRRP setups (aka Checkpoint HA), where it causes fluctuations in the interface state (referred to as flapping interfaces).

While it is possible to disable IGMP snooping per VLAN, I would recommend disabling it per multicast MAC address (i.e. 01:00:5e:xx:yy:zz).
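To find the multicast MAC address that belongs to a given group, recall that IPv4 multicast groups map onto the 01:00:5e block by copying the low 23 bits of the group address. The helper below is a small sketch of that mapping (the function name is made up); 224.0.0.18 is the VRRP group address.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF        # only the low 23 bits of the group survive
    mac = 0x01005E000000 | low23        # fixed 01:00:5e prefix, 24th bit always 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("224.0.0.18"))    # VRRP group -> 01:00:5e:00:00:12
print(multicast_mac("239.255.0.1"))   # -> 01:00:5e:7f:00:01
```

Because only 23 bits are copied, 32 different group addresses share each MAC address, so a per-MAC exception on the switch may cover more groups than the one you had in mind.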

Categories: Networking, TCP/IP

Mathis Equation and TCP performance

September 16th, 2009 Ali Abbas

Put as simply as possible, the Mathis equation goes as follows:

Rate <= (MSS/RTT)*(1 / sqrt(p))

MSS

This is the Maximum Segment Size, which is the MTU excluding the TCP/IP headers.

MSS = MTU - TCP/IP headers; for example, 1460 bytes with an MTU of 1500 (20-byte IP header and 20-byte TCP header)

RTT

RTT is the Round Trip Time as measured by TCP: the time it takes a packet to travel from endpoint A to B plus the time for the reply to travel back from B to A.

As a rule of thumb, RTT = (physical distance in km * 20 ms) / 1609; that is to say, for every 1609 km you should expect about 20 ms of RTT.

p

p is the probability of packet loss on the path, expressed as a fraction rather than a percentage. For comparison, a fiber link would typically have a bit error rate (BER) of about 10⁻¹³.

Before we go on, it is important to understand how TCP evaluates packet loss. Put simply, TCP infers loss from acknowledgments: a segment whose ACK arrives too late (after the retransmission timer has expired) or not at all is treated as lost. The more acknowledgments arrive late, the higher the packet loss rate TCP sees.

Let’s get more serious

As explained earlier, the Mathis equation lets us estimate the rate, or throughput, that we can achieve based on the MSS, the RTT and the probability of packet loss on the link.

Imagine we have an E3 link. For those new to WAN technology, an E3 link uses an M3 signaling type as opposed to an E1 which uses a ZM signaling type. Getting back to the line speed, an E3 carries the equivalent of 16 E1s at a line rate of 34.368 Mbps (including management overhead).

1. The line is an E3 with a bandwidth of 34.368 Mbps
2. Our endpoint is roughly 3000 km from us
3. We are using a default MSS of 1460
4. We assume the E3 has a packet loss probability of 10⁻³ = 0.1 % (1 packet lost in every 1000 packets)

Based on 3000 km, we can assume an average RTT of 37.29 ms = 0.03729 s

Mathis eq.: (1460 / 0.03729) * (1 / sqrt(0.001)) ~= 1.24 MB/s ~= 9.9 Mbps

Now if we had no packet loss, our throughput would be limited only by the TCP window:

Throughput = TCPWindow / RTT

(65535 / 0.03729) * 8 ~= 14 Mbps

So a window-limited ceiling of 14 Mbps turns into an actual throughput of about 9.9 Mbps (1.24 MB/s) over 3000 km with a packet loss of one packet in every 1000.
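The arithmetic can be double-checked with a few lines of Python. This sketch uses the square-root form of the Mathis bound from the referenced paper, Rate <= (MSS/RTT) * (1/sqrt(p)), with a loss probability of 0.001 (one packet in 1000), the 20 ms per 1609 km rule of thumb for the RTT, and a 65535-byte window; the function names are made up for illustration.

```python
import math


def rtt_seconds(distance_km: float) -> float:
    """Rule of thumb: about 20 ms of RTT for every 1609 km."""
    return distance_km * 0.020 / 1609


def mathis_rate(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on TCP throughput in bytes per second (Mathis et al., constant taken as 1)."""
    return (mss_bytes / rtt_s) * (1 / math.sqrt(loss))


def window_rate(window_bytes: float, rtt_s: float) -> float:
    """Throughput in bytes per second when only the TCP window limits the sender."""
    return window_bytes / rtt_s


rtt = rtt_seconds(3000)
loss_limited = mathis_rate(1460, rtt, 0.001)
window_limited = window_rate(65535, rtt)

print(f"RTT            : {rtt * 1000:.2f} ms")
print(f"loss-limited   : {loss_limited / 1e6:.2f} MB/s = {loss_limited * 8 / 1e6:.2f} Mbit/s")
print(f"window-limited : {window_limited * 8 / 1e6:.2f} Mbit/s")
```

Since the MSS is given in bytes, the Mathis bound comes out in bytes per second and has to be multiplied by 8 before it can be compared with a line rate in Mbps.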

How do you increase the rate?

In a perfect world, you would of course improve every variable of the equation: decrease the RTT, decrease the loss probability and increase the MSS (which, btw, you cannot do on the Internet, as routers there are almost universally configured with an MTU of 1500).
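To get a feel for how much each variable matters, the sketch below compares a baseline against three changes, again using the square-root form of the Mathis bound from the referenced paper. The baseline values (MSS 1460, RTT 0.03729 s, loss 0.001) come from the example; the MSS of 8960 assumes a 9000-byte jumbo-frame MTU, which is an illustration only and not something you can count on across the Internet.

```python
import math


def mathis_rate(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on TCP throughput in bytes per second (constant taken as 1)."""
    return (mss_bytes / rtt_s) * (1 / math.sqrt(loss))


baseline = mathis_rate(1460, 0.03729, 0.001)

scenarios = {
    "half the RTT": mathis_rate(1460, 0.03729 / 2, 0.001),
    "100x less loss": mathis_rate(1460, 0.03729, 0.001 / 100),
    "jumbo MSS (8960, assumed)": mathis_rate(8960, 0.03729, 0.001),
}

for name, rate in scenarios.items():
    print(f"{name:26s}: {rate / baseline:.2f}x the baseline rate")
```

The rate scales linearly with the MSS and with 1/RTT, but only with the square root of the loss improvement: cutting loss by a factor of 100 buys a factor of 10.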

I hope that was informative on how packet loss can affect throughput.

Reference

The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm (1997) http://citeseer.ist.psu.edu/old/mathis97macroscopic.html

Categories: Networking, TCP/IP

Denial of Service – Sockstress

October 8th, 2008 Ali Abbas

Sockstress is a new type of denial-of-service attack developed by Jack C. Louis. According to nmap creator Fyodor, the attacker sends a TCP SYN packet to a targeted port, but first makes sure that a firewall protects his own machine so that its operating system does not interfere with the attack. The main reason for this protection is to keep the attacker’s computer from resetting the unexpected SYN/ACK packet that comes back (the 2nd step of the TCP 3-way handshake). The SYN/ACK is unexpected because the attacker sent the SYN packet from userland rather than through the operating system’s API. According to Fyodor, the attacker’s userland program then replies to each packet by sending another raw packet, which is the acknowledgment packet.

Robert Lee partially rejected that explanation as a description of the overall “methodology”, but refused to comment on it any further. As far as is publicly known, there is currently no fix or system able to prevent Sockstress from taking down a server’s TCP stack.

(for further info: http://blog.robertlee.name/2008/09/sockstress-podcast-interview.html)

Categories: Networking, Security