September 23, 2009

Ethernet flow control and IGMP snooping

It is important to note that** TCP flow control mechanism** as well as Ethernet flow control mechanism are completely 2 different mechanism, which strive to achieve the same unique goal but when in used, are completely unaware of each other.

As a matter of fact, Ethernet flow control can fully alienate your network if not planned and used carefully :)…

So What is TCP flow control?

Flow control is a mechanism implemented in the TCP stack which enables a receiver endpoint to notify a sender that it can no longer receive data in its buffer. The buffer size is what is simply referred as the TCP Window Size, and is transmitted in each ACK. The receiver can therefore let the sender know, how much bytes it is able to process at once.

[ let’s assume, the receiver machine can only process 8K in its buffer]

(sender) <——– ACK 1022 WIN 4096 <——– (receiver) (sender) ———> 4K | SEQ 1022 ————–> (receiver)

[ assuming that the buffer of the receiver is now full with the first 4K ]

(sender) <——– ACK 2024 WIN 0 <——– (receiver)

[ the sender is now “blocked” from sending more data till the receiver sends a second acknowledgment]

(sender) <——– ACK 2024 WIN 4096 <——– (receiver)

Ok so now, what is Ethernet flow control?

From layer 4 (TCP flow control), we jump now to layer 2 (Ethernet flow control).

Ethernet flow control is different from TCP flow control as it makes usage of the MAC control frame “pause frame” to notify the end device to stop sending frames. It is important to keep in mind that, the sender of the pause frame sets the 2bit quanta time which defines how long the endpoint must wait to start retransmitting frames and finally to keep in mind that pause frames are not forwarded. That is to say, a MAC control frame will not be forwarded through a trunk port, nor to the adjacent device.

What is the problem when using Ethernet flow control?

If you have read so far, you can start guessing what may occur, if you have “ethernet flow control” enabled on your switch. Instead of dropping the packets when the tcp window size is exhausted, the switch will not drop the packet but generate its own pause frame and send it to the sender host. Now keep in mind that pause frames completely cease all transmission on the data link layer… that is to say if meanwhile PCX was getting a file of PCB, it would as well be “paused”. Because pause frame only work on layer 2 “data-link”, all communications associated to the targeted switch port, will completely cease for the pause period of time.

But what happens meanwhile with the TCP flow control?

Like said earlier, the TCP flow control isn’t aware of the data flow control… the TCP flow control allows TCP to throttle the amount of data it is sending, because the switch no longer drops packets due to “ethernet flow control”, TCP becomes unaware that it is sending more data than what the endpoint window size can receive and thus keeps increasing the amount of data it is sending… the result is an overloaded receiver and a switch which keeps generating pause frames, till the TCP flow control detects congestion and readjusts the sending window.

And what happens when you have IGMP snooping off?

Imagine a multicast scenario, where you have a server and a workstation on 2x 1Gb port and another workstation on a 100Mb. If the server starts sending multicast packets at 1Gpbs (in the absolute ;-) ), Ethernet flow control will directly start to throttle down the speed at which the server sends the packet to the lowest port speed of the switch. Remember we are talking multicast here and because packets would be delivered to the 100Mb port… Ethernet flow control on the switch would force the server to only send at 100Mbps. While this is good in practice, remember without IGMP snooping,the switch would be sending all the multicast packets to all the switch ports, thus to endpoints which are unsolicited in the mutlicast group, will cause Ethernet flow control to trigger bad and slow performance.

Conclusion

IGMP snooping has always been a problem in VRRP setup (aka. Checkpoint HA), causing fluctuation on the interface state (referred as flapping interfaces).

While it is possible to disable IGMP per VLAN, I would recommend disabling IGMP snooping per MAC Multicast Address (i.e 01:50:5e:xx:yy:zz)