January 13, 2012

DCB 101 - Priority-based Flow Control

DCB (Data Center Bridging) is a set of standards defining four independent technologies/concepts which, together, pretty much make Ethernet lossless and hence able to support storage traffic. We will not go into a debate over FCoE, or whether you should consider a single fabric for both storage and “standard/Ethernet” traffic in your data center design strategy rather than going a more traditional way.

As we said earlier, DCB is a set of standards, four of them to be exact, which we will cover over the “DCB 101” post series. DCB is composed of:

  • Priority-based Flow Control - 802.1Qbb
  • Enhanced Transmission Selection - 802.1Qaz
  • Congestion Notification - 802.1Qau
  • DCB Exchange (DCBX) - 802.1Qaz

While these standards are independent, they build on each other like dependent layers - remove one and the nice theory of maintaining a lossless transport fabric will start falling apart. Alright! I said no debate on FCoE.

Priority-based Flow Control

You may sometimes come across technical documents which refer to PFC as Per-Priority Pause (PPP) - keep this name in mind for now, as the reason for it will become clear as you read on.

Who says “priority flow” says “multiple flows/segments/divided lanes with different priorities”, and in the end that’s naively what PFC is about. PFC’s goal is merely to segment traffic on the Ethernet fabric/medium into streams marked with a priority, and to define a specific “guidance/action” for each of these streams. In other words, think of a highway with a single lane - PFC virtually creates a second, third and fourth lane so that certain car types are assigned to a specific lane - some cars can keep moving, while others are stuck in a traffic jam.

3-bit Priority Code Point

Moving away from the simplistic highway illustration - PFC uses the PCP value in the 2-byte Tag Control Information field of the 802.1Q header to determine the priority of the traffic. It is important to keep in mind that PCP is not defined by DCB but by the 802.1p task group; QoS is another great example of something which makes use of PCP for traffic classification.

Because PCP is only 3 bits wide, we can only define 8 priorities. While a higher PCP value generally means a higher priority, keep in mind that a PCP value of 1 is lower in priority than a PCP value of 0. This is simply due to the Network Priority translation, where PCP-0 = NP-1 and PCP-1 = NP-0. I will not detail why that is for now, but perhaps in a future post.
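To make the bit layout a little more concrete, here is a minimal Python sketch that splits the 2-byte Tag Control Information field into its parts and compares two PCP values. The function names (parse_tci, outranks) and the PCP_LOW_TO_HIGH table are made up for this illustration; the table simply encodes the mapping described above, where 1 sits below 0 and everything else is ascending.

```python
import struct

def parse_tci(tci_bytes: bytes) -> dict:
    """Split the 2-byte 802.1Q Tag Control Information field."""
    (tci,) = struct.unpack("!H", tci_bytes)  # network byte order, 16 bits
    return {
        "pcp": tci >> 13,          # top 3 bits: Priority Code Point
        "dei": (tci >> 12) & 0x1,  # next bit: Drop Eligible Indicator
        "vid": tci & 0x0FFF,       # low 12 bits: VLAN ID
    }

# Illustrative assumption: PCP values listed from lowest to highest
# priority, with 1 ranked below 0 as described in the text.
PCP_LOW_TO_HIGH = [1, 0, 2, 3, 4, 5, 6, 7]

def outranks(pcp_a: int, pcp_b: int) -> bool:
    """True if pcp_a is a higher priority than pcp_b."""
    return PCP_LOW_TO_HIGH.index(pcp_a) > PCP_LOW_TO_HIGH.index(pcp_b)

if __name__ == "__main__":
    tci = struct.pack("!H", (3 << 13) | 100)  # PCP 3, VLAN 100
    print(parse_tci(tci))
    print(outranks(0, 1))
```

With 3 bits there is no room for more than the 8 entries in that table, which is exactly why PFC ends up with 8 priorities to work with.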

802.3x Pause Frames

We step away briefly from PCP and PFC to review a few small concepts about Pause frames, which will pave the way to understanding not only the importance of PCP but also what PFC introduces to make Ethernet lossless.

Who talks about congestion talks about the inability to forward or process traffic - whether that be frames or simply segments. Like on a highway, one way of clearing a traffic jam is to stop a specific flow of traffic, or all traffic, until the jam dissolves. Pause frames are like the highway patrol and yes they can also shoot you down ;-)… for more insights on this analogy, you can refer to my following posts:

Without going into too much detail, a Pause frame is simply a MAC Control frame (EtherType 0x8808) which carries an opcode of 0x0001 followed by a time field expressed in quanta.

The opcode value of 0x0001 simply says that this MAC Control frame is a Pause frame, and the time field, expressed as a 2-byte unsigned integer, says for how many quanta the sender should stop sending frames. How long that is in clock time depends on the transmission rate: one quantum is 512 bit times, so on a 10GbE link one quantum is 51.2ns (512 bits / 10¹⁰ bits per second).
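The arithmetic and the frame layout can be sketched in a few lines of Python. This is only an illustration under a couple of assumptions: one quantum is taken as 512 bit times (as 802.3x defines it), the frame is padded to the 60-byte Ethernet minimum without the FCS, and the helper names (build_pause_frame, pause_duration_ns) are invented for this post.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast address used by Pause frames
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
QUANTUM_BITS = 512                         # one pause quantum = 512 bit times

def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build an 802.3x Pause frame (no FCS), padded to 60 bytes."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 2 bytes")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    return (header + payload).ljust(60, b"\x00")

def pause_duration_ns(quanta: int, link_bps: int) -> float:
    """Clock time covered by `quanta` on a link running at `link_bps`."""
    return quanta * QUANTUM_BITS * 1e9 / link_bps

if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
    print(frame[:18].hex())
    print(pause_duration_ns(1, 10_000_000_000))       # one quantum at 10GbE
    print(pause_duration_ns(0xFFFF, 10_000_000_000))  # longest single pause at 10GbE
```

Since the time field is only 2 bytes, the longest pause a single frame can request at 10GbE works out to roughly 3.36ms; a receiver that needs more has to keep sending Pause frames.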

While Pause frames are essential to prevent congestion (at least that’s the goal), 802.3x frames cannot differentiate traffic by priority. In other words, assuming both LAN and storage traffic are carried by the same fabric, a Pause frame issued because congestion is starting on the LAN traffic would interrupt the storage traffic as well.

Per-Priority Pause

Because of this limitation of the current Pause frames, PFC defines a new Pause frame, which re-uses the existing Pause frame format but adds priority capabilities. Like the 802.3x frame, the Per-Priority Pause frame is a MAC Control frame, but with the following differences:

  • The opcode is 0x0101 to differentiate between 802.3x and PFC Pause frames
  • The single 802.3x time field is removed and replaced by two fields: a 16-bit “priority enable” vector and an array of eight 16-bit “time” fields, one per priority. Each bit of the “priority enable” vector says whether the quanta time in the matching “time” field should be evaluated; it simply acts as an on/off switch.
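As a rough illustration of those two differences, the sketch below packs the body of a PFC frame in Python. It assumes the 802.1Qbb layout of a 2-byte opcode, a 16-bit priority-enable vector (one bit per priority, low 8 bits used) and eight 16-bit time fields; build_pfc_payload is a hypothetical helper name, and the Ethernet header and padding are left out to keep it short.

```python
import struct

PFC_OPCODE = 0x0101

def build_pfc_payload(pause_quanta: dict) -> bytes:
    """Pack opcode + priority-enable vector + eight per-priority time fields.

    `pause_quanta` maps a priority (0-7) to a pause time in quanta.
    Priorities that are not listed keep their enable bit cleared.
    """
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError("priority must be between 0 and 7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 2 bytes")
        enable_vector |= 1 << prio  # flip the on/off switch for this priority
        times[prio] = quanta
    return struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)

if __name__ == "__main__":
    # Pause only priority 3 (for instance the storage class) for the maximum time.
    payload = build_pfc_payload({3: 0xFFFF})
    print(payload.hex())
```

Only the lanes whose enable bit is set are affected; traffic on every other priority keeps flowing, which is the whole point compared to a plain 802.3x Pause frame.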

Once PPP is negotiated, 802.3x frames are no longer honored on the ports that receive PPP frames, hence it is not possible to use 802.3x on top of PPP.

Summary

In summary, PFC makes use of PCP as defined in 802.1p and introduces a new pause frame format designed to reduce the “probability” of packet drops, treating each defined traffic priority as a separate queue. Having said that, PFC isn’t without its downsides - without a congestion notification protocol such as 802.1Qau, PFC can result in a head-of-line blocking scenario.

Finally, when using PFC it is important to carefully calculate the buffer size to prevent frame loss on the receiving side while a PFC pause frame is being sent - another post will soon clarify the methodology to use to calculate the buffer size.