Category: Networking

If you have been following the latest improvements in the 2.6.35 kernel release, you have probably noticed two major network stack improvements that have created some buzz in the geek community. OK, not really, since the RPS and RFS projects have been in the works for a while now by the crowd at Google Inc.

Since those two amazing features are now fully implemented and supported in the kernel, I thought it was a good opportunity to finally get down to the beast and try to shed some light on RPS and RFS.

First of all… RPS stands for “Receive Packet Steering” and RFS for “Receive Flow Steering”… As you can see, both of them deal with incoming traffic in the network stack, as this is where the big bottleneck occurs when packets are handled and processed by the kernel. Since most network cards out there are single-queue NICs, packets reach the kernel through a single queue, and the kernel must try to spread the load of packet processing over all CPU cores.

Mono-queue Network Cards

As I said earlier, RPS and RFS address the limitations of mono-queue network cards (most cards used in datacenters or in corporate IT environments are likely to be mono-queue cards).

With a mono-queue network card, incoming packets are spread across the CPU cores. In a nutshell, one packet is handled by core 1 while another packet from the same TCP/UDP stream is handled by core 2, so the CPU cores must keep passing the shared connection state between their caches… which hurts overall network performance (this is referred to as cacheline bouncing).

There are many cards out there (I like to refer to them as exotic NICs) that implement multiple queues, which all in all is what RPS mostly tries to emulate.

RPS – Receive Packet Steering

Following on from what I said earlier… RPS is basically an emulation of a multi-queue card. In a nutshell, it calculates a hash from the headers of the incoming IP packet (the IP addresses and ports) and assigns this hash to one CPU core. Once the hash is calculated, all subsequent incoming packets that match this hash are handed to the same CPU core, a bit like sticky sessions in load-balancing terms.

So now all packets of one TCP stream/connection are handled by one CPU core, which avoids the performance hit created by the cacheline bouncing effect.
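The hash-to-core idea can be sketched in a few lines of Python. This is only a toy model: the kernel uses its own hash function and its own mapping onto the CPU list, while the CRC32 hash and the names flow_key and steer below are assumptions made up for illustration.

```python
import zlib

def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Build a byte string identifying one flow (hypothetical helper)."""
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()

def steer(src_ip, src_port, dst_ip, dst_port, cpus):
    """Pick a CPU for a packet from the hash of its flow.

    CRC32 stands in for the real hash; the point is only that the
    same addresses and ports always yield the same CPU.
    """
    flow_hash = zlib.crc32(flow_key(src_ip, src_port, dst_ip, dst_port))
    return cpus[flow_hash % len(cpus)]

# Every packet of the same connection lands on the same core.
allowed_cpus = [0, 1, 2, 3]
first = steer("10.0.0.1", 40000, "10.0.0.2", 80, allowed_cpus)
second = steer("10.0.0.1", 40000, "10.0.0.2", 80, allowed_cpus)
print(first == second)
```

Different connections may of course hash to the same core; the hash only guarantees that one flow is never split across cores.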

You can specify, per receive queue of each network card, which CPUs are allowed to process its packets. This is set as a hexadecimal CPU bitmask in

/sys/class/net/ethX/queues/rx-N/rps_cpus
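The rps_cpus file holds a hexadecimal bitmask in which bit N stands for CPU N. As a small sketch, the helper below builds that string from a list of CPU numbers; the function names are hypothetical, and the snippet only computes the value and does not write it anywhere.

```python
def cpu_mask(cpus):
    """Return an integer with bit N set for every CPU number N given."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return mask

def format_mask(mask):
    """Format the mask as hex, in comma-separated 32-bit groups.

    The most significant group comes first, which is how the kernel
    prints CPU bitmaps on machines with more than 32 CPUs.
    """
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    groups.reverse()
    head = format(groups[0], "x")
    tail = [format(group, "08x") for group in groups[1:]]
    return ",".join([head] + tail)

# CPUs 0 to 3 give the mask "f"; as root, this value could then be
# echoed into the rps_cpus file of the chosen receive queue.
print(format_mask(cpu_mask([0, 1, 2, 3])))
```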

You may wonder whether RPS always calculates the hash of an incoming packet… In most cases, if your mono-queue card is able to compute it, RPS will simply fetch the hash from the card and skip the cost of performing the calculation itself.

RFS – Receive Flow Steering

While RPS obviously offers a huge performance gain, RFS has been introduced to help userland applications process packets faster by improving CPU locality between the application and the packets handed to it by the kernel. In other words, if an application issues system calls that trigger packets to be sent and received, the CPU currently executing it is recorded, and incoming packets targeted at this application will be handed over to this CPU by RPS.

So you can see, RFS is a sort of add-on to RPS, but instead of matching on IP/port alone, it matches on the application to minimize the CPU locality performance penalty.
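To make the "application match" more concrete, here is a toy model of a flow table that remembers which CPU an application last ran on for a given flow. It is loosely inspired by the idea behind such a table, not by the kernel code: the class name, table size and method names are assumptions, and out-of-order handling is ignored entirely.

```python
class FlowTable:
    """Toy flow table: flow hash -> CPU where the application last ran."""

    def __init__(self, size=256):
        self.size = size
        self.cpu_for_slot = [None] * size

    def record(self, flow_hash, cpu):
        """Called when the application touches the flow while on `cpu`."""
        self.cpu_for_slot[flow_hash % self.size] = cpu

    def lookup(self, flow_hash, fallback_cpu):
        """Return the application's CPU, or the plain RPS choice if unknown."""
        cpu = self.cpu_for_slot[flow_hash % self.size]
        return fallback_cpu if cpu is None else cpu

table = FlowTable()
# Nothing recorded yet: fall back to the CPU chosen by the packet hash.
print(table.lookup(0xBEEF, fallback_cpu=1))
# The application reads from the socket while running on CPU 3.
table.record(0xBEEF, cpu=3)
# Later packets of that flow are steered to CPU 3.
print(table.lookup(0xBEEF, fallback_cpu=1))
```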

In summary, this is basically what RPS and RFS do. I will try in another post to get more technical and offer an analysis of the new code changes in the kernel network stack, such as an overview of the rps_sock_flow_table, the rps_dev_flow_table and the rxhash variable of the stack, and mostly how out-of-order packets are handled by this new system ;-)

One of the most interesting features of DMVPN, in my personal opinion, is its extended support for VRF on MPLS networks.

Remember, VRF allows multiple instances of routing tables to co-exist on the same router at the same time.

That said, DMVPN helps scale out a traditional IPSEC hub-and-spoke VPN configuration by setting up permanent connections from the spoke routers to the hub router and temporary connections between the spoke routers as needed. The result is to offload traffic from the hub router, which improves network performance, scalability and traffic control.

DMVPN relies on the following protocols:

- IPSEC: pre-shared keys used to secure the traffic

- mGRE: mGRE allows us to encapsulate multicast packets (e.g. OSPF packets) and to set up a pseudo-virtual tunnel interface to link our sites

- NHRP: Without NHRP, our GRE tunnels cannot be established. NHRP stands for “Next Hop Resolution Protocol” and lets our routers learn the IP addresses of the peer sites. The NHRP server (the hub) answers NHRP requests for peer IP discovery so that tunnels can be formed.

- A routing protocol: OSPF, RIP, BGP etc…

Important things to keep in mind

- IPSEC in “Transport Mode”

When setting up the tunnel, make sure to use “transport” mode with IPSEC: GRE already encapsulates the original IP packet, so ESP does not need to wrap it in yet another outer IP header as tunnel mode would. This saves you 20 bytes on the MTU ;-)

- Use RIP for default routes

I know, you are probably ready to pull out your hair, but in a large DMVPN network, RIP can scale better than a routing protocol such as OSPF… calculating adjacencies is CPU intensive ;-)
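Coming back to the 20 bytes saved by transport mode, the arithmetic can be sketched as follows. The header sizes are the standard IPv4 header without options and the base GRE header without an optional key; the ESP overhead depends on the cipher and padding, so it is passed in as a placeholder value that is purely an assumption.

```python
IPV4_HEADER = 20  # IPv4 header without options, in bytes
GRE_HEADER = 4    # base GRE header, without the optional key field

def encapsulation_overhead(mode, esp_overhead):
    """Bytes added around the original packet for GRE over IPSEC.

    `esp_overhead` is treated as a fixed, hypothetical value here.
    """
    if mode == "tunnel":
        # new outer IP header + ESP + GRE delivery IP header + GRE
        return IPV4_HEADER + esp_overhead + IPV4_HEADER + GRE_HEADER
    if mode == "transport":
        # the GRE delivery IP header is reused as the outer header
        return IPV4_HEADER + esp_overhead + GRE_HEADER
    raise ValueError("mode must be 'tunnel' or 'transport'")

esp = 36  # hypothetical ESP overhead, only used for the comparison
saved = encapsulation_overhead("tunnel", esp) - encapsulation_overhead("transport", esp)
print(saved)  # one IPv4 header
```

Whatever value is used for the ESP overhead, the difference between the two modes stays at one IPv4 header.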

Catalyst 6500 and ASIC issues

The related news can be found at http://www.networkworld.com/community/blog/asic-issues-delaying-cisco-switch

Now keep in mind, I have not read the bulletin published by Rodman & Renshaw, LLC, nor can I attest that this is the fundamental reason why the switches have been delayed. As for the Cat 6500 being fully replaced by the Nexus 7000, remember that Cisco’s Supervisor Engines for modular switches have a lifespan of 10 to 12 years, and a new 720 Supervisor Engine was released roughly a year and a half ago. You do the math ;-)

Cheers,

Ali

Cisco’s IOS Quiet Period refers to the period during which telnet/ssh/http access is disabled for X amount of time after Y failed login attempts.

While it is quite unusual to allow virtual terminal access to a router from the WAN link, it does not hurt to go further and enable this Cisco feature to prevent a potential DoS or dictionary attack from the WAN link, or possibly from the LAN link as well.

The command used to enable the “Quiet Period” is “login block-for” in Global Configuration mode.

edge(config)#login block-for 600 attempts 5 within 2

In other words, block virtual logins for 10 minutes (600 seconds) after 5 failed attempts within 2 seconds.
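The behaviour described above can be modelled with a small sliding-window counter. This is only a toy simulation of the three parameters, not how IOS implements it; the class and method names are made up for illustration, and time is passed in explicitly as seconds.

```python
from collections import deque

class LoginBlocker:
    """Toy model of 'login block-for 600 attempts 5 within 2'."""

    def __init__(self, block_for=600, attempts=5, within=2):
        self.block_for = block_for
        self.attempts = attempts
        self.within = within
        self.failures = deque()    # timestamps of recent failed logins
        self.blocked_until = None  # end of the current quiet period

    def is_blocked(self, now):
        return self.blocked_until is not None and now < self.blocked_until

    def record_failure(self, now):
        if self.is_blocked(now):
            return
        self.failures.append(now)
        # Forget failures that fell out of the observation window.
        while self.failures and now - self.failures[0] > self.within:
            self.failures.popleft()
        if len(self.failures) >= self.attempts:
            self.blocked_until = now + self.block_for
            self.failures.clear()

blocker = LoginBlocker()
for t in (0.0, 0.3, 0.6, 0.9, 1.2):  # five failures in under 2 seconds
    blocker.record_failure(t)
print(blocker.is_blocked(10))   # inside the quiet period
print(blocker.is_blocked(700))  # quiet period is over
```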

Further Options

While this command should be enough to get us where we want to be, it is worth considering the following:

1. Log failed login attempts

edge(config)# login on-failure log

You can view the login logs by issuing “show login failures”.

2. Prevent administrative hosts from being locked out during the Quiet Period

login quiet-mode access-class {acl-name |acl-number}

edge(config)#login quiet-mode access-class adminIPs

By defining an access list named adminIPs that contains the range of IPs representing administrative hosts, we avoid being subject to the “Quiet Period” ourselves while it is in effect.
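The effect of the access list can be pictured as a simple membership check on the source address. The sketch below is not IOS syntax; the network range is a documentation prefix chosen as a stand-in for whatever adminIPs would contain, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical stand-in for the contents of the adminIPs access list.
ADMIN_NETWORKS = [ipaddress.ip_network("192.0.2.0/28")]

def exempt_from_quiet_mode(host, networks=ADMIN_NETWORKS):
    """Return True if `host` may still log in during the quiet period."""
    address = ipaddress.ip_address(host)
    return any(address in network for network in networks)

print(exempt_from_quiet_mode("192.0.2.5"))     # inside the admin range
print(exempt_from_quiet_mode("198.51.100.7"))  # any other host
```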

I hope that was informative,

Cheers,

Ali

IPv6 support on alouche.net

Hello,

This is just to announce that the blog is now available over IPv6. To be more precise, over proto41 (IPv6 tunneled in IPv4), as this is just an experiment.

[aabbas@mig ~]$ host alouche.net
alouche.net has address 69.72.186.60

alouche.net has IPv6 address 2001:470:1f07:a4e::2

[root@srv1 ~]#ip -6 route sh
xxxx:xxx:xxxx:xxxx::/64 via :: dev t-ipv6  proto kernel  metric 256  mtu 1480 advmss 1420 hoplimit 4294967295
2001:470:1f07:a4e::/64 dev eth0  proto kernel  metric 256  mtu 1500 advmss 1440 hoplimit 4294967295
fe80::/64 dev eth0  proto kernel  metric 256  mtu 1500 advmss 1440 hoplimit 4294967295
fe80::/64 via :: dev t-ipv6  proto kernel  metric 256  mtu 1480 advmss 1420 hoplimit 4294967295
ff00::/8 dev eth0  metric 256  mtu 1500 advmss 1440 hoplimit 4294967295
ff00::/8 dev t-ipv6  metric 256  mtu 1480 advmss 1420 hoplimit 4294967295
default dev t-ipv6  metric 1024  mtu 1480 advmss 1420 hoplimit 4294967295
(tunnel IP has been obfuscated)

Nginx (the webserver serving this blog) has been recompiled to support IPv6 and is now serving requests for this domain on both IPv4 and IPv6.

Cheers,
Ali