Cisco Switch Ether-Channel – misconceptions

I decided to write this post to address some of the many misconceptions out there on many mailing lists, forums etc…

1. More bandwidth

That is one of the first common misconception I often read or hear, if I “trunk” (Solaris term actually) many ports together, the number of ether-channeled port is equal of the number of ports multiplied by the bandwidth of each port.

In other words… 8 * 100Mbps port ether-channeled would create a single logical port with 800Mbps… Now while this is somewhat the anticipation, it isn’t really true. You see, only one physical link would be used for a single connection. If I connect to a workstation over the ether-channel, my packets for this single connection would be using one link and not multiple link, thus the bandwidth for this “conversation” would still be of 100Mbps.

2. All links are load balanced

It is also common to think that all links are equally shared, that is  say, one link would not be overloaded as opposed to the other ones.

Now while this is somewhat achievable, it isn’t by default. You see by default Cisco’s ether-channel algorithm hashes the destination MAC address of each packet and assigns it a number ranging from 0 to 7… This value is then assigned to a port belonging to the logical link. Because the maximum number of active links¹ which can be channeled together is 8¹, each of them can only carry one value (the hashed number from the destination mac address + session). If you happen for example to have 5 links, 3 links would be assigned 2 hashed value, while the 2 left, 1 hashed value. If you only had 2 links, then each would be assigned 4 values.

Because Cisco’s default algorithm uses the destination Mac Address, each packets destined to one server would be using the same link.

Imagine this simple scenario

[many workstations] —> switchA <==========> switchB —-> Server1

When PC1, PC2, PC3 which connects to switchA try to access a file on Server1, all connections would be going through the same ether-channel link, while the other links (assuming there is no other traffic) would be completely unused. Now the returned packets from Server1 to PC1, PC2, PC3, would nevertheless be using different links, because we would be having 3 Destination Mac address. So one way, a link would be overloaded, the other way, the link wouldn’t.

Because Ether-channeling is a one way work… it is possible to set the load balancing algorithm on switchA from Destination Mac Address to Source Mac Address, while leaving switchB on Destination Mac Address, which would then result in a “somewhat” load balanced ether-channel links.

Having said that, Cisco Algorithm uses about 9 methods to determine the link to use, those are the Destination/Source Mac Address, the Destination/Source IP Address, the Source/Destination Port, the Destination AND Source IP Address, the Source AND Destination Port and finally the Destination AND Source Mac Address.

(wow that was a long sentence :) )

Depending on which methods you decide to use, there would be some type of drawbacks depending on your network topology. That is to say, Ether-Channel isn’t the prior mean to resolve bandwidth/throughput issue; it is nevertheless useful and when necessary is a big important feature in network topology design.

¹ [ LACP allows a maxium of 16 links with only 8 active ]

iBGP route reflectors

It is by default that all BGP peers within the same autonomous systems must peer with each other to form a full mesh in order for each peer to be able to advertise routes to its adjacent peer.

“Disclaimer: BGP confederacies will not be tackled in this post”

For example

if routerB learns a new route from routerA, it wouldn’t be able to advertise the learned route to routerC and routerC would only be able to learn the route from routerA. Now imagine if your network isn’t fully meshed? well I am sure you guessed right! depending on your network infrastructure, routing on this advertised subnet from router A will be un-reachable through routerC and you will be having a big problem of convergence.

What if the peers cannot be meshed together?

It is possible when using standard iBGP to “force” a router to “reflect” the routes it learned to another adjacent peer. In simple words, routerB learned the route from routerA, routerB “reflects” that route to routerC.

Route Reflectors

As stated already, route reflectors eliminate the need for a full mesh setup, thus allow scalability but also a route reflector reduce data exchange between peers by only reflecting the best path. When setting up RR, you would be defining what is generally referred as a cluster (RR + client peers), in our example below, the RR is routerB and the client peers are routerA and routerC. This group is then defined as a cluster.

It is also important to understand how RR works.

As we said earlier, RR selects the best path when receiving a route from an iBGP peer; if the route had originated from a non-client iBGP peer (imagine routerD connected to routerA), this route will then be only reflected to all route reflectors clients (routerA and routerC for example), thus any other none-rr-clients needs to be fully meshed.  If the route nevertheless originates from either routerA or routerC, the route would then be reflected to both non-client and rr-client .

Now let’s see how we can set up a simple route reflector…

Configuration using a private ASN

Simple topology: [routerA 192.168.1.1] —– [ routerB 192.168.2.1] —– [ routerC 192.168.3.1]

routerB(config)#router bgp 64514
routerB(config-router)#neighbor 192.168.1.1 remote-as 64514
routerB(config-router)#neighbor 192.168.1.1 route-reflector-client
routerB(config-router)#neighbor 192.168.3.1 remote-as 64514
routerB(config-router)#neighbor 192.168.3.1 route-reflector-client

Now routerB will be advertising routers learned from routerA to routerC and from routerC to routerA.

What more?

I am not going to reiterate what RFC 2796 addresses, thus I suggest a read at http://www.ietf.org/rfc/rfc2796.txt to learn more about RR loop detection and avoidance.

Mathis Equation and TCP performance

As simple as possible laid off, the Mathis equation goes as follow

Rate <= (MSS/RTT)*(1 / p)

MSS

This is the Maximum Segment Size, which is the MTU excluding the TCP/IP headers.

MSS = MTU – TCP/IP headers – for example 1460 with an MTU of 1500 (20b IP and 20b TCP headers)

RTT

RTT is the Round Trip Time as measured by TCP. The round trip is the time it would take a packet to travel from endpoint A to B and from endpoint B to A.

On average, RTT = (Physical Distance * 20ms) / 1609 , that is to say, for each 1 609 km, you should expect an RTT of 20ms

p

p is the probability percentage of packet lost per physical segment. A fiber BER would typically be of 10⁻¹³%.

Before we go on, it is first important to understand how TCP evaluates packet loss. As simple as it can be, packet loss is simply based on late delivered ACKs. The more acknowledgment are being sent late, the more the % of packet lost increases.

Let’s get more serious

As explained earlier, the Mantis Equation allows to locate the rate or so to say throughout we can use based on the MSS, RTT and the probability % of packet loss on the link.

Imagine we have an E3 link. For those new to WAN technology, an E3 link uses an M3 signaling type as opposed to an E1 which uses a ZM signaling type. Getting back to the speed line, an E3 is the equivalent of  16*E1 ~= 34.064 Mbps (including management overhead)

1. Line is E3 with a bw of 34.064 Mbps
2. Our endpoint is roughly 3000 km from us
3. We are using a default MSS of 1460
4. An E3 would have a typical packet loss percentage of 10⁻⁶ = 0.001 % (1 packet lost each 1000 packets)

Based on 3000 km, we could assume that the average RTT would be of 37.29 ms = 0.03729 s

Mantis Eq : (1460 / 0.03729) * (1/0.001) ~=  1.23 Mbps

Now if we had no packet loss, our throughout would have been

Throughput = TCPWindow / RTT

(65535 / 0.01864) * 8 ~= 28Mbits

An original bandwidth line of 28 Mbps and an actual throughput of 1.23Mbps over 3000km with a packet lost of one packet each 1000.

How to do you increase rate?

In a perfect world, you would of course need to reduce each value variable of the equation such as decreasing RTT, decreasing the loss probability and increase the MMS (which btw you cannot on the internet, as all routers are configured with a static MTU of 1500)

I hope that was informative on how packet loss can affect throughput.

Reference

The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm (1997) http://citeseer.ist.psu.edu/old/mathis97macroscopic.html

Cisco IOS Configuration boot register – ROMMON

Every cisco routers has a configuration register which is saved in NVRAM and is a 16 bit value.

This post will not tackle all the 16 bits of the configuration register, but only the 13th bit which is used to either load IOS or ROMMON. Another post will be made to detail all the 16 bit configuration register.

Before continuing, it is important to understand the basic “boot process” of a router. When you power a router on, it first performs a POST, then loads the bootstrap program from ROM to RAM, which in return loads the appropriate IOS (bootstrap can load an IOS from tftp)/ROMMON or RXBOOT. Once loaded, the bootstrap program gives the hand to the IOS to handle the commands from there on.

The most important bit is the low order bit which is (2 for IOS, 1 for Rxboot and 0 for ROMMON)

Please note that by default, the low order boot bit is 2 thus 0x 2102

Example:

home-booth(config)#conf t
home-booth(config)#config-register 0×2100
home-booth(config)#do wr
Building configuration…
[OK]
home-booth(config)#do reload

Proceed with reload? [confirm]

(… a bit later)

%SYS-5-RELOAD: Reload requested by console. Reload Reason: Reload Command.
System Bootstrap, Version 12.3(8r)T8, RELEASE SOFTWARE (fc1)
Cisco 1841 (revision 5.0) with 114688K/16384K bytes of memory.

rommon 1 > System Bootstrap, Version 12.3(8r)T8, RELEASE SOFTWARE (fc1)
Cisco 1841 (revision 5.0) with 114688K/16384K bytes of memory.

rommon 1 >
rommon 1 > confreg 0×2102
rommon 2 >

What could ROMMON be useful for? Simple… restoring a router with a corrupted/broken IOS image!

Configuration Register

Prevent being tapped through your mobile phone

I have come across this article which I though qualified for a post here.

As some of you may already know it. It is possible to tap through a turned off mobile phone. More info on the subject read here and there.

Dan at this site has come up with a small technique to help prevent mobile tapping, to ensure total privacy… and all of this by just using a reed switch and a magnet.

Read more at http://www.stahlke.org/dan/phonemute/

If you are still not up for some soldering, you can always remove the battery from your mobile phone when not using it :)