Archive for the ‘Networking’ Category

Optimized Edge Routing – OER

May 31st, 2010 Ali Abbas

While there are many ways to optimize routing by combining several technologies, such as using BGP with MPLS, it is often easier to simply fall back on OER to influence traffic routing based on NetFlow events/IP SLA, packet loss, response time, load-balancing policies and line jitter, and thus dynamically adapt route metrics or inject and remove routes using IGP/BGP.

OER is therefore able to automatically detect degradation/congestion on specific links and influence routing over alternate paths, based upon its configuration.

Having said that, every deployed topology must have one OER “Master Controller” (MC), which is either itself a border router or connects to the border routers. Each interface (inbound/outbound) of the border routers should be registered and under OER’s control. The MC then uploads routing policies to the Border Routers, which themselves connect to the WAN links.

Now before we go on, it is important to keep in mind that OER only monitors outbound interfaces and will only control outward traffic. It will not affect Inter-domain routing nor asymmetrical routing.

There are 5 phases to remember when dealing with OER

1. Profile Phase

OER runs in “learning mode”, during which it monitors the traffic flowing through the outbound interfaces under its control and identifies traffic classes with a performance problem, such as high delay.

2. Measure Phase

Performance metrics for the profiled traffic classes are then reported from the Border Routers to the Master Controller. These metrics can be generated either while OER evaluates the real traffic flowing through the data path, or by actively simulating traffic to evaluate performance. These 2 methods are referred to as Passive and Active Monitoring respectively.

3. Apply Policy Phase

The name of this phase is actually quite misleading, as OER is simply evaluating traffic performance against a set of thresholds manually configured for each traffic class: for example, what percentage of packet loss is acceptable for a specific traffic class or on a specific link. The MC then determines which flows are OOP (Out Of Policy). There are therefore 2 types of policy: Traffic Class Policies (Application, Prefixes) and Link Policies (Inbound/Outbound link).

4. Control Phase

Once the flow has been identified to be OOP, the MC will dynamically adjust the data path by injecting routes or modifying the routes using IGP or BGP and redirect the traffic from one exit to the other.

5. Verify Phase

Once the flow has been redirected, it is evaluated again as in the “Apply Policy Phase”. If it is still determined to be OOP, the changes are reverted and the “Measure Phase” is triggered so that the flow is re-evaluated.
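As a rough illustration of the Apply Policy and Control phases, here is a minimal sketch in Python. It is not OER configuration or Cisco code: the `Policy` and `Measurement` classes, the threshold values and the exit names `isp-a`/`isp-b` are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical thresholds for one traffic class (not real OER syntax)."""
    max_loss_pct: float
    max_delay_ms: float


@dataclass
class Measurement:
    """Metrics a border router might report for one exit link."""
    exit_link: str
    loss_pct: float
    delay_ms: float


def is_out_of_policy(m: Measurement, p: Policy) -> bool:
    """Apply Policy phase: compare measured metrics against the thresholds."""
    return m.loss_pct > p.max_loss_pct or m.delay_ms > p.max_delay_ms


def choose_exit(current: str, measurements: list[Measurement], p: Policy) -> str:
    """Control phase: keep the current exit if in policy, else move to the best in-policy exit."""
    by_link = {m.exit_link: m for m in measurements}
    if not is_out_of_policy(by_link[current], p):
        return current
    candidates = [m for m in measurements if not is_out_of_policy(m, p)]
    if not candidates:
        return current  # no exit is in policy, so leave routing unchanged
    return min(candidates, key=lambda m: (m.loss_pct, m.delay_ms)).exit_link


policy = Policy(max_loss_pct=1.0, max_delay_ms=150.0)
reports = [
    Measurement("isp-a", loss_pct=4.2, delay_ms=90.0),
    Measurement("isp-b", loss_pct=0.3, delay_ms=110.0),
]
print(choose_exit("isp-a", reports, policy))  # isp-b
```

The Verify phase would amount to calling `is_out_of_policy` again on fresh measurements for the newly chosen exit.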

Categories: BGP, Networking, OER

Linux Kernel Route Cache

May 15th, 2010 Ali Abbas

To understand the importance of the routing cache, keep in mind the 3 main routing tables the kernel uses for routing decisions: the Route Cache (what we will be discussing), the Route Policy Database (RPDB) and the Route Table. This is also the order in which the network subsystem queries them to make a forwarding decision. To display the Route Cache, simply issue the “ip route show cache” command.

[ kernel network subsystem ] ---> Route Cache || [ If no match ] ---> RPDB || [ If no match ] ---> Route Table
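The lookup order can be sketched as a toy model in Python. This is only an illustration of the cache / RPDB / route table chain, not how the kernel implements it; the rules, table names and addresses are invented for the example.

```python
from ipaddress import ip_address, ip_network

# Toy stand-ins for the three structures (all contents are hypothetical).
route_cache = {}  # (source, destination) -> next hop, exact match
policy_rules = [  # simplified RPDB: (source network, table to consult)
    (ip_network("10.1.0.0/16"), "vpn"),
    (ip_network("0.0.0.0/0"), "main"),
]
route_tables = {
    "vpn": [(ip_network("0.0.0.0/0"), "10.1.0.1")],
    "main": [
        (ip_network("192.168.0.0/24"), "direct"),
        (ip_network("0.0.0.0/0"), "192.168.0.254"),
    ],
}


def lookup(src: str, dst: str):
    """Return (next_hop, where_it_was_found) for a packet."""
    key = (src, dst)
    if key in route_cache:                      # 1. route cache
        return route_cache[key], "cache"
    for src_net, table in policy_rules:         # 2. RPDB, first matching rule wins
        if ip_address(src) not in src_net:
            continue
        matches = [(net, hop) for net, hop in route_tables[table]
                   if ip_address(dst) in net]
        if matches:                             # 3. route table, longest prefix match
            net, hop = max(matches, key=lambda r: r[0].prefixlen)
            route_cache[key] = hop              # remember the decision for next time
            return hop, f"table:{table}"
    return None, "unreachable"


print(lookup("192.168.0.5", "8.8.8.8"))  # ('192.168.0.254', 'table:main')
print(lookup("192.168.0.5", "8.8.8.8"))  # ('192.168.0.254', 'cache')
```

The second call is answered from the cache, which is the whole point of having it in front of the other two lookups.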

When the routing subsystem of the kernel is initialized, ip_rt_init is called.

void __init ip_init(void)
{
	dev_add_pack(&ip_packet_type);
	ip_rt_init();
	inet_initpeers();
#ifdef CONFIG_IP_MULTICAST
	proc_net_create("igmp", 0, ip_mc_procinfo);
#endif
}

(source: linux/net/ipv4/ip_output.c)

Part of the job of ip_rt_init is to allocate the memory used to cache the network routes, but also to initialize global variables such as rt_hash_table, rt_hash_rnd and many others.

You can view the details of the ip_rt_init function in /net/ipv4/route.c

rt_hash_table is the hash table of the route cache, with rt_hash_mask holding its size mask (the number of buckets minus one). An easy way to check the size of the routing cache table is to look in dmesg, grepping for “IP route”.

aabbas@mig:~$ dmesg | grep "IP route"
[    1.814492] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
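The numbers on that line can be cross-checked with a little arithmetic. The sketch below parses the dmesg line shown above; the 4 KiB page size is an assumption about the machine, and the mask value assumes the usual "size minus one" convention for power-of-two hash tables.

```python
import re

line = "[    1.814492] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)"

m = re.search(r"entries: (\d+) \(order: (\d+), (\d+) bytes\)", line)
entries, order, size_bytes = (int(g) for g in m.groups())

PAGE_SIZE = 4096  # assumption: 4 KiB pages

print(size_bytes // entries)     # 8  -> bytes per bucket
print((1 << order) * PAGE_SIZE)  # 2097152 -> 2^order pages, matches the byte count
print(hex(entries - 1))          # 0x3ffff -> mask for a table of 2^18 buckets
```

Eight bytes per bucket is consistent with each bucket holding a single pointer on a 64-bit machine.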

The maximum size of the route cache is configurable through /proc/sys/net/ipv4/route/max_size. That being said, the default size is determined at boot in conjunction with the amount of RAM available. Furthermore, to prevent the number of cached entries from exceeding max_size, the kernel runs a garbage collector once the cache grows past gc_thresh.

When an interface goes down/up, or any other change takes place that would affect the routing cache, the kernel executes rt_cache_flush, which in turn executes rt_run_flush. Many events, such as the removal of an IP address or of an interface, trigger a route cache flush; keep in mind, however, that the cache is also flushed periodically, based on the value of rt_secret_timer. The interval is configurable in /proc/sys/net/ipv4/route/secret_interval.

To trigger a route cache flush, issue

echo -1 > /proc/sys/net/ipv4/route/flush

or

ip route flush cache

ip_rt_min_delay and ip_rt_max_delay define the time window within which the flush occurs; setting ip_rt_min_delay to 0 ensures that the cache is flushed immediately when rt_run_flush is triggered.

I have only tackled a tiny aspect of the Route Cache, and hopefully this gives some basic ideas on which to build a grounded understanding of how the overall subsystem operates within the network stack. Having said that, the routing cache is only one subset of all its functions.

Cheers,

Ali

Categories: Networking, Unix / Linux

OSPF LSA Types

May 6th, 2010 Ali Abbas

Following my blog post on OSPF BDR – DR election, I have received various questions regarding OSPF, so I decided to write a short post describing the types of LSAs generated, whether in a stub area or elsewhere.

Before I start, it is important to keep in mind that LSAs are carried in OSPF packets of type 4. These type 4 packets are referred to as LSUs (Link-State Updates) and carry one or more of the router’s LSAs.

Now, there are 11 types of LSAs; however, LSA types 6 and 8 through 11 will not be covered in this post but in another one.

LSA-1 – Router Link LSA

By default, all routers in an OSPF area send LSA updates of type 1. These LSAs are never forwarded out of the area; each router sends them to inform the other members of the area of its link states, adjacencies (stub interfaces etc.) and costs.

LSA-2 – Network Link LSA

Back to my previous post: this LSA is only generated by the DR (Designated Router). It informs the routers within its area which routers are part of the same segment.

LSA-3 – Network Summary Link

This LSA is only generated by an ABR and is exchanged between areas. It allows routing and communication between the areas, as the ABR aggregates the routes it learns.

LSA-4 – ASBR Summary Link

Generated by an ABR and sent to the routers in its area, it describes how to reach the ASBR, so that those routers can resolve the next hop for the external routes advertised by the ASBR.

LSA-5 – External Link LSA

This LSA is generated by an ASBR. It contains the routes redistributed into OSPF from outside the OSPF domain. When looking at the routing table, you will see these routes flagged as E1 or E2.

LSA-7 – NSSA External LSA

This LSA is also generated by an ASBR, but one inside an NSSA; it describes the routes that have been redistributed into the NSSA. Furthermore, keep in mind that as the LSA leaves the NSSA towards the backbone, the ABR translates it into an LSA type 5 (LSA-5). When looking at the routing table, you will see these routes flagged as N1 or N2.

As a short summary

  • Standard areas – LSA type { 1 – 2 – 3 – 4 – 5 }
  • Stub areas – LSA type { 1 – 2 – 3 }
  • Totally stubby areas – LSA type { 1 – 2 – 3 }, where the only type 3 is the default route injected by the ABR
  • Not-so-stubby areas – LSA type { 1 – 2 – 3 – 7 }
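
The summary can also be written down as a small lookup table. This Python sketch follows standard OSPF behaviour as I understand it: stub-type areas carry no type 4 or type 5 LSAs, and type 7 exists only inside an NSSA. The area names used as keys are just labels chosen for the example.

```python
# LSA types that may be present in each kind of area.
AREA_LSA_TYPES = {
    "standard": {1, 2, 3, 4, 5},
    "stub": {1, 2, 3},
    "totally-stubby": {1, 2, 3},  # the only type 3 is the ABR's default route
    "nssa": {1, 2, 3, 7},
}


def accepts(area_type: str, lsa_type: int) -> bool:
    """True if an LSA of this type can appear in the given kind of area."""
    return lsa_type in AREA_LSA_TYPES[area_type]


print(accepts("stub", 5))  # False
print(accepts("nssa", 7))  # True
print(sorted(a for a, types in AREA_LSA_TYPES.items() if 5 in types))  # ['standard']
```
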
Categories: Networking, OSPF

Ethernet Flow Control DOS in STP environment

April 25th, 2010 Ali Abbas

A while back, I wrote a post on Ethernet Flow Control and IGMP snooping, and on how using TCP flow control on top of Ethernet Flow Control could easily cripple your network. If you are not familiar with Ethernet Flow Control, I highly suggest you go over that post in order to understand what I will be talking about here.

An understanding of the Spanning Tree Protocol is also required; I won’t go into it in depth in this post.

Small Recap

Ethernet Flow Control makes use of PAUSE frames to notify an endpoint host to stop sending frames for a given amount of time. Depending on the bandwidth of the link, PAUSE frames are sent at a specific interval:

100Mbps – each 300ms
1Gbps – each 30ms
10Gbps – each 3ms

That is to say, sending PAUSE frames more often than the appropriate interval will considerably slow down the network and generate unexpected side effects.
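The intervals listed above are close to the longest pause a single PAUSE frame can request. The pause time is a 16-bit field counted in quanta of 512 bit times, so the maximum duration scales with link speed, as this short Python calculation shows (the function name is my own):

```python
QUANTUM_BITS = 512   # one pause quantum = 512 bit times
MAX_QUANTA = 0xFFFF  # largest value of the 16-bit pause time field


def max_pause_ms(link_bps: float) -> float:
    """Longest pause one PAUSE frame can request, in milliseconds."""
    return MAX_QUANTA * QUANTUM_BITS / link_bps * 1000


for name, bps in [("100Mbps", 100e6), ("1Gbps", 1e9), ("10Gbps", 10e9)]:
    print(f"{name}: {max_pause_ms(bps):.2f} ms")
# 100Mbps: 335.54 ms
# 1Gbps: 33.55 ms
# 10Gbps: 3.36 ms
```

A sender that wants to keep a link paused therefore has to refresh the PAUSE roughly every 300ms, 30ms or 3ms, depending on the speed.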

Using PAUSE frames to force a new STP convergence – DOS

- Target Designated ports

Because PAUSE frames simply tell the receiver to stop sending frames, if we were to flood all the Designated ports of a root switch, we would immediately cut all transmission of root BPDUs and user frames. As a consequence, all root ports on the other switches would enter a new STP convergence, since the root bridge has been completely cut off.

- Target Blocking ports

Remember, in the Blocking state, the port still receives BPDUs without forwarding frames. Now let’s say we decide to flood all the segments connecting to the blocking ports. What do you think will happen? Well, the blocked port no longer receives BPDUs and, after 20 seconds, starts transitioning to the listening state. A total of 50 seconds later, a new convergence has taken place, resulting in a redundant link becoming active and thus creating a switching loop/broadcast storm, which will degrade the network and bring it to its knees.
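The 20 and 50 second figures come from the default 802.1D timers: a max age of 20 seconds, then a forward delay of 15 seconds spent in each of the listening and learning states. A small Python timeline, assuming those defaults have not been tuned on the switches:

```python
MAX_AGE = 20        # seconds before the last received BPDU information expires
FORWARD_DELAY = 15  # seconds spent in listening, and again in learning

timeline = [
    ("blocking (BPDUs lost)", MAX_AGE),
    ("listening", FORWARD_DELAY),
    ("learning", FORWARD_DELAY),
]

elapsed = 0
for state, duration in timeline:
    elapsed += duration
    print(f"{state:<22} ends at t={elapsed}s")
print(f"forwarding from t={elapsed}s")
# blocking (BPDUs lost)  ends at t=20s
# listening              ends at t=35s
# learning               ends at t=50s
# forwarding from t=50s
```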

How to prevent an STP DOS

1. Disable Ethernet Flow Control on a port if not needed

2. Set a threshold for PAUSE frames sent and received

3. Monitor for STP topology state changes

4. Monitor the traffic for high frequencies of TCN and TC BPDUs

Cheers,

Ali

Categories: Networking

BGP Next-Hop-Self Attribute

April 15th, 2010 Ali Abbas

The BGP next-hop-self option is useful when routes received from an eBGP speaker are advertised to an iBGP speaker within the same Autonomous System.

By default, when a route is advertised to an eBGP peer outside of the AS, the router makes sure that the next hop attribute reflects its own IP address. Now imagine that route is then advertised to the iBGP speakers inside the receiving AS: the next hop is left unchanged, so all iBGP routers will have as next hop the eBGP router of the external Autonomous System.

To prevent this, we can make sure that a route advertised to an iBGP router carries the IP address of the router sourcing that route into the AS, and not the IP address of the eBGP neighbor which originally advertised it.

It is important to keep in mind that BGP always makes sure that the next hop is reachable before using or advertising a route. If the next hop is not reachable, the route will still be held in the BGP table, but it will not be used. In the case we just discussed, the eBGP neighbor from the external AS would be reachable from our eBGP router (edge-t1), but not necessarily from the iBGP speakers within our AS (depending on your configuration).

To avoid potential routing black holes, one must then make use of the “next-hop-self” option to force the router to set the next hop of the routes it advertises to its iBGP neighbors to its own IP address.

- An example configuration on IOS would look as follows -

edge-t1# conf t
edge-t1(config)# router bgp 65100
edge-t1(config-router)# neighbor 10.210.0.10 remote-as 65100
edge-t1(config-router)# neighbor 10.210.0.10 next-hop-self
edge-t1(config-router)# exit
edge-t1(config)# do wr

* Furthermore, you may also use “set ip next-hop peer-address” under a route-map and apply that route-map to the routes advertised to the BGP neighbor.
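
To see why the next hop matters, here is a toy model in Python of what an iBGP peer receives with and without next-hop-self. It is a sketch of the reasoning only, not of any BGP implementation: the prefix, the eBGP neighbor address 192.0.2.1, the address 10.210.0.1 for edge-t1 and the IGP network are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str


def advertise_ibgp(route: Route, own_ip: str, next_hop_self: bool) -> Route:
    """Route as sent to an iBGP peer: the next hop is rewritten only with next-hop-self."""
    return replace(route, next_hop=own_ip) if next_hop_self else route


def usable(route: Route, reachable_nets: list[str]) -> bool:
    """The receiving router can use the route only if it can reach the next hop."""
    hop = ip_address(route.next_hop)
    return any(hop in ip_network(net) for net in reachable_nets)


learned = Route("203.0.113.0/24", next_hop="192.0.2.1")  # from the eBGP neighbor
ibgp_peer_igp = ["10.210.0.0/24"]  # networks the iBGP peer knows through its IGP

print(usable(advertise_ibgp(learned, "10.210.0.1", False), ibgp_peer_igp))  # False
print(usable(advertise_ibgp(learned, "10.210.0.1", True), ibgp_peer_igp))   # True
```

Without the rewrite, the iBGP peer holds the route but cannot resolve its next hop; with it, the next hop is an address inside the AS that the IGP already knows.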

Cheers,

Ali

Categories: BGP, Networking