mcelog – monitor hardware issues
Oct 17th
It is not unusual to get a call in the middle of the night, or to walk into the office, and find out that a server has failed because of a hardware problem.
What is mcelog?
From the project site description
mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system (if it is not in the default packages on your x86-64 distribution, please complain to your distributor). It can also decode machine check panic messages from console logs.
Before we go on, it is important to understand what an MCE is.
MCE stands for Machine Check Exception, a feature of AMD/Intel 64-bit CPUs that lets the processor detect and report hardware problems such as “communication errors between the motherboard and the CPU, CPU cache errors, memory ECC errors, etc.”
A common MCE log error looks like this:
CPU 0: Machine Check Exception: 0000000400000000<0>
fault: 0000
CPU: 0
EIP: 0010:[mcheck_fault+225/336]
EFLAGS: 00010246
eax: 00000115 ebx: 72000000 ecx: 00000405 edx: 72000000
esi: 00000004 edi: 00000003 ebp: 00000115 esp: c3187f94
ds: 0018 es: 0018 ss: 0018
A program like syslogd will write the message to the console or to the kernel log; if the machine crashes, then only to the console.
mcelog will therefore “decode” those machine event errors, which are saved in the special kernel buffer /dev/mcelog.
Working with mcelog
mcelog should be run as a cron job:
/usr/sbin/mcelog --generic --ignorenodev --filter >> /var/log/mcelog
Make sure to check the man page of mcelog for all the options.
I would recommend setting up a script to email you in case of alerts, or even, why not, “pipe your mcelog through a socket”.
That’s it. Hopefully, from now on, you can catch system and hardware errors before they end in a kernel panic.
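One way to do the email part, as a rough sketch only: the helper below returns whatever was appended to the log since the last check, and wraps it in a mail message. The function names, the addresses and the idea of tracking a byte offset are my own assumptions, not something mcelog provides. It does not parse the mcelog output, it only notices that something new was written.

```python
import os
from email.message import EmailMessage


def read_new_entries(log_path, offset=0):
    """Return (text appended since `offset`, new offset) for a log file."""
    if os.path.getsize(log_path) < offset:
        # The file shrank, so it was probably rotated: start from the top.
        offset = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


def build_alert(text, sender="mcelog@localhost", recipient="admin@example.com"):
    """Wrap new log text in an email message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = "mcelog: new machine check events"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(text)
    return msg
```

Called from cron right after mcelog, you would keep the returned offset somewhere (a small state file, for example) and hand the message to `smtplib.SMTP("localhost").send_message(msg)` only when the returned text is not empty.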
Cheers,
Cisco IOS Tips – cache running-configuration
Oct 14th
This is probably one of the most ignored and forgotten features of IOS, available since 12.2(25)S and 12.2(27)SBC.
I am posting it here because I keep coming across routers and switches where this feature is not active. Please note that you need enough memory to use it, that is to say, enough free memory to hold a copy of the interface configurations.
As you may guess, a router or switch with a monstrous configuration can take a while to display the running configuration when issuing
edge1#sh run
as it needs to fetch all the configuration from various places in memory.
Quoting Cisco
When invoked, NVGEN queries each system component and each instance of interface or other configuration objects. A running configuration file is constructed as NVGEN traverses the system performing these queries.
To speed things up, IOS ships with a feature called Configuration Generation Performance Enhancement, which caches the interface configurations and in turn speeds up NVGEN.
Activate caching with
edge1(config)#parser config cache interface
and voilà.
OSPF BDR DR election process
Oct 1st
This post assumes that you have a basic understanding of OSPF… if not, I suggest jumping over to http://en.wikipedia.org/wiki/OSPF for a quick first read. For the sake of this post, however, I will go over some basic reminders.
The “hello” packet
OSPF routers send a periodic packet, referred to as the hello packet, to the multicast address 224.0.0.5. It is composed of the OSPF header plus the fields routers need in order to become neighbors and then adjacent. By default, the hello packet is sent every 10 seconds on broadcast and point-to-point networks, and every 30 seconds on non-broadcast multi-access (NBMA) networks.
The HELLO PACKET (roughly 50 bytes) looks as follows
[ OSPF HEADER ] | Network Mask | Hello Interval | Options | Router Priority | Router Dead Interval | DR | BDR | Neighbor
The OSPF HEADER (24 bytes) looks as follows
Version number | Type | Packet Length | Router ID | Area ID | Checksum | Authentication Type | Authentication
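To make the two layouts above more concrete, here is a small sketch that unpacks a raw OSPFv2 hello packet with Python's struct module. The field sizes follow RFC 2328 (a 24-byte header including the authentication fields, then a 20-byte hello body followed by 4 bytes per neighbor). The function name and the dictionary keys are my own choice, and no checksum or authentication check is done.

```python
import socket
import struct

# OSPFv2 common header: version, type, length, router ID, area ID,
# checksum, authentication type, authentication data (24 bytes).
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")
# Hello body: mask, hello interval, options, priority, dead interval,
# DR, BDR (20 bytes), followed by one 4-byte router ID per neighbor.
HELLO_BODY = struct.Struct("!4sHBBI4s4s")


def parse_hello(packet):
    """Decode an OSPFv2 hello packet (starting at the OSPF header)."""
    (version, pkt_type, length, router_id, area_id,
     _checksum, _au_type, _auth) = OSPF_HEADER.unpack_from(packet, 0)
    if pkt_type != 1:
        raise ValueError("not an OSPF hello packet")
    (mask, hello_interval, _options, priority, dead_interval,
     dr, bdr) = HELLO_BODY.unpack_from(packet, OSPF_HEADER.size)
    first_neighbor = OSPF_HEADER.size + HELLO_BODY.size
    neighbors = [socket.inet_ntoa(packet[i:i + 4])
                 for i in range(first_neighbor, length, 4)]
    return {
        "version": version,
        "router_id": socket.inet_ntoa(router_id),
        "area_id": socket.inet_ntoa(area_id),
        "network_mask": socket.inet_ntoa(mask),
        "hello_interval": hello_interval,
        "dead_interval": dead_interval,
        "priority": priority,
        "dr": socket.inet_ntoa(dr),
        "bdr": socket.inet_ntoa(bdr),
        "neighbors": neighbors,
    }
```

With a single neighbor listed, the packet is 24 + 20 + 4 = 48 bytes long, which is where the "roughly 50 bytes" comes from.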
The neighboring and adjacent process
As explained earlier, OSPF uses the hello packet not only to discover another peer router, but also to neighbor with this router. For 2 OSPF routers to neighbor, they must belong to the same area (Area ID), use the same authentication scheme, and have the same hello and dead intervals. Once past this agreement phase, the routers become “neighbors”.
Only once they are neighbors will OSPF routers start exchanging their databases… this process is referred to as adjacency.
Now let’s imagine a single segment on which we have 10 OSPF routers… in theory, each router would peer with every other router and start exchanging its database with each of them. The number of adjacencies is then calculated as follows
(n (n – 1) ) / 2
So 10 routers will give us 45 adjacencies.
To minimize the amount of information shared, OSPF will elect a Designated Router (DR) and a Backup Designated Router (BDR). Once the DR and BDR are elected, every other OSPF router will exchange its database only with the DR and BDR, and no longer with the other routers.
Now keep in mind that, as we said earlier, OSPF routers use multicast IP 224.0.0.5 to send their hello packets but also to exchange their databases… in the presence of a DR/BDR, the other routers send their updates to multicast 224.0.0.6, and the DR in turn resends them to multicast 224.0.0.5.
So how does the DR and BDR election take place?
It is quite simple; if you are used to the switch root bridge election, this will not look much different. The DR and BDR election takes place through the hello packet, by comparing the Router Priority (which, if you recall, is located in the hello packet as shown earlier).
The router with the highest priority is elected Designated Router (DR), and the router with the second highest priority becomes the BDR. Keep in mind that, by default, all router interfaces have a priority of 1… if the priorities of all routers on a particular segment match, the Router ID (OSPF header) is the next value compared in order to elect the DR/BDR. In the same spirit, the OSPF router with the highest Router ID will be elected DR, and the next one BDR.
Keep in mind that once the DR/BDR are elected, adding a new OSPF router with the highest priority of all will not change the DR/BDR… to restart the election process, you will have to clear the OSPF process (clear ip ospf process on IOS).
Once the DR and BDR are elected, the BDR only listens to the exchange between the peers and the DR, and promotes itself to DR if the current DR were to fail.
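The comparison described above can be sketched in a few lines. This is a deliberately simplified model, not the RFC 2328 state machine: it ranks every router at once by priority and then by Router ID, skips routers with priority 0, and ignores the fact that an already elected DR/BDR is not preempted. The function name and the input format are assumptions for the example.

```python
from ipaddress import IPv4Address


def elect_dr_bdr(routers):
    """Pick (DR, BDR) from a list of (router_id, priority) tuples.

    Simplified: highest priority wins, highest Router ID breaks ties,
    priority 0 never takes part. Returns None where nobody qualifies.
    """
    eligible = [(priority, IPv4Address(router_id))
                for router_id, priority in routers
                if priority > 0]
    ranked = sorted(eligible, reverse=True)
    dr = str(ranked[0][1]) if ranked else None
    bdr = str(ranked[1][1]) if len(ranked) > 1 else None
    return dr, bdr
```

For example, with three routers all left at the default priority of 1, `elect_dr_bdr([("10.0.0.1", 1), ("10.0.0.9", 1), ("10.0.0.5", 1)])` returns `("10.0.0.9", "10.0.0.5")`, the two highest Router IDs.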
As a last thing to remember: without a DR/BDR, we calculated 45 adjacencies for 10 routers on a multi-access segment. Now how many adjacencies do we have with a DR and a BDR? Simple! The DR is adjacent to the n - 1 other routers, and the BDR to the n - 2 remaining ones.
(n - 1) + (n - 2) = 2n - 3 -> 2×10 - 3 = 17 adjacencies … so from 45 adjacencies, we dropped down to 17 with a DR and a BDR.
If you were to only elect a DR without a BDR, you would naturally obtain 9 adjacencies.
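If you prefer to check this kind of count by brute force rather than trusting a formula, the sketch below enumerates every pair of routers on the segment and keeps the pairs that form an adjacency: all of them in a full mesh, or only those involving a DR or BDR otherwise. The function name and its `designated` parameter are made up for the example.

```python
from itertools import combinations


def count_adjacencies(n, designated=0):
    """Count adjacencies among n routers on one segment.

    designated=0: full mesh, every pair is adjacent.
    designated=1: only a DR; designated=2: a DR and a BDR.
    A pair is adjacent when at least one of the two is DR or BDR.
    """
    special = set(range(designated))  # routers 0 and 1 play DR and BDR
    pairs = combinations(range(n), 2)
    if designated == 0:
        return sum(1 for _ in pairs)
    return sum(1 for a, b in pairs if a in special or b in special)
```

Calling it with `n=10` and `designated` set to 0, 1 and 2 gives the three cases discussed above.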
To keep in mind
- If you do not want a router to participate in the DR/BDR election, set its priority to 0; it will then be shown as DROTHER.
- You can override the RID of the OSPF router by creating a loopback interface with a different IP than the one used on the router’s physical interface; loopback addresses take precedence when the RID is chosen.
- The DR and BDR election only takes place on broadcast and non-broadcast multi-access networks… That is to say, routers on a point-to-point serial WAN link would not hold a DR/BDR election.
Ethernet flow control and IGMP snooping
Sep 23rd
It is important to note that TCP flow control and Ethernet flow control are two completely different mechanisms. They strive for the same goal but, when in use, are completely unaware of each other.
As a matter of fact, Ethernet flow control can seriously cripple your network if not planned and used carefully
…
So What is TCP flow control?
Flow control is a mechanism implemented in the TCP stack which enables a receiving endpoint to notify a sender that it can no longer accept data in its buffer. The free buffer space is what is referred to as the TCP window size, and it is transmitted in each ACK. The receiver can therefore let the sender know how many bytes it is able to accept at once.
[ let's assume the receiver machine can only hold 4K in its buffer ]
(sender) <-------- ACK 1022 WIN 4096 <-------- (receiver)
(sender) --------> 4K | SEQ 1022 --------> (receiver)
[ assuming that the buffer of the receiver is now full with those 4K ]
(sender) <-------- ACK 5118 WIN 0 <-------- (receiver)
[ the sender is now "blocked" from sending more data until the receiver advertises a non-zero window again ]
(sender) <-------- ACK 5118 WIN 4096 <-------- (receiver)
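The exchange above can be played back with a toy model. This is only a sketch of the window bookkeeping: there are no real sockets, sequence numbers or timers, and the class and function names are invented for the illustration. The receiver advertises its free buffer space, and the sender never sends more than that.

```python
class Receiver:
    """Toy receiver with a fixed-size buffer."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0

    @property
    def window(self):
        # Advertised window = free space left in the buffer.
        return self.buffer_size - self.buffered

    def receive(self, nbytes):
        if nbytes > self.window:
            raise ValueError("sender exceeded the advertised window")
        self.buffered += nbytes

    def consume(self, nbytes):
        # The application reads data out of the buffer, freeing space.
        self.buffered -= min(nbytes, self.buffered)


def send(receiver, total_bytes, drain_per_round):
    """Send total_bytes; return (bytes sent, window left) per round."""
    if drain_per_round <= 0:
        raise ValueError("the receiver must drain its buffer to make progress")
    rounds = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, receiver.window)
        receiver.receive(chunk)
        remaining -= chunk
        rounds.append((chunk, receiver.window))
        receiver.consume(drain_per_round)
    return rounds
```

With a 4096-byte buffer that drains 2048 bytes between rounds, sending 8192 bytes takes three rounds: the first fills the buffer completely (window 0), and the next two can only send what was freed in the meantime.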
Ok so now, what is Ethernet flow control?
From layer 4 (TCP flow control), we jump now to layer 2 (Ethernet flow control).
Ethernet flow control is different from TCP flow control, as it makes use of a MAC control frame, the “pause frame”, to tell the device at the other end of the link to stop sending frames. It is important to keep in mind that the sender of the pause frame sets a 2-byte pause time, expressed in quanta of 512 bit times, which defines how long the endpoint must wait before it resumes transmitting frames. Finally, keep in mind that pause frames are not forwarded. That is to say, a MAC control frame stops at the directly attached device; it is not forwarded any further, not even through a trunk port.
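For the curious, here is what such a pause frame looks like on the wire, as a sketch built with Python's struct module. The destination address 01:80:C2:00:00:01, the MAC control EtherType 0x8808 and the PAUSE opcode 0x0001 come from IEEE 802.3; the function names are mine, and the frame is built without the trailing FCS, which the NIC normally adds.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC control address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac, quanta):
    """Build a PAUSE frame (without FCS) asking the peer to wait `quanta`."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time is a 16-bit field")
    src = bytes.fromhex(src_mac.replace(":", ""))
    if len(src) != 6:
        raise ValueError("source MAC must be 6 bytes")
    frame = PAUSE_DST + src + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    # Pad to the 60-byte minimum frame size (64 bytes once the FCS is added).
    return frame.ljust(60, b"\x00")


def pause_seconds(quanta, link_bps):
    """Convert pause quanta (512 bit times each) to seconds on a link."""
    return quanta * 512 / link_bps
```

On a 1Gbps link, the maximum pause time of 65535 quanta works out to about 33.5 milliseconds, so a switch that wants to hold a sender back for longer has to keep sending pause frames.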
What is the problem when using Ethernet flow control?
If you have read this far, you can start guessing what may occur if you have Ethernet flow control enabled on your switch. When a port’s buffers fill up because the receiver cannot keep up, the switch does not drop the packets but generates its own pause frame and sends it to the sending host. Now keep in mind that a pause frame halts all transmission at the data link layer… that is to say, if PCX was fetching a file from PCB through the same port in the meantime, that transfer would be “paused” as well. Because pause frames only work at layer 2 (data link), all communication through the targeted switch port ceases completely for the duration of the pause.
But what happens meanwhile with the TCP flow control?
As said earlier, TCP flow control isn’t aware of Ethernet flow control. TCP throttles the amount of data it sends based on the feedback it gets, dropped packets included. Because the switch no longer drops packets thanks to Ethernet flow control, TCP does not notice that it is sending more data than the endpoint can receive, and keeps increasing the amount of data it sends. The result is an overloaded receiver and a switch which keeps generating pause frames until TCP finally detects congestion and readjusts its sending window.
And what happens when you have IGMP snooping off?
Imagine a multicast scenario where you have a server and a workstation on two 1Gb ports and another workstation on a 100Mb port. If the server starts sending multicast packets at 1Gbps (in theory), Ethernet flow control will immediately throttle the speed at which the server sends down to the slowest port speed involved. Remember we are talking multicast here: because the packets also have to be delivered to the 100Mb port, Ethernet flow control on the switch forces the server to send at only 100Mbps. That may be acceptable when every receiver asked for the stream, but remember that without IGMP snooping the switch floods the multicast packets to all switch ports, including endpoints that never joined the multicast group, and Ethernet flow control then triggers bad and slow performance for everyone.
Conclusion
IGMP snooping has always been a problem in VRRP setups (such as Checkpoint HA), causing fluctuations of the interface state (referred to as flapping interfaces).
While it is possible to disable IGMP snooping per VLAN, I would recommend disabling it per multicast MAC address (i.e. 01:00:5e:xx:yy:zz).
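To find out which multicast MAC address a given group ends up on, you can apply the standard IPv4 mapping: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. The sketch below does exactly that; the function name is my own.

```python
from ipaddress import IPv4Address


def multicast_mac(group):
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF           # only 23 bits survive the mapping
    mac = 0x01005E000000 | low23
    return ":".join(f"{byte:02x}" for byte in mac.to_bytes(6, "big"))
```

VRRP advertisements, for instance, go to 224.0.0.18, which maps to 01:00:5e:00:00:12. Because only 23 bits are kept, several groups share the same MAC address (224.0.0.5 and 224.128.0.5 both map to 01:00:5e:00:00:05), something to keep in mind when filtering per MAC.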
Filter networks with BGP
Sep 20th
There are 3 easy ways to filter/restrict which networks are exchanged through BGP with a remote/adjacent AS (Autonomous System).
Those 3 simple ways are: prefix-list | Extended Access-list + Route-map | Extended Access-list + Distribute-list
To note: before we go on, I need to point out that creating an extended access list for use with BGP (route-map, distribute-list) is very similar to creating a prefix-list… In other words, the access list no longer matches source and destination addresses, but the address prefix (in the source field) and the netmask (in the destination field).
Let’s assume, in all 3 examples, that we do not want to add the network 192.168.4.0/24 to our routing table when it is advertised by our single eBGP peer, AS 64515.
* in this example, we are of course using a private ASN
1. Prefix-list
First we jump into global configuration mode and create a prefix-list filter named “DENY-PREFIX”
border1#conf t
border1(config)#ip prefix-list DENY-PREFIX seq 10 deny 192.168.4.0/24
border1(config)#ip prefix-list DENY-PREFIX seq 20 permit 0.0.0.0/0 le 32
border1(config)#router bgp 64514
border1(config-router)#neighbor 192.168.10.1 remote-as 64515
border1(config-router)#neighbor 192.168.10.1 prefix-list DENY-PREFIX in
border1(config-router)#do wr
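If you want to convince yourself of what this prefix-list lets through, here is a small model of how entries are evaluated: in sequence order, first match wins, with an implicit deny at the end. An entry without ge/le only matches the exact prefix length. This is a sketch of the matching rules as I understand them, not IOS code, and the data structure is invented for the example.

```python
from ipaddress import ip_network

# The DENY-PREFIX list from the example above, as plain data.
DENY_PREFIX = [
    {"seq": 10, "action": "deny", "prefix": "192.168.4.0/24"},
    {"seq": 20, "action": "permit", "prefix": "0.0.0.0/0", "le": 32},
]


def entry_matches(entry, route):
    """True if `route` falls in the entry's prefix and length range."""
    prefix = ip_network(entry["prefix"])
    route = ip_network(route)
    ge, le = entry.get("ge"), entry.get("le")
    if ge is None and le is None:
        low = high = prefix.prefixlen      # exact length only
    else:
        low = ge if ge is not None else prefix.prefixlen
        high = le if le is not None else 32
    return route.subnet_of(prefix) and low <= route.prefixlen <= high


def evaluate(prefix_list, route):
    """Return 'permit' or 'deny' for a route, first matching entry wins."""
    for entry in sorted(prefix_list, key=lambda e: e["seq"]):
        if entry_matches(entry, route):
            return entry["action"]
    return "deny"  # implicit deny when nothing matches
```

Note that `evaluate(DENY_PREFIX, "192.168.4.128/25")` returns "permit": sequence 10 only matches the /24 itself, so a more specific announcement of the same range would still get through unless you add `le 32` to the deny entry.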
2. Extended access-list / Route-map
First, we create an extended access list in global config mode
border1#conf t
border1(config)#access-list 101 deny ip host 192.168.4.0 host 255.255.255.0
border1(config)#access-list 101 permit ip any any
We then proceed to create a route map (still in global config mode)
border1(config)#route-map NET-FILTER permit 20
border1(config-route-map)#match ip address 101
We add an explicit deny clause for everything the first clause did not match, then jump back to global config mode
border1(config)#route-map NET-FILTER deny 30
border1(config-route-map)#exit
border1(config)#router bgp 64514
border1(config-router)#neighbor 192.168.10.1 remote-as 64515
border1(config-router)#neighbor 192.168.10.1 route-map NET-FILTER in
border1(config-router)#do wr
3. Distribute-list
As with the route-map, we will be using an extended access list to accomplish the filtering.
We will be using the same access list we defined earlier for the route-map, which is access-list 101
border1(config)#router bgp 64514
border1(config-router)#neighbor 192.168.10.1 remote-as 64515
border1(config-router)#neighbor 192.168.10.1 distribute-list 101 in
border1(config-router)#do wr
Last but not least
Remember that for inbound updates, the order of preference is
- route-map
- filter-list
- prefix-list/distribute-list
and for outbound updates
- prefix-list/distribute-list
- filter-list
- route-map