December 16, 2008

How to read a load output

This is a very quick post explaining to Linux power users how to read a load output. After talking with a friend this afternoon, it has come to my attention that the great majority of Linux users do not know how.

Open a CLI and type in “uptime” or “w”, or why not “top”.

[[email protected] ~]# uptime
 14:19:49 up 27 min,  1 users, load average: 6.63, 5.25, 3.09

*** this uptime is made up for tutorial purposes ***

Now, to better understand, picture these values as a table:

| 1 min | 5 min | 15 min |
| 6.63  | 5.25  | 3.09   |

Each of those values is the average load over a given time window: 6.63 is the average load during the last minute, 5.25 during the last 5 minutes, and 3.09 during the last 15 minutes.
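If you want to pull those three values out of the output yourself, here is a minimal Python sketch. The function name `parse_load_averages` is my own invention for illustration, and it assumes the usual English “load average: a, b, c” ending with dot decimals, as in the made-up sample above.

```python
import re


def parse_load_averages(line):
    """Return the (1 min, 5 min, 15 min) load averages found in an uptime line."""
    # Keep only what follows "load average:" (some systems print "load averages:").
    match = re.search(r"load averages?:\s*(.*)$", line)
    if match is None:
        raise ValueError("no load average found in: " + line)
    # Collect the decimal numbers in that tail, in order.
    values = [float(v) for v in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
    if len(values) != 3:
        raise ValueError("expected 3 load values, got %d" % len(values))
    return tuple(values)


# The made-up uptime line from this post.
sample = "14:19:49 up 27 min,  1 users, load average: 6.63, 5.25, 3.09"
one_min, five_min, fifteen_min = parse_load_averages(sample)
print("1 min:", one_min, "| 5 min:", five_min, "| 15 min:", fifteen_min)
```

On a Unix system, Python's `os.getloadavg()` should hand you the same three numbers directly, without any parsing.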

Now that we know what each value refers to, we need to understand the two phases of a process.

When I say two phases, I am only referring here to the CPU “running” and “waiting” time. In other words, a process is either being executed OR waiting to be executed.

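So those “numbers” you see, those load averages, simply represent the average number of processes either being executed or waiting to be executed on the system. (Linux also counts processes in uninterruptible sleep, typically waiting on disk I/O, but we will ignore those here.)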
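On Linux, the file /proc/loadavg holds the same three averages, followed by a “runnable/total” pair that gives an instantaneous count of what is runnable right now. The sketch below parses such a line; the function name `parse_proc_loadavg` and the sample values are made up for illustration, and it assumes the usual five-field layout.

```python
def parse_proc_loadavg(text):
    """Split a /proc/loadavg line into its averages and its runnable/total counts."""
    fields = text.split()
    # First three fields: the 1, 5 and 15 minute load averages.
    one_min, five_min, fifteen_min = (float(v) for v in fields[:3])
    # Fourth field: currently runnable scheduling entities / total entities.
    runnable, total = (int(v) for v in fields[3].split("/"))
    return {
        "averages": (one_min, five_min, fifteen_min),
        "runnable_now": runnable,
        "total_entities": total,
    }


# Made-up sample line, matching the made-up uptime output of this post.
sample_line = "6.63 5.25 3.09 7/312 4242"
info = parse_proc_loadavg(sample_line)
print(info["runnable_now"], "runnable out of", info["total_entities"])
```

On a real Linux box you would pass it `open("/proc/loadavg").read()` instead of the sample string.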

Starting to be clear? :)

Now you may wonder: when do we say “there is too much load”, and when “there isn’t”?

Let’s imagine our system has only a single CPU. According to our previous load output, it would mean:

During the last minute, 6.63 processes on average were either running or waiting. Since a single CPU handles one process at a time, that gives us 1 process being executed and 5.63 waiting, which means the system was overloaded by 563%.

I stress “overloaded” because this is the figure you should be watching, rather than the total load average. On a production system, what is waiting in the queue matters more than what is currently being processed.

Think of it as a line of customers waiting at the grocery store. The cashier deals with only one customer at a time. When the line keeps growing and she can’t keep up with the flow of customers, she starts to panic and stress, gets tired, and in the end probably just closes her lane. A large number of customers will then bring a second cashier in to help.

Well, it doesn’t really work like this on a Unix system :)! However, it gives the idea.

Now say we had 4 CPUs. With a load of 6.63, that would mean 4 processes running and 2.63 waiting, so the system is about 66% overloaded (2.63 waiting / 4 CPUs), far less than on a single CPU.
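The arithmetic for both cases can be sketched in a few lines of Python. This is only an illustration: the helper name `load_breakdown` is hypothetical, and it assumes “overloaded” means the waiting queue divided by the number of CPUs, which is the convention that gives the single-CPU figure.

```python
def load_breakdown(load, cpus):
    """Split a load average into running and waiting parts for a CPU count."""
    # At most one process per CPU can actually be running.
    running = min(load, cpus)
    # Whatever exceeds the CPU count is sitting in the queue.
    waiting = max(load - cpus, 0.0)
    # Assumption: overload is the queue expressed as a percentage of capacity.
    overload_pct = waiting / cpus * 100
    return running, waiting, overload_pct


# Same 1-minute load as in the post, on 1 CPU and then on 4 CPUs.
for cpus in (1, 4):
    running, waiting, pct = load_breakdown(6.63, cpus)
    print(f"{cpus} CPU(s): {running:.2f} running, {waiting:.2f} waiting, "
          f"{pct:.0f}% overloaded")
```

To try it on a live Unix machine, you could feed it `os.getloadavg()[0]` and `os.cpu_count()` instead of the made-up numbers.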

I hope that was helpful and intuitive.