December 17, 2008

Strace - Reverse Engineering - System Calls

One recurring problem that I often see clogging the forums is “library missing”, or more often “installed libraries which a program doesn’t find”.

I decided to share a simple debugging technique which could save the day, or at least a few hours… Google is not always the right choice when you have strace at your fingertips.

1. System Calls

To understand strace, you first need to understand what a system call is. So what is a system call? A system call is simply a function provided by the kernel: it executes in kernel mode and acts as the interface between user code and the kernel.

Whenever you call open() in a C program, you are actually calling a C library function named “open”, which in turn switches from user mode to kernel mode and runs the kernel’s “open” system call.

So the concept of switching is central to understanding the difference between system calls and ordinary functions. The switch is usually triggered by a software interrupt, a call gate or a trap instruction (on 32-bit x86 Linux, for example, int 0x80 or sysenter).

2. Reverse Engineering with Strace

First, let’s break our system to set up the exercise. We start by listing the libraries “ls” depends on:

[sourcecode=“bash”]

[~]# ldd /bin/ls
        linux-gate.so.1 =>  (0x009c2000)
        librt.so.1 => /lib/librt.so.1 (0x003d0000)
        libacl.so.1 => /lib/libacl.so.1 (0x003b3000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x00b20000)
        libc.so.6 => /lib/libc.so.6 (0x00243000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0038e000)
        /lib/ld-linux.so.2 (0x00225000)
        libattr.so.1 => /lib/libattr.so.1 (0x003ac000)
        libdl.so.2 => /lib/libdl.so.2 (0x00388000)
        libsepol.so.1 => /lib/libsepol.so.1 (0x00110000)

[/sourcecode]

As we can see, these are the libraries “ls” must load before it can run and give us its usual pretty output.

Let’s move librt.so.1 out of /lib to our backup folder in /root/libBackup.

Then execute ls at the command line:

[sourcecode=“bash”]

[~]# ls
ls: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory

[/sourcecode]

Of course, the error message here is pretty obvious… ls needs “librt.so.1” to run, and as good systems administrators we all know where to look for shared libraries, right?

Anyway, for the sake of this exercise, let’s assume we have no clue that librt.so.1 is supposed to be in /lib…

(Now, for the fun of it, google the ls error above and be amazed at how many people have reported it on forums.)

So let’s use our strace magic here and see how we can fix the problem.

[sourcecode=“bash”]

[~]# strace /bin/ls
execve("/bin/ls", ["/bin/ls"], [/* 21 vars */]) = 0
brk(0)                                  = 0x88d2000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=22984, ...}) = 0
mmap2(NULL, 22984, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f88000
close(3)                                = 0
open("/lib/librt.so.1", O_RDONLY)       = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/sse2/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

[/sourcecode]

As we can see, the dynamic loader, working on behalf of “ls”, tries to open librt.so.1 at “open("/lib/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)”… Reading the rest of the output, we see that it then looks for the file in other library directories, here a hardware-specific subdirectory of /lib; directories listed in the LD_LIBRARY_PATH shell variable, if set, would be searched as well.

The solution would therefore be to “MOVE” our librt.so.1 file back to our /lib folder and put an end to our headaches.

(I emphasized MOVE because cp itself relies on this library… so copying would be broken as well at this point.)
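When the trace is long, the failed library lookups can be pulled out with a small filter. The following is only a minimal sketch and was not part of the original session: the function name failed_lib_opens is made up for illustration, and it assumes strace prints one call per line as in the output above. Newer systems tend to show openat rather than open, so the pattern accepts both.

```shell
#!/bin/sh
# failed_lib_opens: hypothetical helper, reads strace output on stdin and
# keeps only open()/openat() calls on shared objects that failed with ENOENT.
failed_lib_opens() {
    grep -E 'open(at)?\(.*\.so.*ENOENT'
}

# Typical use (strace writes its trace to stderr, hence 2>&1):
#   strace /bin/ls 2>&1 | failed_lib_opens
```

Each line that survives the filter names one path the loader tried and did not find, which is usually enough to tell where the library was expected.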


Now, let us spice things up: let’s erase the content of librt.so.1 (make sure to back up the original first).

[sourcecode=“bash”]

[~]# echo "" > /lib/librt.so.1

[/sourcecode]

Let’s try ls again… and…

[sourcecode=“bash”]

ls: error while loading shared libraries: /lib/librt.so.1: file too short

[/sourcecode]

Now things are getting interesting… you may wonder what in the world “file too short” could possibly mean.

The error gives you the path “/lib”, so we know the file is there, since the loader doesn’t complain that it can’t find it. So let’s strace it and see what is really happening.

[sourcecode=“bash”]

[~]# strace /bin/ls
execve("/bin/ls", ["/bin/ls"], [/* 21 vars */]) = 0
brk(0)                                  = 0x8147000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=22984, ...}) = 0
mmap2(NULL, 22984, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f13000
close(3)                                = 0
open("/lib/librt.so.1", O_RDONLY)       = 3
read(3, "\n", 512)                      = 1
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f12000
writev(2, [{"/bin/ls", 7}, {": ", 2}, {"error while loading shared libra"..., 36}, {": ", 2}, {"/lib/librt.so.1", 15}, {": ", 2}, {"file too short", 14}, {"", 0}, {"", 0}, {"\n", 1}], 10/bin/ls: error while loading shared libraries: /lib/librt.so.1: file too short
) = 79

[/sourcecode]

Now… thanks again to strace, we have our answer.

For the sake of understanding all this gibberish, let’s go through the most essential calls.

  1. execve executes the program pointed to by its filename argument, passing it the argv array (and the environment)
  2. brk(0) - brk called with the argument 0 just returns the current program break, the end of the process’s data segment; malloc and free (memory management) build on this
  3. mmap2 maps the file behind descriptor 3 (here /etc/ld.so.cache, 22984 bytes) read-only into memory at 0xb7f13000

Then comes the open call we saw earlier, which this time succeeds and is followed by a “read(3, "\n", 512) = 1”.

Now… let’s break here and go back to our error “file too short”….

read() - ssize_t read(int fd, void *buf, size_t count) - reads up to “count” bytes from the file descriptor fd into the buffer buf and returns the number of bytes actually read. In our result we see two things: the buffer contains only “\n”, and the number of bytes read is 1. The loader asked for up to 512 bytes, enough to hold the ELF header and more, but the file only had a single byte to give, hence “file too short”.

A shared library is also supposed to start with an ELF header… which in this case it doesn’t (of course it doesn’t, we erased the library’s content a few lines earlier :) )

For a valid library, the read call would therefore look like read(3, "\177ELF...", 512)

The buffer holding only “\n” and the 1 byte read therefore mean that our file contains nothing but a single newline, which is exactly what echo "" wrote into it :)

- Problem solved -
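The same check can be done without strace by looking at the first bytes of the file directly. This is a minimal sketch, assuming a POSIX od is available; is_elf is a name chosen here for illustration. Every ELF file starts with the four bytes 0x7f 'E' 'L' 'F', which is what strace renders as \177ELF.

```shell
#!/bin/sh
# is_elf: hypothetical helper, succeeds if the file starts with the
# ELF magic bytes 7f 45 4c 46 ("\177ELF").
is_elf() {
    # Dump the first 4 bytes as hex, without the offset column
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    [ "$magic" = "7f454c46" ]
}

# Example use on the library from this article:
#   is_elf /lib/librt.so.1 && echo "looks like ELF" || echo "not an ELF file"
```

On the file we overwrote with echo, the dump is just 0a (a newline), so the function fails, matching what the read call in the trace showed.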

3. Other cases where Strace can help

Feeling like some programs run slowly? Run them under strace and look at the paths tried for each library… this shows you the effect of your LD_LIBRARY_PATH and points to what could be optimized.
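One way to act on this is to count how many failed lookups each directory costs. The sketch below is illustrative only: enoent_by_dir is a hypothetical helper, and it assumes the one-call-per-line format strace prints by default. Directories that collect many failures and never a hit are the ones worth a second look in your search path.

```shell
#!/bin/sh
# enoent_by_dir: hypothetical helper, reads strace output on stdin and prints
# "<failed lookups> <directory>" for every directory in which an
# open()/openat() failed with ENOENT, busiest directory first.
enoent_by_dir() {
    awk -F'"' '
        /ENOENT/ && /open(at)?\(/ {
            dir = $2                                   # first quoted string is the path
            if (dir !~ "/") { dir = "." }              # bare name without a directory
            else { sub("/[^/]*$", "", dir) }           # strip the file name
            if (dir == "") { dir = "/" }               # file sat directly under /
            count[dir]++
        }
        END { for (d in count) print count[d], d }
    ' | sort -rn
}

# Typical use (strace writes its trace to stderr, hence 2>&1):
#   strace /bin/ls 2>&1 | enoent_by_dir
```

For a per-call summary of time and error counts instead, strace also has a -c option.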

Another common case is a hanging system call: strace shows the call but its return code never appears, which tells you where the program is stuck and points you to the other tools needed to debug further.
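For a process that already appears stuck, strace can attach to it instead of starting it. This is a small sketch with attach_trace as a made-up wrapper name: -p attaches to an existing PID, -tt prints a timestamp for each call and -T the time spent in it. If the last line printed shows a call with no return value, that is the call the process is blocked in.

```shell
#!/bin/sh
# attach_trace: hypothetical wrapper, attaches strace to a running process.
# Attaching usually requires being root or owning the target process.
attach_trace() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: attach_trace PID" >&2
            return 2
            ;;
    esac
    # -p PID: attach to the process, -tt: timestamps, -T: time spent per call
    strace -tt -T -p "$1"
}

# Typical use, with 1234 standing in for the PID of the stuck program:
#   attach_trace 1234
```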

I hope that was useful :)