How main() Gets Called – Shell Prompt to main()

A brief description of what happens on a Linux system from the time you type a program name and press “Enter” at the shell to the time control reaches your main() function. It may help in understanding some of the internals and how main() gets called.

Every executable file has a header (ELF, a.out, EXE, ...) that tells the loader the executable file format, where to start execution, and so on. On GNU/Linux, the default executable file format is ELF (Executable and Linking Format). I am going to use the following test code and its binary, plus a few tools (strace, ltrace, ldd, readelf, objdump, nm), for this study:

$ echo "int main() { return 0; }" > main-tst.c
$ gcc -o main-tst main-tst.c

Using the tracers, we can get the basic flow of system calls and library calls:

$ strace ./main-tst
$ ltrace ./main-tst

From the output, we can see that only execve() deals with our program, so we will follow it:

  1. execve() [libc function] – The libc wrapper transfers control to the kernel function sys_execve.
  2. sys_execve() [445: kernel/x86/kernel/process_32.c] – Calls do_execve with the file name, the argv pointer [ECX] and the env pointer [EDX].
  3. do_execve() [1290: kernel/fs/exec.c] – Sets up the binary program descriptor (bprm) and calls search_binary_handler to identify and execute the binary.
  4. search_binary_handler() [1211: kernel/fs/exec.c] – Finds and calls the suitable executable format handler.
  5. load_elf_binary() [kernel/fs/binfmt_elf.c] – Parses the ELF header and loads the binary. Finds the stack base and the ELF entry address, then starts the process:
    1. bprm->p = STACK_ROUND(sp, items);
    2. elf_entry = loc->elf_ex.e_entry;
  6. start_thread(regs, elf_entry, bprm->p) [kernel/x86/kernel/process_32.c] – Sets the new instruction pointer to elf_entry and the new stack pointer to bprm->p.

Next, using the readelf tool, we can get the ELF header and the entry point address:

$ readelf -h main-tst
..
Entry point address: 0x8048300
..

Next, from the symbol table, we can find which symbol lives at the entry point address:

$ readelf -s main-tst
..
47: 08048300 0 FUNC GLOBAL DEFAULT 14 _start
..

Next, to get more information, we have to disassemble the code:

$ objdump -d main-tst
Go to the _start section:
08048300 <_start>:
..
8048308: push %eax ; Padding (keeps the stack aligned)
8048309: push %esp ; Highest stack address (stack_end)
804830a: push %edx ; rtld_fini, the dynamic linker's finalizer (registered via atexit())
804830b: push $0x80483c0 ; Libc finalization (fini)
8048310: push $0x80483d0 ; Libc initialization (init)
8048315: push %ecx ; Argument vector (argv)
8048316: push %esi ; Argument count (argc)
8048317: push $0x80483b4 ; Our main()
804831c: call 80482e8 <__libc_start_main@plt>
..

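Now it is clear that control passes from start_thread() to the “_start” section (for a dynamically linked binary like ours, by way of the dynamic linker, which runs first and then jumps to _start). After this section has executed, the stack looks like this: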

  • main (top)
  • argc
  • argv
  • init (0x80483d0)
  • fini (0x80483c0)
  • rtld_fini (to be registered via atexit)
  • end of the stack

Next our focus is to understand the flow of “call 080482e8”:

080482e8 <__libc_start_main@plt>:
80482e8: jmp *0x804a004
80482ee: push $0x8
80482f3: jmp 80482c8 <_init+0x30>

The first instruction is a jump to the address stored at 0x804a004. But there is no reference to 804a004 in the output. Oops, why?

Some parts of the code are dynamically linked, and the addresses of that code are decided only at run time. __libc_start_main() is a libc function, and libc is dynamically linked with our program.

$ ldd main-tst
linux-gate.so.1 => (0x005d9000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x4c0d7000)
/lib/ld-linux.so.2 (0x4c0b8000)

The address 080482e8 is just a placeholder; the actual __libc_start_main() is somewhere else. Such function references go through the Procedure Linkage Table (PLT), which redirects a position-independent function call to an absolute location at run time. The absolute addresses themselves are kept in the Global Offset Table (GOT), whose entries the dynamic linker fills in at run time; 0x804a004 is the GOT slot that the PLT entry for __libc_start_main jumps through.

Print the dynamic relocation records, which include the GOT slots:

$ objdump -R main-tst
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
08049ff0 R_386_GLOB_DAT __gmon_start__
0804a000 R_386_JUMP_SLOT __gmon_start__
0804a004 R_386_JUMP_SLOT __libc_start_main

__libc_start_main() function signature:

int __libc_start_main(int (*main)(int, char **, char **), int argc, char **ubp_av, void (*init)(void), void (*fini)(void), void (*rtld_fini)(void), void *stack_end);

__libc_start_main() is responsible for:

  • performing any necessary security checks if the effective user ID is not the same as the real user ID.
  • initializing the threading subsystem.
  • registering the rtld_fini to release resources when this dynamic shared object exits (or is unloaded).
  • registering the fini handler to run at program exit.
  • calling the initializer function (*init)().
  • calling main() with appropriate arguments.
  • calling exit() with the return value from main().

Using the ltrace tool, we will try to get the values passed to __libc_start_main:

$ ltrace ./main-tst
__libc_start_main(0x80483b4, 1, 0xbfcaf494, 0x80483d0, 0x80483c0 <unfinished ...>
+++ exited (status 148) +++

Now, by comparing these values with the disassembly of _start, we can find out which address points to which function. __libc_start_main() reads these arguments from the stack and calls our main() with the proper arguments.
