I am looking at a statically linked, stripped Linux x86 binary, and I noticed that it has .got and .plt sections.
I wonder what a statically linked binary needs .got and .plt sections for. Anyone?
There is a plethora of things programmers do not know about how ELF binaries work internally. Unfortunately, there are almost no solid references apart from two or three that broadly cover the subject, and many tools (linkers, loaders, assemblers, debuggers, ...) remain a mystery for most of you. When it comes to linkers and loaders, the main reference is Linkers and Loaders by John R. Levine (http://linker.iecc.com/). Another reliable source of information is the official ELF binary format documentation. But these are merely introductions to how most of these technologies work.
Now, here's an answer to your question (why are the GOT and PLT sections still included in static ELF binaries?): PERFORMANCE.
Some more explanation. Suppose you have this C code:
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char str[1024];

        /* Minimal example: no check that argv[1] exists or fits in str. */
        strcpy(str, argv[1]);
        printf("%s\n", str);

        return 0;
    }

No need to be a genius to figure out that all it does is copy a command-line parameter into a string and print it out. Here's the main function in (x86-64) assembly:
    000000000040105e <main>:
      40105e: 55                      push   rbp
      40105f: 48 89 e5                mov    rbp,rsp
      401062: 48 81 ec 10 04 00 00    sub    rsp,0x410
      401069: 89 bd fc fb ff ff       mov    DWORD PTR [rbp-0x404],edi
      40106f: 48 89 b5 f0 fb ff ff    mov    QWORD PTR [rbp-0x410],rsi
      401076: 48 8b 85 f0 fb ff ff    mov    rax,QWORD PTR [rbp-0x410]
      40107d: 48 83 c0 08             add    rax,0x8
      401081: 48 8b 10                mov    rdx,QWORD PTR [rax]
      401084: 48 8d 85 00 fc ff ff    lea    rax,[rbp-0x400]
      40108b: 48 89 d6                mov    rsi,rdx
      40108e: 48 89 c7                mov    rdi,rax
      401091: e8 3a f2 ff ff          call   4002d0 <__rela_iplt_end+0x38>
      401096: 48 8d 85 00 fc ff ff    lea    rax,[rbp-0x400]
      40109d: 48 89 c7                mov    rdi,rax
      4010a0: e8 fb 09 00 00          call   401aa0 <_IO_puts>
      4010a5: b8 00 00 00 00          mov    eax,0x0
      4010aa: c9                      leave
      4010ab: c3                      ret
      4010ac: 0f 1f 40 00             nop    DWORD PTR [rax+0x0]

Notice that at address 401091 (the strcpy call) there is a call to a stub stored in the PLT (the label objdump prints for it, __rela_iplt_end+0x38, is a hint). Also note that GCC turned printf("%s\n", str) into a call to puts, which is why _IO_puts shows up at 4010a0. Amazingly, at address 4002d0 you'll find a jump through an entry stored in the GOT (see below).
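      4002d0: ff 25 f2 2f 2c 00       jmp    QWORD PTR [rip+0x2c2ff2]   # 6c32c8 <_GLOBAL_OFFSET_TABLE_+0x20>

The GOT entry at 6c32c8 holds an address, not code. In a static binary it is filled in at startup, when the IRELATIVE relocations (the ones between __rela_iplt_start and __rela_iplt_end) are applied: each of them calls a resolver function that returns the address of the implementation to use. The CPU probing done at startup involves functions such as the following: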
    00000000004187d0 <handle_amd>:
      4187d0: 53                      push   rbx
      4187d1: b8 00 00 00 80          mov    eax,0x80000000
      4187d6: 0f a2                   cpuid
      4187d8: 81 ff c4 00 00 00       cmp    edi,0xc4
      4187de: 7f 40                   jg     418820 <handle_amd+0x50>
      4187e0: 31 d2                   xor    edx,edx
      4187e2: 81 ff bf 00 00 00       cmp    edi,0xbf
      4187e8: 0f 9d c2                setge  dl
      4187eb: 81 ea fb ff ff 7f       sub    edx,0x7ffffffb
      4187f1: 39 c2                   cmp    edx,eax
      4187f3: 77 2b                   ja     418820 <handle_amd+0x50>
      4187f5: 89 d0                   mov    eax,edx
      4187f7: 0f a2                   cpuid
      4187f9: 81 ff bb 00 00 00       cmp    edi,0xbb
      4187ff: 7e 27                   jle    418828 <handle_amd+0x58>
      418801: 81 ef bc 00 00 00       sub    edi,0xbc
      418807: 83 ff 08                cmp    edi,0x8
      41880a: 0f 87 48 01 00 00       ja     418958 <handle_amd+0x188>
      418810: 48 8d 35 c9 0b 08 00    lea    rsi,[rip+0x80bc9]   # 4993e0 <__PRETTY_FUNCTION__.4767+0x20>
      418817: 48 63 04 be             movsxd rax,DWORD PTR [rsi+rdi*4]
      41881b: 48 01 c6                add    rsi,rax
      41881e: ff e6                   jmp    rsi
      418820: 31 c0                   xor    eax,eax
      418822: 5b                      pop    rbx
      418823: c3                      ret

First, look at the function's name. Second, if you look closely at the code, you'll notice that it queries the CPU through the cpuid instruction (at 4187d6 and 4187f7) to learn about the processor you're running your ELF binary on (more precisely, its microarchitecture and features such as cache sizes; in glibc's sources handle_amd is the cache-information helper for AMD CPUs). glibc uses this kind of information to decide which implementation suits that configuration best. This way, the strcpy function called in the C code above will always be the fastest available one, whatever architecture you're on (Intel: Nehalem, Sandy Bridge, Ivy Bridge, Haswell, ...; AMD: Phenom, Opteron, ...; ...). Keep in mind that those fast implementations have been hand-optimized and fine-tuned for each of the possible target architectures.
So that's what the PLT and GOT sections are used for in your static ELF binary file.
Now, if you want to investigate this yourself, compile the C code above with GCC 4.9 (the version I used) using the -static and -g3 (debug information) flags. Then disassemble the binary with objdump -D, which disassembles all ELF sections rather than only the code sections (the listings above are in Intel syntax, which objdump produces with -M intel). You can then go through all the sections and explore the assembly code. You can also run the binary under gdb, set breakpoints at key locations, and step through the program.
Use -fno-plt if you want to remove all indirection, i.e. gcc -fno-plt -static -march=<target> -mtune=<target> ...

@yaspr's answer is great. Since this question got a bounty asking for "an answer drawing from credible and/or official sources", let me try to provide some references here.
In my understanding, the .plt and .got tables are required here for performance reasons.
BinCFI, published last year at one of the top two computer security conferences, explains it as follows:
Since the purpose of PLT stubs is to dispatch cross module calls, it would seem that the targets can only be exported symbols from other modules. However, recent versions of gcc support a new function type called gnu indirect function, which allows a function to have many different implementations, with the most suitable one selected at runtime based on factors such as the CPU type. Currently, many glibc low level functions such as memcpy, strcmp and strlen use this feature. To support this feature, a library exports a chooser function that selects at runtime which of the many implementations is going to be used. These implementation functions may not be exported at all.
Some other references on how to leverage this feature: GNU_IFUNC, and you can find some more info over on SO.
.got and .got.plt are still present even when -static is given to the compiler, but the .dynamic section is not present. My guess is that they are just empty sections when -static is given. One way to get rid of the got/plt tables is to use the -nostdlib option. Give -nodefaultlibs a try first, but you might have to provide your own implementations of any standard functions the compiler might require (memcpy, etc.).