148

I was told to use a disassembler. Does gcc have anything built in? What is the easiest way to do this?

2

12 Answers

222

I don't think gcc has a flag for it, since it's primarily a compiler, but another of the GNU development tools does. objdump takes a -d/--disassemble flag:

$ objdump -d /path/to/binary 

The disassembly looks like this:

080483b4 <main>:
 80483b4:   8d 4c 24 04             lea    0x4(%esp),%ecx
 80483b8:   83 e4 f0                and    $0xfffffff0,%esp
 80483bb:   ff 71 fc                pushl  -0x4(%ecx)
 80483be:   55                      push   %ebp
 80483bf:   89 e5                   mov    %esp,%ebp
 80483c1:   51                      push   %ecx
 80483c2:   b8 00 00 00 00          mov    $0x0,%eax
 80483c7:   59                      pop    %ecx
 80483c8:   5d                      pop    %ebp
 80483c9:   8d 61 fc                lea    -0x4(%ecx),%esp
 80483cc:   c3                      ret
 80483cd:   90                      nop
 80483ce:   90                      nop
 80483cf:   90                      nop
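objdump -d prints every function in every code section, which gets long quickly. Here is a minimal sketch of a filter that keeps only one function's block. It assumes the default GNU objdump layout shown above (a header line of the form "address <symbol>:" followed by instruction lines). extract_func is a made-up helper name, not part of binutils, and it will not match demangled C++ names that contain spaces.

```shell
# Hypothetical helper: print only the block for symbol $1
# from `objdump -d` output read on stdin.
extract_func() {
    awk -v sym="$1" '
        # A section banner ends the function currently being printed.
        /^Disassembly of section/ { printing = 0; next }
        # A header line such as "080483b4 <main>:" starts a new function.
        /^[0-9a-f]+ <.*>:$/ { printing = ($2 == ("<" sym ">:")) }
        # Emit non-blank lines while inside the wanted function.
        printing && NF
    '
}

# Usage:
#   objdump -d /path/to/binary | extract_func main
```

Recent binutils versions also accept objdump --disassemble=SYMBOL, which may make the filter unnecessary; check objdump --help on your system.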

5 Comments

For intel-syntax: objdump -Mintel -d. Or Agner Fog's objconv disassembler is the nicest one I've tried yet (see my answer). Adding numbered labels to branch-targets is really really nice.
Useful options: objdump -drwC -Mintel. -r shows relocations from the symbol table. -C demangles C++ names. -w avoids line wrapping for long instructions. If you use it often, this is handy: alias disas='objdump -drwC -Mintel'.
Add -S to display source code intermixed with disassembly. (As pointed out in another answer.)
Is there a disassembler that will output only AT&T assembly, without all the addresses, binary encodings, etc.?
@user135142: Agner Fog's objconv can output GAS .intel_syntax noprefix code that's ready to re-assemble; machine code hex only in comments. It doesn't support AT&T syntax, but it can produce a .s that's ready to assemble with GNU tools. (IDK if it works around the problems GAS .intel_syntax noprefix has with symbol names from int offset and int eax by putting them in quotes.)
66

An interesting alternative to objdump is gdb. You don't have to run the binary or have debuginfo.

$ gdb -q ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
(gdb) info functions
All defined functions:

Non-debugging symbols:
0x00000000004003a8  _init
0x00000000004003e0  __libc_start_main@plt
0x00000000004003f0  __gmon_start__@plt
0x0000000000400400  _start
0x0000000000400430  deregister_tm_clones
0x0000000000400460  register_tm_clones
0x00000000004004a0  __do_global_dtors_aux
0x00000000004004c0  frame_dummy
0x00000000004004f0  fce
0x00000000004004fb  main
0x0000000000400510  __libc_csu_init
0x0000000000400580  __libc_csu_fini
0x0000000000400584  _fini
(gdb) disassemble main
Dump of assembler code for function main:
   0x00000000004004fb <+0>:     push   %rbp
   0x00000000004004fc <+1>:     mov    %rsp,%rbp
   0x00000000004004ff <+4>:     sub    $0x10,%rsp
   0x0000000000400503 <+8>:     callq  0x4004f0 <fce>
   0x0000000000400508 <+13>:    mov    %eax,-0x4(%rbp)
   0x000000000040050b <+16>:    mov    -0x4(%rbp),%eax
   0x000000000040050e <+19>:    leaveq
   0x000000000040050f <+20>:    retq
End of assembler dump.
(gdb) disassemble fce
Dump of assembler code for function fce:
   0x00000000004004f0 <+0>:     push   %rbp
   0x00000000004004f1 <+1>:     mov    %rsp,%rbp
   0x00000000004004f4 <+4>:     mov    $0x2a,%eax
   0x00000000004004f9 <+9>:     pop    %rbp
   0x00000000004004fa <+10>:    retq
End of assembler dump.
(gdb)

With full debugging info it's even better.

(gdb) disassemble /m main
Dump of assembler code for function main:
9       {
   0x00000000004004fb <+0>:     push   %rbp
   0x00000000004004fc <+1>:     mov    %rsp,%rbp
   0x00000000004004ff <+4>:     sub    $0x10,%rsp

10        int x = fce ();
   0x0000000000400503 <+8>:     callq  0x4004f0 <fce>
   0x0000000000400508 <+13>:    mov    %eax,-0x4(%rbp)

11        return x;
   0x000000000040050b <+16>:    mov    -0x4(%rbp),%eax

12      }
   0x000000000040050e <+19>:    leaveq
   0x000000000040050f <+20>:    retq

End of assembler dump.
(gdb)

objdump has a similar option (-S)
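Both tools can also be driven non-interactively, which is handy in scripts. A minimal sketch, assuming gdb and objdump are on your PATH; source lines only appear if the binary was built with -g. The function names gdb_disas and objdump_src are made up for this example.

```shell
# Hypothetical helpers; not part of gdb or binutils.

# Disassemble one function without entering the interactive prompt.
# $1: path to binary, $2: function name (defaults to main)
gdb_disas() {
    gdb -q -batch -ex "disassemble /m ${2:-main}" "$1"
}

# Source-interleaved disassembly of the whole binary via objdump.
# $1: path to binary
objdump_src() {
    objdump -d -S "$1"
}

# Usage:
#   gcc -g -O0 prog.c -o a.out
#   gdb_disas ./a.out main
#   objdump_src ./a.out | less
```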


21

This answer is specific to x86. Portable tools that can disassemble AArch64, MIPS, or whatever machine code include objdump and llvm-objdump.


Agner Fog's disassembler, objconv, is quite nice. It will add comments to the disassembly output for performance problems (like the dreaded LCP stall from instructions with 16bit immediate constants, for example).

objconv -fyasm a.out /dev/stdout | less 

(It doesn't recognize - as shorthand for stdout, and defaults to outputting to a file of similar name to the input file, with .asm tacked on.)

It also adds branch targets to the code. Other disassemblers usually disassemble jump instructions with just a numeric destination, and don't put any marker at a branch target to help you find the top of loops and so on.

It also indicates NOPs more clearly than other disassemblers (making it clear when there's padding, rather than disassembling it as just another instruction.)

It's open source, and easy to compile for Linux. It can disassemble into NASM, YASM, MASM, or GNU (GAS .intel_syntax noprefix) syntax; it does not produce AT&T syntax.

Sample output:

; Filling space: 0FH
; Filler type: Multi-byte NOP
;       db 0FH, 1FH, 44H, 00H, 00H, 66H, 2EH, 0FH
;       db 1FH, 84H, 00H, 00H, 00H, 00H, 00H

ALIGN   16

foo:    ; Function begin
        cmp     rdi, 1                      ; 00400620 _ 48: 83. FF, 01
        jbe     ?_026                       ; 00400624 _ 0F 86, 00000084
        mov     r11d, 1                     ; 0040062A _ 41: BB, 00000001
?_020:  mov     r8, r11                     ; 00400630 _ 4D: 89. D8
        imul    r8, r11                     ; 00400633 _ 4D: 0F AF. C3
        add     r8, rdi                     ; 00400637 _ 49: 01. F8
        cmp     r8, 3                       ; 0040063A _ 49: 83. F8, 03
        jbe     ?_029                       ; 0040063E _ 0F 86, 00000097
        mov     esi, 1                      ; 00400644 _ BE, 00000001
; Filling space: 7H
; Filler type: Multi-byte NOP
;       db 0FH, 1FH, 80H, 00H, 00H, 00H, 00H

ALIGN   8
?_021:  add     rsi, rsi                    ; 00400650 _ 48: 01. F6
        mov     rax, rsi                    ; 00400653 _ 48: 89. F0
        imul    rax, rsi                    ; 00400656 _ 48: 0F AF. C6
        shl     rax, 2                      ; 0040065A _ 48: C1. E0, 02
        cmp     r8, rax                     ; 0040065E _ 49: 39. C0
        jnc     ?_021                       ; 00400661 _ 73, ED
        lea     rcx, [rsi+rsi]              ; 00400663 _ 48: 8D. 0C 36
...

Note that this output is ready to be assembled back into an object file, so you can tweak the code at the asm source level, rather than with a hex-editor on the machine code. (So you aren't limited to keeping things the same size.) With no changes, the result should be near-identical. It might not be, though, since disassembly of stuff like

(from /lib/x86_64-linux-gnu/libc.so.6)

SECTION .plt    align=16 execute            ; section number 11, code

?_00001:; Local function
        push    qword [rel ?_37996]         ; 0001F420 _ FF. 35, 003A4BE2(rel)
        jmp     near [rel ?_37997]          ; 0001F426 _ FF. 25, 003A4BE4(rel)

...

ALIGN   8
?_00002:jmp     near [rel ?_37998]          ; 0001F430 _ FF. 25, 003A4BE2(rel)

; Note: Immediate operand could be made smaller by sign extension
        push    11                          ; 0001F436 _ 68, 0000000B
; Note: Immediate operand could be made smaller by sign extension
        jmp     ?_00001                     ; 0001F43B _ E9, FFFFFFE0

doesn't have anything in the source to make sure it assembles to the longer encoding that leaves room for relocations to rewrite it with a 32bit offset.


If you don't want to install objconv, GNU binutils objdump -drwC -Mintel is very usable, and will already be installed if you have a normal Linux gcc setup. I use alias disas='objdump -drwC -Mintel' on my system. (-w is no line-wrapping, -C is demangle, -r prints relocations in object files.)

llvm-objdump -d also works, and can disassemble for a variety of architectures from a single binary. (Unlike GNU objdump, where you'd need a separate build per architecture, like aarch64-linux-gnu-objdump -d.) Similarly, clang -O3 -target mips -c or clang -O3 -target riscv32 -c or whatever are useful to compile for architectures you're interested in, but not interested enough to bother installing a cross-compiler. (https://godbolt.org/ Compiler Explorer is also a useful resource for that; see How to remove "noise" from GCC/clang assembly output? for more about it and writing small functions that compile to interesting asm.)


6

There's also ndisasm, which has some quirks, but can be more useful if you use nasm. I agree with Michael Mrozek that objdump is probably best.

[later] You might also want to check out Albert van der Horst's ciasdis: http://home.hccnet.nl/a.w.m.van.der.horst/forthassembler.html. It can be hard to understand, but has some interesting features you won't likely find anywhere else.

1 Comment

In particular: home.hccnet.nl/a.w.m.van.der.horst/ciasdis.html contains under "latest developments" a debian package that you can install easily. With proper instructions (it does scripting) it will generate a source file that will reassemble again to the exact same binary. I'm not aware of any package that can do that. It may be hard to use from the instructions, I intend to publish in github with extensive examples.
4

Use IDA Pro and the Decompiler.

4 Comments

IDA seems a bit overkill for this, especially considering it's rather expensive
the free version is not available for Linux, only the limited demo version. (too bad because, on windows, that's the best disassembler i have ever used)
IDA is good, but the problem with IDA is that you get lazy if you use it for small tasks. gdb does the job for almost everything. Is gdb easier? No, but it's possible.
IDA is proprietary software, it doesn't respect the user's freedom. It contains DRM which restricts the user from using many features. Moreover, that's a paid software. See gnu.org/proprietary/proprietary.html.
3

You might find ODA useful. It's a web-based disassembler that supports tons of architectures.

http://onlinedisassembler.com/

1 Comment

great idea. getting Server Error (500) to onlinedisassembler.com/odaweb - hope it's transient.
3

You can come pretty damn close (but no cigar) to generating assembly that will reassemble, if that's what you are intending to do, using this rather crude and tediously long pipeline trick (replace /bin/bash with the file you intend to disassemble and bash.S with what you intend to send the output to):

objdump --no-show-raw-insn -Matt,att-mnemonic -Dz /bin/bash | grep -v "file format" | grep -v "(bad)" | sed '1,4d' | cut -d' ' -f2- | cut -d '<' -f2 | tr -d '>' | cut -f2- | sed -e "s/of\ section/#Disassembly\ of\ section/" | grep -v "\.\.\." > bash.S 

Note how long this is, however. I really wish there was a better way (or, for that matter, a disassembler capable of outputting code that an assembler will recognize), but unfortunately there isn't.

1 Comment

Wow! This is fantastic. Btw, regarding your problem, why don't you use an alias for it to skip typing this huge command?
2

I don’t think there can be too many answers to this question as everyone needs disassembly for different purposes, which necessitates different goals and different formatting.

For my own use-case, I’ve had the most success piping llvm-objdump into additional formatting commands:

(echo .intel_syntax noprefix;llvm-objdump 'sseAddBlocks.o' --no-show-raw-insn --no-leading-addr -d -C -Mintel --symbolize-operands -j.text|tail -n +6)|sed -E 's/[ ]*<(L[0-9]*)>/.\1/g; s/^<([^>]*)>:/.globl \1\n\1:/; s/^ [ \t]*/ /'|expand -t8 
  • Change 'sseAddBlocks.o' to the path/name of your binary
  • To get colored output, add: --disassembler-color=on
  • To get raw hex bytes, remove: --no-show-raw-insn
  • To get hex offsets, remove --no-leading-addr
  • To get mangled C++ names, remove -C
  • To get ALL executable sections, remove: -j.text
.intel_syntax noprefix
.globl sseAddBlocks
sseAddBlocks:
        shl     rdx, 0x2
        je      .L0
        shl     rdx, 0x2
        xor     eax, eax
        nop     word ptr cs:[rax + rax]
        nop     word ptr [rax + rax]
.L1:
        movdqa  xmm0, xmmword ptr [rdi + rax]
        paddd   xmm0, xmmword ptr [rsi + rax]
        movaps  xmmword ptr [rdi + rax], xmm0
        add     rax, 0x10
        cmp     rdx, rax
        jne     .L1
.L0:
        ret

In fact, the above disassembly assembles back into an object file without error (as sse-add-blk-2.s -o sse-add-blk-2.o). However, don't expect this to work for anything larger or for executables: all the ELF metadata, debug info, relocations, data/strings, etc. are irrevocably lost.
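To see what the label-related sed expressions in that pipeline do in isolation, here is a minimal sketch that applies the first two of them to text on stdin. It assumes GNU sed (for -E and for \n in the replacement). The name symbolize_labels and the sample lines in the usage comment are made up for illustration; they only approximate llvm-objdump output and were not captured from a real run.

```shell
# Hypothetical helper: rewrite <L0>-style labels into assembler-friendly form.
#   "<L1>:"          becomes ".L1:"      (label definition)
#   "jne <L1>"       becomes "jne .L1"   (branch operand)
#   "<some_symbol>:" becomes ".globl some_symbol" plus "some_symbol:"
symbolize_labels() {
    sed -E 's/[ ]*<(L[0-9]*)>/.\1/g; s/^<([^>]*)>:/.globl \1\n\1:/'
}

# Usage (hand-written sample input):
#   printf '<add_blocks>:\n\tje\t<L0>\n<L0>:\n\tret\n' | symbolize_labels
```

Note that the first expression also swallows any spaces directly in front of a label reference, so it relies on the operand being separated from the mnemonic by a tab.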

5 Comments

Beware that .intel_syntax noprefix mode isn't as robust as AT&T. It will break if you have a global variable named int eax for example. (You can use "eax" with quotes to make it the symbol instead of register name, but gcc -masm=intel for example doesn't do that. Probably llvm-objdump doesn't either.)
This is cool; I didn't know of a disassembler other than Agner Fog's objconv (x86-only) that can invent labels for branch targets instead of just showing numeric addresses. --symbolize-operands is a key part of this answer, you should mention it in a bullet point. And BTW, the default for llvm-objdump is to only disassemble code sections, so that includes .init and .fini as well as .text, plus any other sections the compiler invents when optimizing hot / cold functions or multiversions or whatever. But yeah, to not see CRT startup code, probably best to look at just .text.
Hrm, --symbolize-operands doesn't work on AArch64, it doesn't invent new <L0> labels for targets that aren't symbols in the symbol-table. The same llvm-objdump -drwC --symbolize-operands command does work on x86-64, with or without --no-show-raw-insn --no-leading-addr
@PeterCordes Three things: first, the command does not produce a functional assembly source re-assemblable back to the binary by any stretch of the imagination, e.g. data/strings and ELF metadata are discarded. A global named eax is thus inconsequential to worry about. Second, it's the s/[ ]*<(L[0-9]*)>/.\1/g; replacement in sed that inverts the labels for branches. It's quite a stupid hack but I can't recall seeing < or > in x86_64 Intel assembly outside data strings and comments, so it should be rock solid(?). Third, check clang -v on AArch64; if <18, try updating your LLVM.
Yeah, that sed to discard <> around the invented labels should be safe on llvm-objdump output. Re: AArch64, I'm using 18.1 llvm-objdump --version 18.1.8 on x86-64 Arch GNU/Linux; which I haven't updated for several months. :/ And yeah, fair point about gotchas like int eax not being a problem since this isn't suitable for turning whole programs or libraries into source that can be re-assembled; just for single asm functions as a starting point for tweaking the asm to see if the compiler could be doing a better job, for example. (For just human viewing, the <> aren't a problem)
1

ht editor can disassemble binaries in many formats. It is similar to Hiew, but open source.

To disassemble, open a binary, then press F6 and then select elf/image.


1

Let's say that you have:

#include <iostream>

double foo(double x)
{
  asm("# MyTag BEGIN"); // <- asm comment,
                        //    used later to locate piece of code
  double y = 2 * x + 1;

  asm("# MyTag END");

  return y;
}

int main()
{
  std::cout << foo(2);
}

To get assembly code using gcc you can do:

 g++ prog.cpp -c -S -o - -masm=intel | c++filt | grep -vE '\s+\.' 

c++filt demangles symbols

grep -vE '\s+\.' removes some useless information
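One thing to watch for: the pattern '\s+\.' is not anchored, so it also drops any instruction line where whitespace is followed by a dot, for example an operand that names a local label such as lea rax, .LC0[rip]. A minimal sketch of an anchored variant that only removes lines *starting* with a directive, assuming GNU grep (for \s); strip_directives is a made-up name:

```shell
# Hypothetical helper: drop assembler directive lines (leading whitespace,
# then a dot) but keep instructions that merely reference a .Lxx label.
strip_directives() {
    grep -vE '^\s+\.'
}

# Usage:
#   g++ prog.cpp -c -S -o - -masm=intel | c++filt | strip_directives
```

Label definitions such as .LFB0: start in column 0, so both the original and the anchored pattern keep them.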

Now if you want to visualize the tagged part, simply use:

g++ prog.cpp -c -S -o - -masm=intel | c++filt | grep -vE '\s+\.' | grep "MyTag BEGIN" -A 20 

With my computer I get:

        # MyTag BEGIN
# 0 "" 2
#NO_APP
        movsd   xmm0, QWORD PTR -24[rbp]
        movapd  xmm1, xmm0
        addsd   xmm1, xmm0
        addsd   xmm0, xmm1
        movsd   QWORD PTR -8[rbp], xmm0
#APP
# 9 "poub.cpp" 1
        # MyTag END
# 0 "" 2
#NO_APP
        movsd   xmm0, QWORD PTR -8[rbp]
        pop     rbp
        ret
.LFE1814:
main:
.LFB1815:
        push    rbp
        mov     rbp, rsp
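grep -A 20 prints a fixed number of lines after the match, so it runs past the END marker for short regions (as in the output above, which spills into main) and cuts long ones short. A minimal sketch that prints exactly the lines between the two markers using a sed address range; tagged_region is a made-up name, and the tag is assumed not to contain regex metacharacters or slashes:

```shell
# Hypothetical helper: print the lines from "<tag> BEGIN" through "<tag> END"
# (inclusive) from assembly text read on stdin.
# $1: tag name, e.g. MyTag
tagged_region() {
    sed -n "/$1 BEGIN/,/$1 END/p"
}

# Usage:
#   g++ prog.cpp -c -S -o - -masm=intel | c++filt | tagged_region MyTag
```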

A more friendly approach is to use: Compiler Explorer

1 Comment

This is only reliable with optimization disabled, otherwise parts of the operations inside the region could optimize into stuff outside, or be optimized away. So you can only see the clunky -O0 asm.
-2

Use ghidra: https://ghidra-sre.org/. It is already installed on Kali Linux.

1 Comment

Why is this comment downvoted?
-2

Use: gcc -S ProgramName.c

Example:

#include <stdio.h>

int myFunc(int x, int y)
{
    char e = 'A';
    printf("%c, %d, %d\n", e, x, y);
    return 1;
}

int main()
{
    int z = myFunc(5, 7);
    return 0;
}

Makes:

        .file   "temp.c"
        .text
        .section        .rodata
.LC0:
        .string "%c, %d, %d\n"
        .text
        .globl  myFunc
        .type   myFunc, @function
myFunc:
.LFB0:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $32, %rsp
        movl    %edi, -20(%rbp)
        movl    %esi, -24(%rbp)
        movb    $65, -1(%rbp)
        movsbl  -1(%rbp), %eax
        movl    -24(%rbp), %ecx
        movl    -20(%rbp), %edx
        movl    %eax, %esi
        leaq    .LC0(%rip), %rax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf@PLT
        movl    $1, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   myFunc, .-myFunc
        .globl  main
        .type   main, @function
main:
.LFB1:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    $7, %esi
        movl    $5, %edi
        call    myFunc
        movl    %eax, -4(%rbp)
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0"
        .section        .note.GNU-stack,"",@progbits
        .section        .note.gnu.property,"a"
        .align 8
        .long   1f - 0f
        .long   4f - 1f
        .long   5
0:
        .string "GNU"
1:
        .align 8
        .long   0xc0000002
        .long   3f - 2f
2:
        .long   0x3
3:
        .align 8
4:

2 Comments

See How to remove "noise" from GCC/clang assembly output? re: removing / reducing noise, like the .cfi directives, and enabling optimization so the assembly code is only doing what's necessary to implement the visible behaviour of the C functions. (So you should write functions that take args and return a value computed from them to see interesting asm.)
The question is asking for disassembly (of an object or executable file), not assembly (of a source file).
