Why does the assembly encoding of objdump vary?

Question

I was reading this article about Position Independent Code and I encountered this assembly listing of a function.

    0000043c <ml_func>:
     43c:   55                      push   ebp
     43d:   89 e5                   mov    ebp,esp
     43f:   e8 16 00 00 00          call   45a <__i686.get_pc_thunk.cx>
     444:   81 c1 b0 1b 00 00       add    ecx,0x1bb0
     44a:   8b 81 f0 ff ff ff       mov    eax,DWORD PTR [ecx-0x10]
     450:   8b 00                   mov    eax,DWORD PTR [eax]
     452:   03 45 08                add    eax,DWORD PTR [ebp+0x8]
     455:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
     458:   5d                      pop    ebp
     459:   c3                      ret

    0000045a <__i686.get_pc_thunk.cx>:
     45a:   8b 0c 24                mov    ecx,DWORD PTR [esp]
     45d:   c3                      ret

However, on my machine (gcc-7.3.0, Ubuntu 18.04 x86_64), I got a slightly different result:

    0000044d <ml_func>:
     44d:   55                      push   %ebp
     44e:   89 e5                   mov    %esp,%ebp
     450:   e8 29 00 00 00          call   47e <__x86.get_pc_thunk.ax>
     455:   05 ab 1b 00 00          add    $0x1bab,%eax
     45a:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
     460:   8b 0a                   mov    (%edx),%ecx
     462:   8b 55 08                mov    0x8(%ebp),%edx
     465:   01 d1                   add    %edx,%ecx
     467:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
     46d:   89 0a                   mov    %ecx,(%edx)
     46f:   8b 80 f0 ff ff ff       mov    -0x10(%eax),%eax
     475:   8b 10                   mov    (%eax),%edx
     477:   8b 45 0c                mov    0xc(%ebp),%eax
     47a:   01 d0                   add    %edx,%eax
     47c:   5d                      pop    %ebp
     47d:   c3                      ret

The main difference I found was in how the mov instruction is written. In the upper listing, mov ebp,esp moves esp to ebp, while in the lower listing, mov %esp,%ebp does the same thing, but the order of the operands is reversed.

This is quite confusing, especially when I have to hand-write assembly. To summarize, my questions are (1) why do I get different assembly representations for the same instructions, and (2) which one should I use when writing assembly code (e.g. with __asm(:::);)?

The top is in Intel syntax and the bottom one is in AT&T syntax. AT&T syntax reverses the operands, so the order is source, destination. If you want Intel syntax from objdump, use the option -Mintel.
– Michael Petch, Commented Mar 16, 2019 at 4:19
As for your second question: if you compile with GCC and want Intel syntax in inline assembly, you can pass the -masm=intel option to GCC. The default is AT&T syntax.
– Michael Petch, Commented Mar 16, 2019 at 4:23
A quick way to tell Intel from AT&T is to look for lines with immediate values, like add ecx,0x1bb0 or add $0x1bab,%eax. That establishes the syntax, and you can then flip it or not in your mind as you read, to whichever order you think is sane. Which ordering is sane is on the order of religion and politics: very personal.
– old_timer, Commented Mar 16, 2019 at 15:58
Other clues as to the specific age or syntax used within the code (assembly language is defined by the assembler, the tool, not by some standard): look for the mips style percent sign on the registers, the mips style -0x10(%eax) syntax, or the Intel style DWORD PTR [eax] with brackets. Here "Intel style" does not mean Intel vs. AT&T, but Intel style in general, independent of AT&T or not. Your first example is classic Intel style, Intel syntax assembly language; the latter is GNU assembler style. The GNU assembler is well known for mangling the syntax for all targets, not just x86.
– old_timer, Commented Mar 16, 2019 at 16:02

Peter Cordes · Accepted Answer · 2019-03-16 05:52:00Z

objdump defaults to -Matt, i.e. AT&T syntax (like your second code block). See AT&T vs. Intel syntax; the tag wikis have some info about the syntax differences: https://stackoverflow.com/tags/att/info vs. https://stackoverflow.com/tags/intel-syntax/info

Either syntax has the same limitations, imposed by what the machine itself can do, and what's encodeable in machine code. They're just different ways of expressing that in text.
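To make the "same machine code, different text" point concrete, here is a minimal sketch that renders the same two instruction bytes in both syntaxes. The helper names (mnemonic, render_intel, render_att) are hypothetical and not part of objdump; the sketch assumes only the two register-to-register opcodes that appear in the listings above (89 for mov and 01 for add, both of the form "op r/m32, r32").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 32-bit register names indexed by their 3-bit encoding in a ModRM byte. */
static const char *const REG32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Hypothetical helper: mnemonic for the two opcodes this sketch understands
   (both encode "op r/m32, r32"), or NULL for anything else. */
static const char *mnemonic(unsigned char opcode)
{
    switch (opcode) {
    case 0x89: return "mov";
    case 0x01: return "add";
    default:   return NULL;
    }
}

/* Intel syntax: destination first. Returns 0 on success, -1 if unsupported. */
static int render_intel(const unsigned char insn[2], char *out, size_t n)
{
    const char *op = mnemonic(insn[0]);
    assert(out != NULL && n > 0);
    if (op == NULL || (insn[1] >> 6) != 3)   /* mod must be 11: register direct */
        return -1;
    unsigned src = (insn[1] >> 3) & 7;       /* reg field holds the source */
    unsigned dst = insn[1] & 7;              /* r/m field holds the destination */
    snprintf(out, n, "%s %s,%s", op, REG32[dst], REG32[src]);
    return 0;
}

/* AT&T syntax: source first, registers prefixed with '%'. */
static int render_att(const unsigned char insn[2], char *out, size_t n)
{
    const char *op = mnemonic(insn[0]);
    assert(out != NULL && n > 0);
    if (op == NULL || (insn[1] >> 6) != 3)
        return -1;
    unsigned src = (insn[1] >> 3) & 7;
    unsigned dst = insn[1] & 7;
    snprintf(out, n, "%s %%%s,%%%s", op, REG32[src], REG32[dst]);
    return 0;
}
```

Feeding the bytes 89 e5 from either listing into both functions gives the two spellings seen in the question: the decoding step is identical, and only the final formatting differs.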

Use objdump -d -Mintel for Intel syntax. I use alias disas='objdump -drwC -Mintel' in my .bashrc, so I can disas foo.o and get the format I want, with relocations printed (important for making sense of a non-linked .o), without line-wrapping for long instructions, and with C++ symbol names demangled.

In inline asm, you can use either syntax, as long as it matches what the compiler is expecting. The default is AT&T, and that's what I'd recommend using for compatibility with clang. Maybe there's a way, but clang doesn't work the same way as GCC with -masm=intel.

Also, AT&T is basically standard for GNU C inline asm on x86, and it means you don't need special build options for your code to work.
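As an illustration of the default dialect, here is a minimal sketch of an AT&T-syntax asm statement that should build with plain gcc or clang and no extra options. The function name add_att is hypothetical, and the non-x86 fallback is only there so the sketch compiles on other targets.

```c
#include <assert.h>

/* Hypothetical helper (not from the answer): adds b to a with a single x86
   instruction written in AT&T syntax, the default dialect for inline asm. */
static inline int add_att(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order is source, destination: %1 (b) is added into %0 (a).
       The l suffix states the 32-bit operand size explicitly. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;   /* fallback so the sketch still builds on non-x86 targets */
#endif
}
```

Compiling this same file with -masm=intel would be expected to fail, since the template text is AT&T-only.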

But you can use gcc -masm=intel to compile source files that use Intel syntax in their asm statements. This is fine for your own use if you don't care about clang.

If you're writing code for a header, you can make it portable between AT&T and Intel syntax using dialect alternatives, at least for GCC:

    static inline
    void atomic_inc(volatile int *p) {
        // use __asm__ instead of asm in headers, so it works even with -std=c11 instead of gnu11
        __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
        // TODO: flag output for return value?
        // maybe doesn't need to be asm volatile; compilers know that modifying pointed-to memory is a visible side-effect unless it's a local that fully optimizes away.
        // If you want this to work as a memory barrier, use a `"memory"` clobber to stop compile-time memory reordering.  The lock prefix provides a runtime full barrier
    }

Source and asm output for gcc/clang are on the Godbolt compiler explorer.

With g++ -O3 (default or -masm=att), we get

    atomic_inc(int volatile*):
        lock addl $1, (%rdi)              # operand-size is from my explicit addl suffix
        ret

With g++ -O3 -masm=intel, we get

    atomic_inc(int volatile*):
        lock add DWORD PTR [rdi], 1       # operand-size came from the %0 expansion
        ret

clang works with the AT&T version, but fails with -masm=intel (or the -mllvm --x86-asm-syntax=intel which that implies), because that apparently only applies to code emitted by LLVM, not for how the front-end fills in the asm template.

The clang error message is:

    <source>:4:13: error: unknown use of instruction mnemonic without a size suffix
        __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
                ^
    <inline asm>:1:2: note: instantiated into assembly here
            lock add (%rdi), 1
            ^
    1 error generated.

It picked the "Intel" syntax alternative, but still filled in the template with an AT&T memory operand.

Update: clang 14 supports -masm=intel in a way compatible with GCC, treating asm statements as Intel syntax. See "How to set gcc or clang to use Intel syntax permanently for inline asm() statements?"
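The dialect-alternative technique from the header example above also applies to register operands. The following is a minimal sketch, assuming an x86 target; the name add_portable is hypothetical, and the plain-C fallback exists only so the sketch builds elsewhere. With register operands the register name implies the operand size, so the two alternatives differ only in operand order.

```c
#include <assert.h>

/* Hypothetical helper, not from the answer: one template that GCC can
   expand as either AT&T or Intel syntax, depending on -masm=. */
static inline int add_portable(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* {AT&T | Intel}: same instruction, operands swapped between dialects.
       No l suffix is needed because both operands are registers. */
    __asm__("add {%1, %0 | %0, %1}" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;   /* fallback so the sketch still builds on non-x86 targets */
#endif
}
```

This has only been reasoned through for the default AT&T mode and for GCC with -masm=intel; how older clang versions fill in the Intel alternative is subject to the limitation described above.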
