
When encoding the instruction cmpw $-5, %ax for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:

Opcode      Instruction        Op/En  64-bit Mode  Compat/Leg Mode  Description
3D iw       CMP AX, imm16      I      Valid        Valid            Compare imm16 with AX.
83 /7 ib    CMP r/m16, imm8    MI     Valid        Valid            Compare imm8 with r/m16.

So there will be two encoding results:

66 3d fb ff ; this for opcode 3d
66 83 f8 fb ; this for opcode 83

Then which one is better?

I tried some online disassemblers.

Both encodings disassemble to the original instruction. But why does 6683fb00 also work while 663dfb doesn't?

  • There can be no "better" unless you say what's important to you. I can think of three dimensions: code size, execution speed, and compatibility/portability. Size seems to be the same, so it's not better there. There's probably more. What do you want to achieve? Commented Jun 3, 2016 at 10:06
  • Without looking into this too far, one instruction seems to compare AX (a 16-bit register) with a 16-bit value, whereas the other compares a different (16-bit) register with an 8-bit value. Commented Jun 3, 2016 at 10:07
  • In the second variant, the prefix isn't length-changing. Commented Jun 3, 2016 at 10:11
  • In this case don't use a 16-bit immediate value if you don't have to. There is quite a penalty for the length-changing prefix in 64-bit code. The Intel optimization manual has a rule to avoid an LCP stall like this: Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values. Commented Jun 3, 2016 at 11:49
  • @Neil that doesn't matter if he's using -5 as the operand though. Commented Jun 3, 2016 at 12:37

1 Answer


Both encodings are the same length, so that doesn't help us decide.
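As a quick check of that claim, here is a minimal Python sketch that builds both encodings by hand. The helper names (`encode_cmp_ax_imm16`, `encode_cmp_ax_imm8`) are made up for illustration and are not part of any assembler library; the sketch only covers AX as the destination.

```python
import struct

def encode_cmp_ax_imm16(value):
    """66 3D iw: operand-size prefix, opcode 3D, 16-bit little-endian immediate."""
    return bytes([0x66, 0x3D]) + struct.pack("<h", value)

def encode_cmp_ax_imm8(value):
    """66 83 /7 ib: operand-size prefix, opcode 83, ModRM, sign-extended imm8."""
    if not -128 <= value <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    # mod=11 (register operand), reg=7 (the /7 opcode extension for CMP), rm=0 (AX)
    modrm = (0b11 << 6) | (7 << 3) | 0
    return bytes([0x66, 0x83, modrm]) + struct.pack("<b", value)

a = encode_cmp_ax_imm16(-5)
b = encode_cmp_ax_imm8(-5)
print(a.hex(" "), len(a))  # 66 3d fb ff 4
print(b.hex(" "), len(b))  # 66 83 f8 fb 4
```

Both come out at four bytes, matching the two encodings in the question.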

However, as @Michael Petch commented, the imm16 encoding will cause an LCP stall in the decoders on Intel CPUs. Without the 66 operand-size prefix, the instruction would be 3D imm32, so the prefix changes the length of the rest of the instruction. This is why it's called a Length-Changing-Prefix stall. (AFAIK, you'd get the same stall in 16-bit code for using a 32-bit immediate.)
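The "length-changing" part can be illustrated with a small Python sketch. It models only the two CMP opcodes from the question, in 32/64-bit mode, and the function names are hypothetical; it says nothing about the actual stall cost, only about whether the 66 prefix alters the number of bytes that follow it.

```python
def imm_size(opcode, has_66_prefix):
    """Immediate size in bytes for the two CMP opcodes discussed here (32/64-bit mode)."""
    if opcode == 0x3D:   # CMP AX/EAX, imm16/imm32: immediate size follows operand size
        return 2 if has_66_prefix else 4
    if opcode == 0x83:   # CMP r/m, imm8: always a single byte
        return 1
    raise ValueError("opcode not modelled in this sketch")

def length_after_prefix(opcode, has_66_prefix):
    """Bytes from the opcode onward: opcode + optional ModRM + immediate."""
    modrm = 1 if opcode == 0x83 else 0
    return 1 + modrm + imm_size(opcode, has_66_prefix)

def is_length_changing(opcode):
    """True if adding the 66 prefix changes the length of the rest of the instruction."""
    return length_after_prefix(opcode, True) != length_after_prefix(opcode, False)

print(is_length_changing(0x3D))  # True:  5 bytes without the prefix, 3 with it
print(is_length_changing(0x83))  # False: 3 bytes either way
```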

The imm8 encoding doesn't cause a problem on any microarchitecture I know of, so favour it. See Agner Fog's microarch.pdf, and other links from the tag wiki.

It can be worth using a longer instruction to avoid an LCP stall. (e.g. if you know the upper 16 bits of the register are zero or sign-extended, using 32-bit operand size can avoid the LCP stall.)
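To make the trade-off concrete, here is a hedged Python sketch for an immediate that does not fit in imm8. The value 0x1234 and the helper names are assumptions chosen for illustration, and the 32-bit form is only equivalent if the upper 16 bits of EAX are known to be zero.

```python
import struct

def cmp_ax_imm16(value):
    """66 3D iw: 16-bit compare, needs the length-changing 66 prefix."""
    return bytes([0x66, 0x3D]) + struct.pack("<H", value & 0xFFFF)

def cmp_eax_imm32(value):
    """3D id: 32-bit compare, no operand-size prefix at all."""
    return bytes([0x3D]) + struct.pack("<I", value & 0xFFFFFFFF)

short_form = cmp_ax_imm16(0x1234)   # 4 bytes, but subject to the LCP stall
long_form = cmp_eax_imm32(0x1234)   # 5 bytes, no prefix (assumes upper half of EAX is zero)
print(short_form.hex(" "))  # 66 3d 34 12
print(long_form.hex(" "))   # 3d 34 12 00 00
```

The 32-bit form costs one extra byte of code size in exchange for dropping the prefix.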

Intel SnB-family CPUs have a uop cache, so instructions don't always have to be re-decoded before executing. Still, the uop cache is small, so avoiding LCP stalls is worth it.

Of course, if you're tuning for AMD, then this isn't a factor. I forget if Atom and Silvermont decoders also have LCP stalls.


Re: part 2:

663d is prefix+opcode for cmp ax, imm16. 663dfb doesn't "work" because it consumes the first byte of the following instruction. When the decoder sees 66 3D, it grabs the next 2 bytes from the instruction stream as the immediate.
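A toy decoder makes the byte counting visible. This is a minimal sketch that handles only the two 66-prefixed CMP forms from the question with a register operand; `decode_cmp` is a hypothetical helper, not a real disassembler API, and it prints Intel syntax.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_cmp(code):
    """Decode one instruction; returns (text, length). Handles 66 3D iw and 66 83 /7 ib only."""
    if len(code) < 2 or code[0] != 0x66:
        raise ValueError("sketch only handles 66-prefixed CMP")
    opcode = code[1]
    if opcode == 0x3D:
        needed = 4                      # prefix + opcode + 2 immediate bytes
        if len(code) < needed:
            raise ValueError(f"truncated: need {needed} bytes, have {len(code)}")
        imm = int.from_bytes(code[2:4], "little", signed=True)
        return f"cmp ax, {imm}", needed
    if opcode == 0x83:
        needed = 4                      # prefix + opcode + ModRM + 1 immediate byte
        if len(code) < needed:
            raise ValueError(f"truncated: need {needed} bytes, have {len(code)}")
        modrm = code[2]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod != 0b11 or reg != 7:
            raise ValueError("not a register-form CMP (/7)")
        imm = int.from_bytes(code[3:4], "little", signed=True)
        return f"cmp {REG16[rm]}, {imm}", needed
    raise ValueError("opcode not modelled in this sketch")

print(decode_cmp(bytes.fromhex("663dfbff")))  # ('cmp ax, -5', 4)
print(decode_cmp(bytes.fromhex("6683f8fb")))  # ('cmp ax, -5', 4)
print(decode_cmp(bytes.fromhex("6683fb00")))  # ('cmp bx, 0', 4)
```

Note that 6683fb00 is a complete instruction but a different one: ModRM byte fb selects BX, so it compares BX with 0. Passing the three bytes 663dfb raises the "truncated" error here, whereas a real decoder would take the missing immediate byte from whatever follows in the instruction stream.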


1 Comment

Awesome answer for the extra reference!
