Is IDA pulling my leg - or can REX.W sometimes not be determined in static analysis?

Question

NB: I normally dabble with disassembly (i.e. mnemonics) and only ever look at the raw opcodes when I can't avoid it.

I have the following line of disassembly of a Windows x64 kernel mode driver, created by IDA Pro 7.1.180227:

xor edx, edx

Now I know for a fact that this is in preparation of passing the second parameter to a function via rdx. I also know that the intention of that code is to set said pointer argument to NULL.

The encoding is 33 D2. Cross-referencing that with the opcode reference, or disassembling it in ODA, yields the same result as IDA: xor edx, edx.

What bothers me about this disassembly is that rdx, as a superset of edx, is used elsewhere on that exact code path to store other pointers. So in theory the upper double-word of rdx could be "dirty".

And going by the fact that this is x64 code, I'd expect this to read xor rdx, rdx. Why is that not how it's presented in the disassembly?

I understand that, per section 3.6.1 (Table 3-4) of the Intel SDM (05/2018), the REX.W prefix can affect the operand size.

For this instruction, neither the operand-size (66h) nor the address-size (67h) prefix is present, and there is no REX prefix either.

So going by the Intel SDM (section "XOR—Logical Exclusive OR") I should indeed be dealing with opcode 33 /r, i.e. the instruction XOR r32, r/m32, which confirms IDA's translation of the opcode. Section 2.1.5 ("Addressing-Mode Encoding of ModR/M and SIB Bytes") of the Intel SDM gives a clue as to how the operand byte (D2) is encoded, and Table 2-2 ("32-Bit Addressing Forms with the ModR/M Byte") yields EDX/DX/DL/MM2/XMM2 as the operand.

Figures.
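The decoding steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming register-to-register operands (mod = 11) and nothing but an optional REX prefix in front of the opcode; decode_xor_33 and the two register tables are hypothetical names made up for this sketch, not part of IDA or any library.

```python
# Minimal sketch: decode [REX] 33 /r (XOR r, r/m) for register operands only.
# decode_xor_33, REGS32 and REGS64 are illustrative names, not a real API.

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_xor_33(code: bytes) -> str:
    """Return the mnemonic for [REX] 33 /r with mod == 11."""
    pos = 0
    rex = 0
    # In 64-bit mode a byte in 0x40..0x4F directly before the opcode is a REX prefix.
    if 0x40 <= code[pos] <= 0x4F:
        rex = code[pos]
        pos += 1
    if code[pos] != 0x33:
        raise ValueError("not opcode 33 /r")
    modrm = code[pos + 1]
    if modrm >> 6 != 0b11:
        raise ValueError("memory operands are out of scope for this sketch")
    rex_w = (rex >> 3) & 1  # 1 selects 64-bit operand size
    rex_r = (rex >> 2) & 1  # extends the ModR/M reg field to 4 bits
    rex_b = rex & 1         # extends the ModR/M r/m field to 4 bits
    reg = ((modrm >> 3) & 7) | (rex_r << 3)
    rm = (modrm & 7) | (rex_b << 3)
    names = REGS64 if rex_w else REGS32
    return f"xor {names[reg]}, {names[rm]}"


print(decode_xor_33(bytes.fromhex("33D2")))    # xor edx, edx
print(decode_xor_33(bytes.fromhex("4833D2")))  # xor rdx, rdx
```

The operand size is fully determined by the bytes themselves: without a REX prefix whose W bit is set, 33 D2 can only be the 32-bit form.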

However, that would mean that the "dirty" upper double-word in rdx would not be zeroed out and thus a garbled/truncated pointer would end up being passed. Given this is kernel mode code, the consequences should be clear.

I just can't believe that the compiler would make such a mistake. So what am I missing?

Could you include some related code? say, code using E/RDX, additional code manipulating it prior to XOR instruction, etc. I've recently encountered a similar issue where after some more thorough investigation the upper half of RDX was indeed always zero, making this an unexploitable bug by the developer. — NirIzr
– NirIzr, Commented Jul 5, 2018 at 14:24
duplicate of stackoverflow Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?, and see also What is the best way to set a register to zero in x86 assembly: xor, mov or and? which explains that 32-bit operand-size is always best (e.g. xor r10d, r10d is not shorter than xor r10,r10, but Silvermont only recognizes 32-bit operand size as a zeroing idiom!) — Peter Cordes
– Peter Cordes, Commented Jul 5, 2018 at 21:19

jakobbotsch · Accepted Answer · 2018-07-05 14:31:06Z

In x86-64, any instruction that writes a 32-bit register automatically zeroes the upper 32 bits of the corresponding 64-bit register.

The relevant part of the Intel Architecture manual is Volume 1, section 3.4.1.1, which states:

When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:

64-bit operands generate a 64-bit result in the destination general-purpose register.

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.

8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.

Thus both forms give the same result, and xor edx, edx (33 D2) is one byte shorter than xor rdx, rdx (48 33 D2, which needs a REX.W prefix), so the compiler prefers it.
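To make the quoted rules concrete, here is a rough Python model of how a write to a general-purpose register behaves in 64-bit mode for each operand size. It is a sketch under assumptions: write_gpr and xor_self are hypothetical helper names, the "dirty" value is an arbitrary made-up pointer, and the legacy high-byte registers (ah, bh, ch, dh) are not modeled.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, value: int, size_bits: int) -> int:
    """Return the 64-bit register content after writing `value` with the given operand size."""
    if size_bits == 64:
        return value & MASK64
    if size_bits == 32:
        # 32-bit results are zero-extended into the full 64-bit register.
        return value & 0xFFFFFFFF
    if size_bits in (8, 16):
        # 8-bit and 16-bit results leave the upper bits untouched.
        mask = (1 << size_bits) - 1
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("unsupported operand size")


def xor_self(old: int, size_bits: int) -> int:
    """Model `xor reg, reg` on the low `size_bits` bits of a register."""
    low = old & ((1 << size_bits) - 1)
    return write_gpr(old, low ^ low, size_bits)


dirty = 0xFFFFF80012345678  # hypothetical stale pointer left in rdx

print(hex(xor_self(dirty, 32)))  # 0x0, same effect as the 64-bit form
print(hex(xor_self(dirty, 64)))  # 0x0
print(hex(xor_self(dirty, 16)))  # 0xfffff80012340000, upper bits survive
```

Only the 8-bit and 16-bit forms would leave stale upper bits behind, which is why xor dx, dx would be a real bug here while xor edx, edx is not.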

By the way, this also gives other intricacies. For example the NOP instruction is normally 0x90, which is actually xchg eax, eax. But in 64-bit mode this is not a NOP, because it zeros the upper 32 bits. So xchg eax, eax has different encodings in 64-bit and 32-bit mode. — jakobbotsch
– jakobbotsch, Commented Jul 5, 2018 at 14:41
I read about that somewhere, but I did not make the connection. — 0xC0000022L
– 0xC0000022L ♦, Commented Jul 5, 2018 at 15:22
@IgorSkochinsky Yes, that's what I meant. They had to find a new encoding for xchg eax, eax since that wouldn't be a NOP on x64. — jakobbotsch
– jakobbotsch, Commented Jul 5, 2018 at 18:28
@Joshua: Right, I think what Jakob meant to say is that assemblers can't use the 0x90 short form for xchg eax,eax in the asm source, and have to use the xchg r, r/m32 form with a ModRM byte (felixcloutier.com/x86/XCHG.html). But fun fact, 16 and 64-bit operand sizes can still use 0x90 with a 66 or REX.W prefix. NASM does assemble xchg ax,ax to 66 90 in 64-bit mode. — Peter Cordes
– Peter Cordes, Commented Jul 5, 2018 at 21:17
@PeterCordes Exactly. I phrased that badly, but we got there in the end! — jakobbotsch
– jakobbotsch, Commented Jul 6, 2018 at 10:53
