
I'm trying to get a hold of assembly, but there's one probably very simple thing I don't understand.

Consider the following simple example:

    long long * values = new long long[2];
    values[0] = 10;
    values[1] = 20;
    int j = -1;
    values[j+2] = 15; // xxxxxxx

Now, the last line (marked with xxxxxxx) disassembles to:

    000A6604 mov eax,dword ptr [j]
    000A6607 mov ecx,dword ptr [values]
    000A660A mov dword ptr [ecx+eax*8+10h],0Fh

First question: What is actually stored in eax and ecx? Is it the actual values (i.e. -1 for "j", and the two long long values 10 and 20 for "values"), or is it merely a memory address (e.g. something like &j, &values) pointing to some place where the values are being stored?

Second question: I know what the third line is supposed to do, but I'm not quite sure why it actually works. My understanding is that it copies the value 0x0F into the specified memory location. That location is the address of the first element (stored in ecx), plus the size of long long in bytes (8) times the value of eax (which equals j, so -1), plus a constant offset of 16 bytes (2 times the size of long long). What I don't get is this: in this expression, ecx seems to be a memory address, while eax seems to be a value (-1). How is this possible? Since they were defined in pretty much the same way, shouldn't eax and ecx either both contain memory addresses or both contain values?

Thanks.

  • A memory address and a value are both just bits. The only difference is in what those bits represent. Commented Jun 13, 2014 at 13:50
  • Regarding the first question: it's loading the values of j and values respectively. The value of values is in turn the address of a chunk of memory. Writing something like mov ecx, OFFSET values would be like taking the address of a pointer in C, which gives you a pointer to a pointer. Commented Jun 13, 2014 at 13:55
  • Notice that j and values have different types in your C code too. Commented Jun 13, 2014 at 13:59
  • Ahh of course, that makes sense. j = -1, but values = a memory address. I was confusing values with *values. Okay, I think this explains both questions, thank you very much! Commented Jun 13, 2014 at 13:59

1 Answer


eax and ecx are registers -- the first two instructions load those registers with the values used in the calculation, i.e. j and values (where values means the base address of the array by that name).
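To make the difference between the two loads concrete, here is a minimal sketch in C++. The struct `Contents` and the function `inspect` are hypothetical names invented for this illustration; they only record what each expression evaluates to, which is what the two mov instructions would place in eax and ecx.

```cpp
// Hypothetical struct: collects what each expression evaluates to.
struct Contents {
    int j_value;              // what `mov eax,dword ptr [j]` loads: the number stored in j
    long long* values_value;  // what `mov ecx,dword ptr [values]` loads: an address
    long long first_element;  // what lives at that address, i.e. *values
};

// Hypothetical helper: `values` must point at at least one valid element.
Contents inspect(long long* values, int j) {
    // Reading `values` yields the pointer itself; only `*values` reads the array.
    return Contents{ j, values, *values };
}
```

For an array holding 10 and 20 and j equal to -1, `inspect` would report -1 for `j_value`, the address of the first element for `values_value`, and 10 for `first_element`. Both variables hold "values"; the value of a pointer just happens to be an address.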

I know what the third line is supposed to do, but I'm not quite sure why this actually works

The instruction mov dword ptr [ecx+eax*8+10h],0Fh means move the value 0Fh (i.e. 15 decimal) into the location ecx+eax*8+10h. To figure that out, consider each piece:

  • ecx is the base address of the values array

  • eax is the value of j, i.e. -1

  • eax*8 is j converted to an offset in bytes -- the size of a long long is 8 bytes

  • eax*8+10h adds 10h, which is 16 decimal, i.e. 2*8, so this is j+2 converted to a byte offset

  • ecx+eax*8+10h adds that final offset to the base address of the array to determine the location in which to store the value 15
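The breakdown above can be checked with a small C++ sketch. The helpers `effective_address` and `matches_compiler` are hypothetical names made up for this example, and the sketch assumes a flat address space and an 8-byte long long, as on the x86 target in the question.

```cpp
#include <cstdint>

// Hypothetical helper: mimics the address computation [ecx+eax*8+10h].
// `base` plays the role of ecx (the pointer stored in `values`),
// `index` plays the role of eax (the integer stored in `j`).
std::uintptr_t effective_address(const long long* base, int index) {
    static_assert(sizeof(long long) == 8, "sketch assumes an 8-byte long long");
    std::uintptr_t ecx = reinterpret_cast<std::uintptr_t>(base);
    // Converting -1 to an unsigned type gives all bits set; unsigned
    // arithmetic wraps, so multiplying by 8 behaves like "minus 8".
    std::uintptr_t eax = static_cast<std::uintptr_t>(index);
    return ecx + eax * 8 + 0x10;
}

// Returns true when the hand-computed address equals the address the
// compiler computes for values[j + 2]. The caller must make sure that
// j + 2 is a valid index into the array.
bool matches_compiler(const long long* values, int j) {
    return effective_address(values, j) ==
           reinterpret_cast<std::uintptr_t>(&values[j + 2]);
}
```

With j equal to -1 the computed address is the address of values[1], which is where the 15 gets stored.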


6 Comments

Very good answer, thank you! One last question, how does it "know" j (or eax, respectively) is signed? The definition of j was "mov dword ptr [j],0FFFFFFFFh". So 0xFFFFFFFF equates to 4,294,967,295 if interpreted as an unsigned number (and indeed, that's what the watch window in VC is telling me). But I assume 4,294,967,295 * 8 would cause an overflow, or yield anything but the -8 it is supposed to. So how does this work?!
0xFFFFFFFF * 8 is 0x7FFFFFFF8, which is indeed too large to fit in a 32-bit number (i.e. a pointer), so only the least significant 32 bits are used. That leaves us with 0xFFFFFFF8, i.e. -8 decimal. Add 2*8 to that and you get 0x00000008, which is the right amount to add to values to get the address of values[-1+2].
Great, got it. How does it know 0xFFFFFFF8 stands for -8, though, and not for 4,294,967,288? E.g. signed int a = -1 = 0xFFFFFFFF, and unsigned int b = 4294967295 = 0xFFFFFFFF. But accessing values[a+1] and values[b+1] emits the same assembly code, so how does it know whether to access element 0 or element 4,294,967,296?
The processor doesn't "know" -- it's all just bits. The compiler (more specifically, the folks who wrote the compiler) knows how to use 2's complement arithmetic to implement negative numbers. You might have to play with some examples to really understand. First, make sure you know how to use 2's complement (i.e. invert all the bits and add 1). Example: 0x12345678 - 0x8 == 0x12345678 + 0xFFFFFFF8 == 0x112345670, which is just 0x12345670 if you're limited to the least significant 32 bits.
What seems to be missing is a 'mov dword ptr [ecx+eax*8+14h],00h', to zero out the upper half of values[j+2], since it's a long long (64 bit or qword).
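The wraparound and two's complement arithmetic discussed in the comments above can be reproduced with fixed-width unsigned integers in C++. This is only a sketch: the function names are hypothetical, and it models 32-bit address arithmetic by truncating every result to std::uint32_t.

```cpp
#include <cstdint>

// Full product before any truncation: 0xFFFFFFFF * 8 == 0x7FFFFFFF8.
std::uint64_t full_product(std::uint32_t j_bits) {
    return static_cast<std::uint64_t>(j_bits) * 8u;
}

// Byte offset of element (j + 2), keeping only the low 32 bits at each
// step, as a 32-bit address calculation does.
std::uint32_t byte_offset_32(std::uint32_t j_bits) {
    std::uint32_t scaled = static_cast<std::uint32_t>(j_bits * 8u);
    return static_cast<std::uint32_t>(scaled + 0x10u);
}

// Two's complement negation: invert all the bits and add 1.
std::uint32_t negate_bits(std::uint32_t x) {
    return static_cast<std::uint32_t>(~x + 1u);
}

// Subtraction performed as addition of the two's complement,
// again keeping only the low 32 bits.
std::uint32_t subtract_via_add(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + negate_bits(b));
}
```

Here `byte_offset_32(0xFFFFFFFFu)` yields 8, `negate_bits(8u)` yields 0xFFFFFFF8, and `subtract_via_add(0x12345678u, 8u)` yields 0x12345670. The same bit pattern serves as -1 or as 4,294,967,295; once the result is truncated to 32 bits, both interpretations produce the same address.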