4

I have a question about where to put data (an address table or other data): in the .text section next to its function, or in the .data section? For example, I have a function like this:

    extern int i0();
    extern int i1();
    extern int i2();
    extern int i3();
    extern int i4();
    extern int i5();

    void fff(int x)
    {
        switch (x) {
        case 0: i0(); break;
        case 1: i1(); break;
        case 2: i2(); break;
        case 3: i3(); break;
        case 4: i4(); break;
        case 5: i5(); break;
        }
    }

Here is my code in assembly:

    fff:
            cmp     edi, 5
            ja      .L10
            mov     edi, edi
            xor     eax, eax
            jmp     [QWORD PTR .L4[0+rdi*8]]
    .L4:
            .quad   .L9
            .quad   .L8
            .quad   .L7
            .quad   .L6
            .quad   .L5
            .quad   .L3
    .L5:
            jmp     i4
    .L3:
            jmp     i5
    .L9:
            jmp     i0
    .L8:
            jmp     i1
    .L7:
            jmp     i2
    .L6:
            jmp     i3
    .L10:
            ret

Here .L4 holds the jump addresses. Where should I put this .L4 table: under the fff function, or in the .data section? And what about static data? For example, if a function uses two QWORDs, must I put them inside that function, or in the data section? Why? I know that either placement works without problems, but I want to know which one is better.

2 Answers

5

Yes, you can put the table of pointers (.L4:) in the .text section, as long as it is not modified at run time. However, I don't see a reason for the double indirection through a set of jumps to the external functions i0..i5. You can branch with a single indirect near jump that takes the destination address from a table of pointers to those external functions; the linker fills in the external addresses. Example in NASM/Intel syntax:

    |                            |     global fff
    |                            |     extern i0,i1,i2,i3,i4,i5
    |00000000:4883FF05           |fff: cmp rdi, 5
    |00000004:773A               |     ja .L10
    |00000006:FF24FD[10000000]   |     jmp [.L4+8*rdi]
    |0000000D:0F1F00             |     align 8 ; For better performance.
    |00000010:[0000000000000000] |.L4: dq i0
    |00000018:[0000000000000000] |     dq i1
    |00000020:[0000000000000000] |     dq i2
    |00000028:[0000000000000000] |     dq i3
    |00000030:[0000000000000000] |     dq i4
    |00000038:[0000000000000000] |     dq i5
    |00000040:C3                 |.L10:ret
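
For comparison, the same single-indirection idea can be sketched in C as a table of function pointers. This is only an illustration: the i0..i5 bodies and the last_called variable are hypothetical stand-ins for the external functions in the question, added so that the dispatch can be observed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the external i0..i5 from the question.
   Each one records its index so the dispatch can be checked. */
static int last_called = -1;

static int i0(void) { last_called = 0; return 0; }
static int i1(void) { last_called = 1; return 0; }
static int i2(void) { last_called = 2; return 0; }
static int i3(void) { last_called = 3; return 0; }
static int i4(void) { last_called = 4; return 0; }
static int i5(void) { last_called = 5; return 0; }

/* The table holds the real targets directly, not labels that jump to them.
   It is const, so the toolchain is free to keep it in read-only storage. */
static int (*const handlers[])(void) = { i0, i1, i2, i3, i4, i5 };

void fff(int x)
{
    /* A single unsigned compare rejects both x < 0 and x > 5,
       which mirrors the cmp/ja pair in the assembly. */
    if ((unsigned)x < sizeof handlers / sizeof handlers[0])
        handlers[x]();
}
```

Calling fff(3) here runs i3 through one indirect call, and out-of-range values such as -1 or 6 fall through without calling anything.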

3 Comments

Thank you ......
Your jump table solution doesn't work, because jmp [.L4+rdi+4*rdi] is an indirect jump through memory, so it treats the jmp instructions as the destination addresses. Even if it did work, it still has two jmp instructions. Instead, use a jump table like in the question, but put the actual destination in the table instead of a local label that jumps to the destination. In other words, change jmp in the jump table to .quad.
@prl Oops, that's true, thank you, fixed.
4

The .data section is usually writable, and you would not want your jump table to be accidentally or maliciously overwritten. So .data isn't the best place for it.

.text would be fine; it is normally read-only. It doesn't really matter whether it's near the function or not. Many systems have a .rodata section which is read-only and not executable, which would be even better; it would help catch bugs or attacks which accidentally or deliberately try to execute the bytes of the jump table.
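
For the "two QWORDs used by one function" case from the question, a minimal C sketch of read-only static data could look like the following. The function name pick and the two constant values are made up for illustration. On ELF targets, GCC and Clang typically place a const object like this in .rodata, but the exact section depends on the toolchain and options.

```c
#include <stdint.h>

/* Hypothetical function that needs two 64-bit constants.
   Declaring them static const keeps them private to the function
   and lets the compiler put them in a read-only section. */
uint64_t pick(unsigned i)
{
    static const uint64_t k[2] = {
        0x0123456789ABCDEFULL,   /* illustrative value only */
        0xFEDCBA9876543210ULL    /* illustrative value only */
    };

    /* Mask the index so any argument maps to a valid slot. */
    return k[i & 1u];
}
```

Writing through a pointer to k would be undefined behavior in C, and on systems that map the section read-only it would usually fault, which is the kind of protection described above.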

7 Comments

So there is no reason (for example, cache-line size; I don't really know about cache lines, I have only heard of them) not to put a data table in the .text section near the function, right? I ask because I always thought that the .text section must not be huge ...
Most (all?) x86 CPUs have separate L1 instruction and data caches, so being near the function isn't relevant for L1 cache locality. Conceivably it could help with L2/L3, but that is large enough that if your function is called with any frequency, the jump table should stay hot anyway.
@HelloMachine You don't want to put writable data next to functions as that can cause caching issues. Read-only data is fine, but you get better cache utilisation if you put all data next to each other (e.g. into a dedicated .rodata section) instead of mixing data and code.
@HelloMachine: The Q&A "Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code?" debunks that false premise; only obfuscators mix code and data. With separate L1d / L1i caches, and split iTLB/dTLB, it wastes space / coverage in caches. In the cold case it might get your data into L2 along with code fetch, speeding up the eventual L1d miss, but that sacrifices a whole page of dTLB coverage for those few bytes.
@HelloMachine: If you're optimizing, instead of .L5: jmp i4, just put i4 as the jmp table entry instead of .L5, so you're directly tailcalling.