I am learning assembly and low-level programming and reading a book about it. The book says that we can put any data inside the .text section of an ELF file, but that of course we can't modify it there because of the permissions on its pages/segments. It does not explain why anyone would put data in the .text section, though. I was also told by many C++ programmers that the g++ compiler puts

static const char DATA[] = "SOME DATA"; 

inside the .text section too. Why not put this data in the .rodata section? What is the purpose? And if .text is used for this, what should be stored in .rodata?

The main question is about this behaviour in long mode.

7 Comments

  • I tried it and g++ put DATA in .rodata. Commented Jun 26, 2018 at 12:04
  • But it is stored in .rodata. Commented Jun 26, 2018 at 12:04
  • "I was also told by many C++ programmers" [citation needed] Commented Jun 26, 2018 at 12:10
  • @VictorPolevoy No, I think your tags are fine. Commented Jun 26, 2018 at 12:16
  • Also, if you're writing for a system where code can be executed directly from ROM chips, as in some MCUs or older (read: retro) systems, you won't need to copy the data to RAM to use it. Commented Jun 26, 2018 at 12:36

2 Answers


Traditionally, read-only data was placed in the text section for two reasons:

  • the text section is not writable, so memory protection can catch accidental writes to read-only data and crash the program instead of letting it silently corrupt the data
  • with a memory-management unit (MMU), multiple instances of the same program can share one copy of the text section (as it's guaranteed to be the same in all instances of the program), saving memory

On ELF targets, this scheme was modified a bit. Read-only data is now placed in the new .rodata section, which is like the .text section except that it also cannot be executed, preventing certain attack vectors. The advantages remain.
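One way to check where a constant actually ends up at run time is to look up the permissions of the mapping that contains it. The following is only a minimal sketch, assuming Linux (it parses /proc/self/maps); the helper names mapping_perms and data_address are made up for this illustration and are not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

static const char DATA[] = "SOME DATA";

/* Hypothetical helper: find the mapping that contains addr in
 * /proc/self/maps (Linux-only) and copy its permission string,
 * e.g. "r--p", into perms.
 * Returns 0 on success, -1 if the file or the address was not found. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[512];
    int found = -1;
    uintptr_t a = (uintptr_t)addr;

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char p[5];

        /* Each line starts with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) != 3)
            continue;

        if (a >= start && a < end) {
            snprintf(perms, 5, "%s", p);
            found = 0;
            break;
        }
    }

    fclose(maps);
    return found;
}

/* Hypothetical accessor so that callers can ask about DATA's address. */
const char *data_address(void)
{
    return DATA;
}
```

Calling mapping_perms(data_address(), perms) and printing perms should show a mapping without the w bit. Whether the x bit is set depends on the toolchain: a linker that gives .rodata its own segment would be expected to yield something like r--p, while one that links it into the text segment would yield r-xp.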


10 Comments

Also, when building binaries for persistent memory chips (ROM etc.), ".data" usually lives in volatile DRAM, whose contents are lost when power is removed, and on embedded systems ".text" and ".rodata" are often effectively the same section. On some platforms constants are also interleaved directly with the code to allow simple addressing relative to the instruction pointer; putting them in ".rodata" could require an extra pointer if it is not at a fixed relative offset from ".text", and some platforms favour short offsets in their encodings. (Cache locality may also boost performance.)
I.e. the question of ".rodata" vs ".text" is quite subtle, and the answer is mostly "because it has some minor advantages on modern platforms in terms of protection", but the two are very similar... If the question were "why read-only, why not .data and just initialize them", it would be much simpler and less subtle to answer. :)
There does seem to be an r-- mapping as well as the r-x (text) and rw- (data) mappings in the compiler output from throwing that code-golf hack into a file. But the string literal is in the same mapping as main, so it is executable (I set a breakpoint and single-stepped). Oh, I think the r-- mapping is something else: the first 4 bytes of the page are 127 '\177' 69 'E' 76 'L' 70 'F' (the ELF magic number), so it's probably some metadata. IDK if they could have put .rodata into this segment.
Update on this: a recent version of ld changed to linking .rodata into its own non-executable ELF segment, so const char code[] = { 0xc3 }; no longer works when cast to a function pointer unless you build with -zexecstack. It used to "just work" to put machine code in a const array or string literal.
Update 2: Recent Linux kernels (5.5 or so) changed the meaning of -z execstack to actually make only the stack executable, not READ_IMPLIES_EXEC, fixing most of "Unexpected exec permission from mmap when assembly files included in the project". See "How to get c code to execute hex machine code?" (including my answer, for stuff like __attribute__((section(".text")))).

A lot of correct things have been said here. I will make some additions and clarifications.

  • Instructions and data are both just binary numbers, so we can put constant data in .text, but that does not mean we should.
  • It also does not mean that modern compilers (always) do it.
  • .rodata, .text and the other sections are largely an implementation detail.
  • It is true that big chunks of const data are often stored in .rodata. However, in your case, a sufficiently small static const string may just get inlined into the instruction stream where it is used. The string itself, which ought to be placed in .rodata, may then be optimized out, but its contents, split across a few instructions, will de facto be stored in .text.
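The inlining described in the last point can be tried with a small function like the one below. This is only a sketch: copy_data is a name made up for the illustration, and whether the bytes really end up as instruction immediates depends on the compiler, its version, the target and the optimization flags.

```c
#include <string.h>

static const char DATA[] = "SOME DATA";

/* Copies DATA (10 bytes, including the terminating NUL) into out.
 * With optimization enabled, a compiler may expand this fixed-size
 * memcpy into a few store instructions whose immediate operands hold
 * the string bytes, and then drop the unused array from .rodata. */
void copy_data(char out[sizeof DATA])
{
    memcpy(out, DATA, sizeof DATA);
}
```

To see what a particular toolchain does, compile this with optimization (for example -O2) and disassemble the object file with objdump -d. If the copy was inlined, the characters of the string appear inside the encodings of the move instructions in .text rather than as a separate object in .rodata.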

1 Comment

Fun fact: ARM traditionally puts small constant data between functions ("literal pools") so they're reachable with PC-relative load instructions. But compilers usually just put a pointer to the real static data if it's bigger than a register, not a whole string. And yeah, compilers don't mix code and data at all on x86, although obfuscators might.
