How does builtin_clear_cache work?

Question

Going through the gcc documentation, I stumbled into the builtin function __builtin___clear_cache.

— Built-in Function: void __builtin___clear_cache (char *begin, char *end) This function is used to flush the processor's instruction cache for the region of memory between begin inclusive and end exclusive. Some targets require that the instruction cache be flushed, after modifying memory containing code, in order to obtain deterministic behavior.

If the target does not require instruction cache flushes, __builtin___clear_cache has no effect. Otherwise either instructions are emitted in-line to clear the instruction cache or a call to the __clear_cache function in libgcc is made.

I find this interesting, but surprising. In many cases, a large number of the instructions for the current stack are stored in the L1 cache (instruction cache). So at first glance it would seem that this builtin could significantly corrupt the flow of our program, by making it wipe out the next instructions on the stack.

Does this instruction also repopulate the part of the stack that was in the L1 cache?

This seems unlikely. If it does not, then I suppose the onus is on the user to use the right begin and end arguments, so as to not corrupt our process. In practice, how could one find what the right begin and end to use?

Why exactly do you ask? What is your program actually doing?
– Basile Starynkevitch, Commented Mar 2, 2016 at 8:30
Note that __builtin___clear_cache doesn't wipe out instructions; it flushes/clears the cache, and the processor re-populates it.
– nos, Commented Mar 2, 2016 at 8:31
The problem is unclear; the questioner must have some misunderstanding about caches.
– Changbin Du, Commented Oct 30, 2023 at 11:29

Basile Starynkevitch · Accepted Answer · 2016-03-02 08:29:21Z

6

It just emits some weird machine instruction(s) on target processors that require them (x86 doesn't need that).

Think of __builtin___clear_cache as a "portable" (to GCC and compatible compilers) way to flush the instruction cache (e.g. in some JIT library).

In practice, how could one find what the right begin and end to use?

To be safe, I would use it on a whole page range (with the page size obtained e.g. from sysconf(_SC_PAGESIZE)), so usually a 4 KiB-aligned memory range whose length is a multiple of 4 KiB. Otherwise, you need some target-specific trick to find the cache line width.
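A minimal sketch of the page-range idea, assuming a POSIX system and a power-of-two page size. The helper name clear_cache_whole_pages is hypothetical and not part of GCC or libgcc; it only widens the range to page boundaries before calling the builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not part of GCC or libgcc): widens [addr, addr + len)
   to whole pages, flushes that range, and reports the bounds it used. */
static void clear_cache_whole_pages(void *addr, size_t len,
                                    uintptr_t *begin_out, uintptr_t *end_out)
{
    long reported = sysconf(_SC_PAGESIZE);
    /* Assumption: fall back to 4 KiB if the page size cannot be queried. */
    uintptr_t page = reported > 0 ? (uintptr_t)reported : 4096u;

    uintptr_t start = (uintptr_t)addr;
    uintptr_t begin = start & ~(page - 1);                   /* round down */
    uintptr_t end = (start + len + page - 1) & ~(page - 1);  /* round up   */

    /* begin is inclusive and end is exclusive, as in the GCC documentation. */
    __builtin___clear_cache((char *)begin, (char *)end);

    *begin_out = begin;
    *end_out = end;
}
```

The two output parameters exist only so the chosen bounds can be inspected; a real caller would probably not need them.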

On Linux, you might read /proc/cpuinfo and use the cache_alignment & cache_size lines to get a more precise cache line size and alignment.
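As a rough illustration of the /proc/cpuinfo approach, the sketch below scans for a cache_alignment line and returns its value. Both function names are hypothetical, and the field is an assumption about the running kernel: not every architecture reports it, in which case the function returns -1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one /proc/cpuinfo line.
   Returns the alignment in bytes, or -1 if this is not a
   well-formed "cache_alignment : N" line. */
static long parse_cache_alignment(const char *line)
{
    static const char key[] = "cache_alignment";
    long value;

    if (strncmp(line, key, sizeof key - 1) != 0)
        return -1;
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    if (sscanf(colon + 1, "%ld", &value) != 1 || value <= 0)
        return -1;
    return value;
}

/* Returns the first cache_alignment value found in /proc/cpuinfo,
   or -1 if the file or the field is unavailable. */
static long read_cache_alignment(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[512];
    long result = -1;

    if (f == NULL)
        return -1;
    while (result < 0 && fgets(line, sizeof line, f) != NULL)
        result = parse_cache_alignment(line);
    fclose(f);
    return result;
}
```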

BTW, code using __builtin___clear_cache is very likely to be target-machine specific for other reasons, so it already has or knows some machine parameters (and those should include cache size & alignment).

edited Mar 2, 2016 at 8:29

answered Mar 2, 2016 at 8:23

Basile Starynkevitch

5 Comments

Peter Cordes Over a year ago

Turns out __builtin___clear_cache is needed on x86 as a compiler memory barrier, to make gcc realize that the "dead" stores were actually writing code and need to happen before the cast to a function pointer. See codegolf.stackexchange.com/questions/160100/… for why I wrote godbolt.org/g/pGXn3B, which repeats dec eax M times then appends a ret. Without __builtin___clear_cache or a compiler memory barrier like asm("":::"memory");, the memset is optimized away, so it's just a call to the malloc return value without storing first.

Peter Cordes Over a year ago

I think __builtin___clear_cache is supposed to work if you just use it on the exact range of bytes you wrote and now want to execute. If it needs to know about cache-line size, it will figure it out itself, right?

Joseph Sible-Reinstate Monica Over a year ago

@PeterCordes It looks like that's actually related to the use of malloc to allocate the memory, rather than to being on x86. In particular, if I replace malloc(M+1) with mmap(NULL, M+1, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);, then the dead-store optimization doesn't happen even without __builtin___clear_cache.

Peter Cordes Over a year ago

@Joseph no. Gcc knows about malloc (private memory nothing else has a pointer to). That enables the optimization where it can't prove it's safe with mmap. But happening to work != totally safe and correct.

Changbin Du Over a year ago

I don't think __builtin___clear_cache() has a memory barrier effect! It's not true.

Changbin Du · Accepted Answer · 2023-12-16 07:17:52Z

1

On x86, __builtin___clear_cache() does nothing (but note that it is not declared as a pure function).

On aarch64, __builtin___clear_cache() is ultimately expanded into a function call to __aarch64_sync_cache_range() in libgcc. You can read the GCC code below to understand how it is implemented.

https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/aarch64/sync-cache.c#L31

And for Clang it is here:

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/clear_cache.c#L123

edited Dec 16, 2023 at 7:17

answered Nov 1, 2023 at 3:54

Changbin Du

4 Comments

Peter Cordes Over a year ago

On x86 a call to __builtin___clear_cache compiles to zero asm instructions, but it does prevent dead-store elimination and similar effects which can be a correctness problem. (Especially if you allocate memory on the stack and it's executable because of -zexecstack, so GCC knows the array is private to this function invocation and that you're not passing a pointer to it outside of this function.) godbolt.org/z/5671x3MYn shows an example of a test program that fails on x86-64 without __builtin___clear_cache. Your answer is useful for AArch64, but wrong for x86-64.

Peter Cordes Over a year ago

See also execute binary machine code from C and How to get c code to execute hex machine code?

Changbin Du Over a year ago

As I pointed out in another question, the above effect occurs because __builtin___clear_cache() is not declared as a 'pure' function, even though it's empty.

Peter Cordes Over a year ago

It's a builtin that fully inlines. The compiler knows that it doesn't actually read or write anything, it just pretends that it does.
