41

I'd like to write a very small proof-of-concept JIT compiler for a toy language processor I've written (purely academic), but I'm having some trouble in the middle-altitudes of design. Conceptually, I'm familiar with how JIT works - you compile bytecode into (machine or assembly?) code to run. At the nuts-and-bolts level however, I'm not quite gripping how you actually go about doing that.

My (very "newb") knee-jerk reaction, since I haven't the first clue where to start, would be to try something like the following:

  1. mmap() a block of memory, setting access to PROT_EXEC
  2. write the native code into the block
  3. store the current registers (stack pointer, et al.) someplace cozy
  4. modify the current registers to point into the native code block in the mapped region
  5. the native code would now get executed by the machine
  6. restore the previous registers

Is that even close to a/the correct algorithm? I've tried perusing different projects that I know have JIT compilers to study (such as V8) but these codebases turn out to be difficult to consume because of their size, and I've little idea where to start looking.

8
  • 9
    You can probably simplify things further: you can often just take the starting address of your code within the mmap'ed block and cast it to a function pointer. In that case, the code would need to save and restore its own registers and such. You would want to look at the calling conventions in your platforms ABI (Application Binary Interface) for exactly what you need to save (and how to get arguments from C code, call C functions, etc.). Commented Feb 6, 2011 at 6:54
  • 1
    @Jeremiah Willcock: It seems to me like that's roughly the technique demonstrated by @Shelwien below, am I correct? Commented Feb 6, 2011 at 16:27
  • 3
    mmap to PROT_EXEC will probably not work. I don't believe current versions of Linux allow any memory to be both writable and executable at the same time. You need to map it writable, write it, then map it executable. Or so I believe. Commented Feb 7, 2011 at 23:13
  • 2
    @Chris: I guess it works for you. I know it didn't work for me. It might have been a Gentoo secure kernel. You might run into trouble on some systems. Commented Feb 8, 2011 at 0:04
  • 4
    Not that you're targeting non-x86, but beware that self modifying code (or on-the-fly generated code) requires explicit cache synchronization on other platforms. It's pretty much just x86 which does it transparently (which means loads of silicon). Just call msync() on the buffer after finishing writing and before executing it. Commented Feb 8, 2011 at 2:35

8 Answers 8

34

Not sure about linux, but this works on x86/windows.
Update: http://codepad.org/sQoF6kR8

#include <stdio.h> #include <windows.h> typedef unsigned char byte; int arg1; int arg2; int res1; typedef void (*pfunc)(void); union funcptr { pfunc x; byte* y; }; int main( void ) { byte* buf = (byte*)VirtualAllocEx( GetCurrentProcess(), 0, 1<<16, MEM_COMMIT, PAGE_EXECUTE_READWRITE ); if( buf==0 ) return 0; byte* p = buf; *p++ = 0x50; // push eax *p++ = 0x52; // push edx *p++ = 0xA1; // mov eax, [arg2] (int*&)p[0] = &arg2; p+=sizeof(int*); *p++ = 0x92; // xchg edx,eax *p++ = 0xA1; // mov eax, [arg1] (int*&)p[0] = &arg1; p+=sizeof(int*); *p++ = 0xF7; *p++ = 0xEA; // imul edx *p++ = 0xA3; // mov [res1],eax (int*&)p[0] = &res1; p+=sizeof(int*); *p++ = 0x5A; // pop edx *p++ = 0x58; // pop eax *p++ = 0xC3; // ret funcptr func; func.y = buf; arg1 = 123; arg2 = 321; res1 = 0; func.x(); // call generated code printf( "arg1=%i arg2=%i arg1*arg2=%i func(arg1,arg2)=%i\n", arg1,arg2,arg1*arg2,res1 ); } 
Sign up to request clarification or add additional context in comments.

10 Comments

@Shelwien: It was most definitely not portable before, since most modern OS's would not accept execution of the stack.
Thanks for the great example! I very nearly didn't catch that funcptr was a union at first - after that it made perfect sense.
First version used a simple typecast there, like ((pfunc)(void*)buf)(); , but codepad said something about ISO C++ not allowing to cast random stuff to functions, so i had to replace it with a union.
What would be the modification(s) required to work on x64? I'm not an assembly expert, but I don't see anything that x64 wouldn't find reverse-compatible.
I meant that the generated code there is 32-bit, x64 code would be different. I can make a x64 version if necessary, but is it?
|
5

Youmay want to have a look at libjit which provides exactly the infrastructure you're looking for:

The libjit library implements just-in-time compilation functionality. Unlike other JITs, this one is designed to be independent of any particular virtual machine bytecode format or language.

http://freshmeat.net/projects/libjit

1 Comment

Interesting find. This may well be useful if I decide I ever actually want to implement a non-trivial JIT.
5

How to JIT - an introduction is a new article (from today!) that addresses some of these issues and describes the bigger picture as well.

Comments

4

The Android Dalvik JIT compiler might also be worth looking at. It is supposed to be fairly small and lean (not sure if this helps understanding it or makes things more complicated). It targets Linux as well.

If things are getting more serious, looking at LLVM might be a good choice as well.

The function pointer approach suggested by Jeremiah sounds good. You may want to use the caller's stack anyway and there will probably only be a few registers left (on x86) which you need to preserve or not touch. In this case, it is probably easiest if your compiled code (or the entry stub) saves them on the stack before proceeding. In the end, it all boils down to writing an assembler function and interfacing to it from C.

Comments

2

The answer depends on your compiler and where you put the code. See http://encode.ru/threads/1273-Just-In-Time-Compilation-Improvement-For-ZPAQ?p=24902&posted=1#post24902

Testing in 32 bit Vista, Visual C++ gives a DEP (data execution prevention) error whether the code is put on the stack, heap, or static memory. g++, Borland, and Mars can be made to work sometimes. Data accessed by the JIT code needs to be declared volatile.

Comments

1

In addition to the techniques suggested so far, it might be worthwhile to look into the thread creation functions. If you create a new thread, with the starting address set to your generated code, you know for sure that there are no old registers that need saving or restoring, and the OS handles the setup of the relevant registers for you. I.e you eliminate steps 3, 4 and 6 of your list.

1 Comment

The only thing I would hesitate on here is that with threads, you then have to deal with synchronization, et al. - otherwise (if synchronization can be ignored and/or deferred) that's a pretty clever idea.
1

Linux x86 mmap minimal example

Just to provide a Linux one with mmap. At runtime I inject into memory a function equivalent to:

int ing(int i) { return i + 1; } 

main.c

#define _XOPEN_SOURCE 700 #include <assert.h> #include <stddef.h> /* NULL */ #include <sys/mman.h> /* mmap, munmap */ union funcptr { int (*f)(int); unsigned char *bytes; }; int main(void) { unsigned char *buf = (unsigned char *)mmap(NULL, 4, PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); assert(buf != MAP_FAILED); unsigned char *p = buf; // return i + 1; // lea 0x1(%rdi),%eax *p++ = 0x8d; *p++ = 0x47; *p++ = 0x01; // ret *p++ = 0xC3; assert(((union funcptr){ .bytes = buf }).f(1) == 2); // Just to check if we can modify the code witout any explicit icache flushing. // return i + 2; // lea 0x1(%rdi),%eax buf[2] = 0x02; assert(((union funcptr){ .bytes = buf }).f(1) == 3); int ret = munmap(buf, 4); assert(!ret); } 

Compile and run:

gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main main.c ./main 

I use an union as in https://stackoverflow.com/a/4912662/895245 because the C standard apparently forbids casting non function pointers to function pointers: ISO C Void * and Function Pointers This is a bit paternalistic, and if you ignore the warning GCC can also just cast directly with:

assert(((int (*)(int))(buf))(1) == 2); 

The shell code was obtained by compiling a test file:

notmain.c

int inc(int i) { return i + 1; } 

with -O3 and disassembling it:

gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c objdump -d notmain.o 

and the output contains:

0000000000000000 <inc>: 0: f3 0f 1e fa endbr64 4: 8d 47 01 lea 0x1(%rdi),%eax 7: c3 ret 

The endbr64 is a NOP/security feature, so we can ("un"safely) ignore it: What does the endbr64 instruction actually do?

Related:

Tested on Ubuntu 23.10 amd64.

6 Comments

ISO C still doesn't define the behaviour of type-punning non-function pointers to function-pointers, a union or memcpy is just hiding it from the compiler. Probably most readable if you do something like int (*fptr)(int) = (void*)buf; in C. GCC -Wall doesn't warn about that, nor about int (*fptr)(int) = reinterpret_cast<int (*)(int)>(buf); - godbolt.org/z/M9Kr6d9hv
GNU C does define the behaviour, but you have to use __builtin___clear_cache(buf, buf+3); to be sure the compiler won't do dead-store elimination. (Casting to function pointer and calling through it doesn't get treated as "used". In this case it doesn't get optimized away because it doesn't know how mmap(MAP_ANONYMOUS) works: it's a pointer to memory that could be globally reachable some other way, so has to be in sync before an opaque function call. (In case the called function reads those bytes as data. Reading them as code just happens to work).
Despite the name, __builtin___clear_cache doesn't actually do anything to cache on x86, only on ISAs without coherent I-cache. On x86 it just blocks dead-store elimination and similar reordering of stores. See How does __builtin___clear_cache work? / gcc.godbolt.org/z/GvKPzYvo5 - in a -zexecstack build on an older system where that uses READ_IMPLIES_EXEC, so malloc returns executable memory, you actually do need __builtin___clear_cache because GCC knows about malloc. Or maybe if you mprotect some stack mem?
godbolt.org/z/5671x3MYn is a better example, using a stack buffer with -zexecstack which still works on modern kernels. It shows dead store elimination leading to a jump to uninitialized stack space vs. with clear_cache a working program. See also execute binary machine code from C / How to get c code to execute hex machine code?
Also, #define _XOPEN_SOURCE 700 makes GCC / Glibc choose not to define MAP_ANONYMOUS when compiling as C on Godbolt (but works as C++). Leaving it out works, but perhaps there's a higher version number that's appropriate.
|
0

You may be interested in why the lucky stiff's Potion programming language. It's a small, incomplete language that features just-in-time compilation. Potion's small size makes it easier to understand. The repository includes a description of the language's internals (JIT content starts at heading "~ the jit ~").

The implementation is complicated by the fact it runs in the context of Potion's VM. Don't let this scare you off, though. It doesn't take long to see what he's up to. Basically, using a small set of VM opcodes allows some actions to be modeled as optimized assembly.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.