I am currently implementing a Python interpreter in Rust. Basically, I want it to be able to run compiled Python code from a .pyc file. I will write the compiler as a separate project.
I am having a lot of trouble mapping the bytecode in the .pyc file to the corresponding opcodes. Here is an example:
The original file, hello_world.py:

    def hello():
        print("Hello World")

I compile this file into a .pyc using the command:

    python3 -m compileall hello_world.py

I know that I can get the opcode mapping using the dis module, so my idea was to just read the .pyc file as a u8 vector in Rust and then map the bytes. However, here is my problem. If I disassemble the original Python file with the dis module, I get this:

      1           0 LOAD_CONST               0 (<code object hello at 0x7f4a42392240, file "hello_world.py", line 1>)
                  2 LOAD_CONST               1 ('hello')
                  4 MAKE_FUNCTION            0
                  6 STORE_NAME               0 (hello)
                  8 LOAD_CONST               2 (None)
                 10 RETURN_VALUE

    Disassembly of <code object hello at 0x7f4a42392240, file "hello_world.py", line 1>:
      2           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('Hello World')
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE

But if I run hexdump with the -C flag on the .pyc file, I get this:

    00000000  55 0d 0d 0a 00 00 00 00  bc 91 40 62 25 00 00 00  |U.........@b%...|
    00000010  e3 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000020  00 02 00 00 00 40 00 00  00 73 0c 00 00 00 64 00  |.....@...s....d.|
    00000030  64 01 84 00 5a 00 64 02  53 00 29 03 63 00 00 00  |d...Z.d.S.).c...|
    00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 02 00 00  |................|
    00000050  00 43 00 00 00 73 0c 00  00 00 74 00 64 01 83 01  |.C...s....t.d...|
    00000060  01 00 64 00 53 00 29 02  4e 7a 0b 48 65 6c 6c 6f  |..d.S.).Nz.Hello|
    00000070  20 57 6f 72 6c 64 29 01  da 05 70 72 69 6e 74 a9  | World)...print.|
    00000080  00 72 02 00 00 00 72 02  00 00 00 fa 0e 68 65 6c  |.r....r......hel|
    00000090  6c 6f 5f 77 6f 72 6c 64  2e 70 79 da 05 68 65 6c  |lo_world.py..hel|
    000000a0  6c 6f 01 00 00 00 73 02  00 00 00 00 01 72 04 00  |lo....s......r..|
    000000b0  00 00 4e 29 01 72 04 00  00 00 72 02 00 00 00 72  |..N).r....r....r|
    000000c0  02 00 00 00 72 02 00 00  00 72 03 00 00 00 da 08  |....r....r......|
    000000d0  3c 6d 6f 64 75 6c 65 3e  01 00 00 00 f3 00 00 00  |<module>........|
    000000e0  00                                                |.|
    000000e1

The dis module tells me that this code has 12 instructions, so I was expecting the .pyc file to contain 12 * 2 = 24 bytes (because each instruction takes two bytes). However, the .pyc file is 225 bytes long (0xe1), and that is making me really confused.
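For reference, the mapping step I have in mind looks roughly like the sketch below. The opcode numbers are the ones `dis.opmap` reports on CPython 3.8, which is an assumption about my Python version, and only the opcodes that appear in hello_world.py are listed. `opcode_name` and `decode` are just names I made up for this example. The sample bytes are a 12-byte run that I copied by hand from offsets 0x2e to 0x39 of the hexdump above because it seems to line up with the module-level disassembly; what I can't figure out is how to locate such runs in general.

```rust
/// Opcode numbers as reported by `dis.opmap` on CPython 3.8 (assumed version).
/// Only the opcodes used by hello_world.py are covered here.
fn opcode_name(op: u8) -> Option<&'static str> {
    match op {
        1 => Some("POP_TOP"),
        83 => Some("RETURN_VALUE"),
        90 => Some("STORE_NAME"),
        100 => Some("LOAD_CONST"),
        116 => Some("LOAD_GLOBAL"),
        131 => Some("CALL_FUNCTION"),
        132 => Some("MAKE_FUNCTION"),
        _ => None,
    }
}

/// Decode a run of bytecode into (offset, name, argument) triples,
/// assuming two bytes per instruction: the opcode, then its argument.
fn decode(code: &[u8]) -> Vec<(usize, &'static str, u8)> {
    code.chunks_exact(2)
        .enumerate()
        .map(|(i, pair)| (i * 2, opcode_name(pair[0]).unwrap_or("<unknown>"), pair[1]))
        .collect()
}

fn main() {
    // Bytes 0x2e..=0x39 of the hexdump, copied by hand (not located programmatically).
    let module_code: [u8; 12] = [
        0x64, 0x00, 0x64, 0x01, 0x84, 0x00, 0x5a, 0x00, 0x64, 0x02, 0x53, 0x00,
    ];
    for (offset, name, arg) in decode(&module_code) {
        println!("{:>3} {:<14} {}", offset, name, arg);
    }
}
```

On that hand-picked slice the names and arguments match the first six lines of the dis output, so the mapping itself seems fine; the part I am missing is everything around those bytes.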
How can I extract the opcodes from the .pyc file without using the Python modules?
Thank you!