Starting from an answer of mine about the "real" entrypoint of an ELF executable on Linux and "raw" syscalls, we can strip it down to
bits 64 global _start _start: mov di,42 ; only the low byte of the exit code is kept, ; so we can use di instead of the full edi/rdi xor eax,eax mov al,60 ; shorter than mov eax,60 syscall ; perform the syscall
I don't think you can get it to be any smaller without going out of specs - in particular, the psABI doesn't guarantee anything about the state of eax. This gets assembled to precisely 10 bytes (as opposed to the 7 bytes of the 32 bit payload):
66 bf 2a 00 31 c0 b0 3c 0f 05
The straightforward way (assemble with nasm, link with ld) produces me a 352 bytes executable.
The first "real" transformation he does is building the ELF "by hand"; doing this (with some modifications, as the ELF header for x86_64 is a bit bigger)
bits 64 org 0x08048000 ehdr: ; Elf64_Ehdr db 0x7F, "ELF", 2, 1, 1, 0 ; e_ident times 8 db 0 dw 2 ; e_type dw 62 ; e_machine dd 1 ; e_version dq _start ; e_entry dq phdr - $$ ; e_phoff dq 0 ; e_shoff dd 0 ; e_flags dw ehdrsize ; e_ehsize dw phdrsize ; e_phentsize dw 1 ; e_phnum dw 0 ; e_shentsize dw 0 ; e_shnum dw 0 ; e_shstrndx ehdrsize equ $ - ehdr phdr: ; Elf64_Phdr dd 1 ; p_type dd 5 ; p_flags dq 0 ; p_offset dq $$ ; p_vaddr dq $$ ; p_paddr dq filesize ; p_filesz dq filesize ; p_memsz dq 0x1000 ; p_align phdrsize equ $ - phdr _start: mov di,42 ; only the low byte of the exit code is kept, ; so we can use di instead of the full edi/rdi xor eax,eax mov al,60 ; shorter than mov eax,60 syscall ; perform the syscall filesize equ $ - $$
we get down to 130 bytes. This is a tad bigger than the 91 bytes executable, but it comes from the fact that several fields become 64 bits instead of 32.
We can then apply some tricks similar to his; the partial overlap of phdr and ehdr can be done, although the order of fields in phdr is different, and we have to overlap p_flags with e_shnum (which however should be ignored due to e_shentsize being 0).
Moving the code inside the header is slightly more difficult, as it's 3 bytes larger, but that part of header is just as big as in the 32 bit case. We overcome this by starting 2 bytes earlier, overwriting the padding byte (ok) and the ABI version field (not ok, but still works).
So, we reach:
bits 64 org 0x08048000 ehdr: ; Elf64_Ehdr db 0x7F, "ELF", 2, 1, ; e_ident _start: mov di,42 ; only the low byte of the exit code is kept, ; so we can use di instead of the full edi/rdi xor eax,eax mov al,60 ; shorter than mov eax,60 syscall ; perform the syscall dw 2 ; e_type dw 62 ; e_machine dd 1 ; e_version dq _start ; e_entry dq phdr - $$ ; e_phoff dq 0 ; e_shoff dd 0 ; e_flags dw ehdrsize ; e_ehsize dw phdrsize ; e_phentsize phdr: ; Elf64_Phdr dw 1 ; e_phnum p_type dw 0 ; e_shentsize dw 5 ; e_shnum p_flags dw 0 ; e_shstrndx ehdrsize equ $ - ehdr dq 0 ; p_offset dq $$ ; p_vaddr dq $$ ; p_paddr dq filesize ; p_filesz dq filesize ; p_memsz dq 0x1000 ; p_align phdrsize equ $ - phdr filesize equ $ - $$
which is 112 bytes long.
Here I stop for the moment, as I don't have much time for this right now. You now have the basic layout with the relevant modifications for 64 bit, so you just have to experiment with more audacious overlaps
int 0x80system calls in your 64-bit executable? If so, your probably don't need to change much. I know there's some overlap of ELF header fields being interpreted as part of the machine code, so some change might be needed for ELF64.