x86-64 machine code (Linux executable w/ system calls), score 1508
274 bytes: 18 code bytes, 12 data bytes, and 244 zero-padding bytes. The padding makes the LEA's rel32 encode as 00 01 00 00 instead of having some larger number in its lowest byte.
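To see what the padding buys, here is a small sketch in Python (used here only for side calculations; `rel32_cost` is a hypothetical helper, not part of the build). It assumes that without padding the message would sit directly after the 12 code bytes that follow the LEA.

```python
import struct

def rel32_cost(displacement: int) -> int:
    """Sum of the four little-endian bytes of a signed 32-bit displacement."""
    return sum(struct.pack("<i", displacement))

# RIP-relative displacements are measured from the end of the LEA.
unpadded = rel32_cost(12)     # assumed layout without padding: 0c 00 00 00
padded = rel32_cost(0x100)    # layout with 244 zero bytes:     00 01 00 00

print(unpadded, padded)       # prints "12 1"
```

Zero bytes cost nothing under this scoring, so the padding lowers the displacement's contribution from 12 to 1.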
The natural character set of x86 machine code is one 8-bit byte. No argument can be made for a word, unlike on a RISC with fixed-length instructions.
Hexdump of the .text section of the executable (actually from assembling the same thing into a flat binary for hexdump -C). I'm not counting the metadata bytes of the whole executable emitted by the linker.
    00000000  8d 35 00 01 00 00 5f 04  0c 92 04 01 0f 05 04 30  |.5...._........0|
    00000010  0f 05 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000100  00 00 00 00 00 00 48 65  6c 6c 30 20 57 30 72 6c  |......Hell0 W0rl|
    00000110  64 21                                             |d!|
    00000112
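The byte counts and the score can be cross-checked from the hexdump above. This is only a sketch that rebuilds the flat binary in Python from the bytes shown; the variable names are illustrative.

```python
# Code bytes copied from the hexdump (offsets 0x00..0x11).
code = bytes.fromhex("8d 35 00 01 00 00 5f 04 0c 92 04 01 0f 05 04 30 0f 05")
msg = b"Hell0 W0rld!"

# Pad so that msg starts 0x100 bytes after the end of the 6-byte LEA.
LEA_LEN = 6
padding = bytes(0x100 - (len(code) - LEA_LEN))

image = code + padding + msg
print(len(code), len(padding), len(msg), len(image), sum(image))
# prints "18 244 12 274 1508"
```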
Disassembly (objdump -d -Mintel).
    0000000000400080 <_start>:                   # Entry point
      400080:   8d 35 00 01 00 00       lea    esi,[rip+0x100]        # 400186 <msg>

    0000000000400086 <_start.lea_ref>:
      400086:   5f                      pop    rdi                    # fd = argc = 1
      400087:   04 0c                   add    al,0xc
      400089:   92                      xchg   edx,eax
      40008a:   04 01                   add    al,0x1
      40008c:   0f 05                   syscall                       # write(argc=edi, msg=rsi, msglen=rdx) leaves RAX=msglen
      40008e:   04 30                   add    al,0x30
      400090:   0f 05                   syscall                       # exit(edi=1)
            ...  (a bunch of 00 bytes)

    0000000000400186 <msg>:
            48 65 6c 6c 30 20 57 30 72 6c 64 21    db "Hell0 W0rld!"
Note that Linux static executables start with all registers = 0 (except RSP which points to argc on the stack).
add al, imm8 has opcode 04, and is the lowest opcode that takes an immediate. Using that to turn a 0 into a number we want, and xchg-with-eax for another zero, is the best way I've found to construct small numbers under these scoring rules. push imm8/pop is somewhat worse, and mov reg,imm8 has a high opcode.
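As a rough illustration of that comparison, the sketch below sums the bytes of three ways to get the message length (0x0c) into EDX/DL. The encodings are the standard ones (`04 ib`, `92`, `6a ib`, `5a`, `b2 ib`); the `score` helper and the choice of EDX as the target are assumptions made for the example, and the first and third sequences rely on the registers starting at zero.

```python
def score(hex_bytes: str) -> int:
    """Sum of the byte values of a hex-encoded instruction sequence."""
    return sum(bytes.fromhex(hex_bytes))

IMM = 0x0C  # msglen
candidates = {
    "add al, imm8 ; xchg edx, eax": f"04 {IMM:02x} 92",
    "push imm8 ; pop rdx":          f"6a {IMM:02x} 5a",
    "mov dl, imm8":                 f"b2 {IMM:02x}",
}

for name, encoding in sorted(candidates.items(), key=lambda kv: score(kv[1])):
    print(f"{score(encoding):4d}  {encoding:<10}  {name}")
```

For this immediate the sums come out as 162, 190 and 208 respectively, and the add/xchg pair also leaves EAX zeroed for the next add al.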
The opcode map at http://ref.x86asm.net/coder64.html was very useful for this.
Built with nasm -felf64 hell0.asm && ld -o hell0 hell0.o from this source: (Including a mess of commented out possibilities, many with their instruction bytes to remind me why I didn't use them.)
If MSGXOR is %defined to a non-zero value, link with ld -N (aka --omagic) to make .text read+write. In that case, the %if includes a loop that XORs every byte of the string except the final '!'. XOR keys 0x60 and 0x64 give equal scores. Subtracting 0x30 doesn't save enough, and subtracting larger values wraps around for too many bytes, creating larger bytes.
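The tie between the two keys can be checked with a short brute-force sketch. It assumes a simple cost model: the key appears once as an imm8 in the xor instruction, the first 11 message bytes are stored XORed, and the last byte is stored as-is. The cost of the loop code itself is the same for every non-zero key and is left out; `xor_cost` is a hypothetical helper.

```python
MSG = b"Hell0 W0rld!"

def xor_cost(key: int) -> int:
    """Key immediate plus stored message bytes, with the final '!' left un-XORed."""
    body, last = MSG[:-1], MSG[-1]
    return key + sum(b ^ key for b in body) + last

best = min(xor_cost(k) for k in range(256))
best_keys = [k for k in range(256) if xor_cost(k) == best]

print([hex(k) for k in best_keys], best)   # prints "['0x60', '0x64'] 511"
print(xor_cost(0))                          # 959, the plain message with no key
```

Under this model bit 2 (0x04) is set in 6 of the 11 XORed bytes and clear in 5, so flipping it saves 4 in the data and costs 4 in the key, which is why the two keys tie.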
    BITS 64                 ; for flat binary.
    %define MSGXOR 0x60

    default rel
    _start:
        lea    esi,[rel msg]      ; small positive offset.  Truncating the address to 32-bit is ok in a non-PIE
    ;    mov    esi, msg          ; better than rsi, msg or strict qword
    .lea_ref:

    ;    add    al, 1             ; nr_write.  04 add al, imm8 is the lowest-opcode immediate instruction
        ;cmovnz edi, eax          ; 0f 45 f8  RDI is the highest-numbered register, and mode=11 is killer in ModRM
    ;    xchg   edi, eax          ; 9x
    ;    mov    edi, [rsp]        ; 8b 3c 24
        pop    rdi                ; 5f      fd = argc=1

    ;    lea    edx, [rax+msglen] ; 8d 50 0d
        add    al, msglen
        xchg   edx, eax           ; len = msglen

    %if 0
        lea    edi, [rax]         ; 8d 38
        add    edi, eax
        mov    esi, eax
    %endif
    ;    push   "Hell"            ; 68 48 65 6c 6c   ; then would need to deal with 64-bit addresses

    ;    add    cl, msglen        ; 80 c1 0c
    ;    mov    cl, msglen        ; b1 0d

    %if MSGXOR != 0
        add    al, msglen-2       ; last byte is a '!' which grows when XORed with 0x60
    .loop:
        xor    byte [rsi+rax], MSGXOR
        sub    eax, strict dword 1   ; longer immediate makes the backwards jcc farther from FF
        ;sub   al, 1; 2c 01
        jnc    .loop              ; }while(rax>=0);  jnc has a lower opcode than jge or jns

    ;    lea    ecx, [.loop]      ; at the top: 8d 0d 00 00 00 00 ; loop back with an indirect jump so the loop exit can jump forwards?
    ;    push   rcx               ; 51
    ;    ret                      ; c3
    ;    jmp    rcx               ; ff e1    nope, doesn't look good.  Just have to eat the nearly-FF rel8
    ;    loop   .loop             ; e2 f2
    .break:
    ;;; AL = -1 after the loop
        add    eax, strict dword 2
    %else
        add    al, 1              ; RAX = 1 = _NR_write
    %endif
    ; %else EAX = 0 still
    ;    add    al, 1 + !!MSGXOR  ; RAX = 1 = _NR_write
    ;    mov    al, 1

        syscall                   ; write(argc, msg, msglen) returns RAX=msglen

    ;    xor    al, 60 ^ msglen
        add    al, 60-msglen      ; or sub to wrap to 231 _NR_exit_group?
    ;    or     al, 60
    ;    mov    al, 60            ; exit.
    ;    mov    eax, 60
        syscall

    padbytes equ 0x100 - ($ - .lea_ref)
    times padbytes db 0           ; in the same section as the code so we can control the rel32 to have one non-zero byte of 01

    X equ MSGXOR
    ;msg: db "Hell0 W0rld!"
    msg: db 'H'^X, 'e'^X, 'l'^X, 'l'^X, '0'^X, ' '^X, 'W'^X, '0'^X, 'r'^X, 'l'^X, 'd'^X, '!'   ; last byte not XORed
    msglen equ $-msg
    ;ml: db msglen

    ;msg: db 0x28, 0x5, 0xc, 0xc, 0x50, 0x40, 0x37, 0x50, 0x12, 0xc, 0x4, 0x41   ; with ^0x60
Score counted with hexdump | awk: hexdump uses a custom format to dump the bytes in decimal, and awk adds them up.
nasm hell0.asm -o hell0.bin && <hell0.bin hexdump -e '32/1 "%d " "\n"' | awk '{for(i=1; i<=NF; i++)tot+= $i;} END{printf "score = %#x = %d\n", tot, tot}'
With MSGXOR = 0x60, score = 1668. The loop does not pay for itself with this short message, especially since the message uses the digit 0 (0x30) instead of lower-case o (0x6f), which leaves less to save. cmp al,imm8 is opcode 3C, so cmp/jcc and counting up towards msglen-1, instead of materializing msglen and msglen-2 separately, might help. But it wouldn't help enough; we're 160 score away from break-even.
    # with MSGXOR = 0x60
    0000000000400080 <_start>:
      400080:   8d 35 00 01 00 00       lea    esi,[rip+0x100]        # 400186 <msg>

    0000000000400086 <_start.lea_ref>:
      400086:   5f                      pop    rdi
      400087:   04 0c                   add    al,0xc
      400089:   92                      xchg   edx,eax
      40008a:   04 0a                   add    al,0xa

    000000000040008c <_start.loop>:
      40008c:   80 34 06 60             xor    BYTE PTR [rsi+rax*1],0x60
      400090:   2d 01 00 00 00          sub    eax,0x1
      400095:   73 f5                   jae    40008c <_start.loop>

    0000000000400097 <_start.break>:
      400097:   05 02 00 00 00          add    eax,0x2
      40009c:   0f 05                   syscall
      40009e:   04 30                   add    al,0x30
      4000a0:   0f 05                   syscall
            ...

    0000000000400186 <msg>:
      400186:   28 05 0c 0c 50 40       # not really instructions, deleted disassembly
      40018c:   37
      40018d:   50
      40018e:   12 0c 04
      400191:   21                      .byte 0x21
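The 160-point gap can be split into what the loop costs in code and what it saves in data. The sketch below only re-adds the bytes from the two disassembly listings; the variable names are illustrative.

```python
plain_code = bytes.fromhex("8d 35 00 01 00 00 5f 04 0c 92 04 01 0f 05 04 30 0f 05")
loop_code = bytes.fromhex(
    "8d 35 00 01 00 00 5f 04 0c 92 04 0a "   # lea, pop, add, xchg, add
    "80 34 06 60 2d 01 00 00 00 73 f5 "      # xor / sub / jae loop
    "05 02 00 00 00 0f 05 04 30 0f 05"       # add, syscall, add, syscall
)

plain_msg = b"Hell0 W0rld!"
# All bytes except the final '!' are stored XORed with 0x60.
xored_msg = bytes(b ^ 0x60 for b in plain_msg[:-1]) + plain_msg[-1:]

code_overhead = sum(loop_code) - sum(plain_code)   # extra score spent on the loop
data_saving = sum(plain_msg) - sum(xored_msg)      # score saved in the message
print(code_overhead, data_saving, code_overhead - data_saving)
# prints "704 544 160"
```

So the loop adds 704 to the code and removes 544 from the data, which matches 1668 - 1508 = 160.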
I can't find a way to jump backwards that isn't horrible. Register-indirect jump sucks, and short backward displacements need a 0xF? byte. Padding with any non-zero byte to change the displacement costs at least as much as it reduces the rel8 displacement, and 00 00 is add [rax], al. Hmm, possibly with a 256-byte-aligned address in RAX, we could pad with 00 00 without modifying memory? But xchg with EAX is a 9? byte and getting an address into RAX is costly.
But if we have a pointer in RAX, we can't use add al, 1 or sub al, 1 to loop. In 32-bit code we'd have the one-byte 4? inc/dec opcodes, but not RIP-relative addressing. x86-64 inc/dec are terrible, using the FF /0 and FF /1 ModRM encodings.
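The cost of the short backward jump discussed above can be made concrete. The sketch below computes the rel8 byte for the loop in the MSGXOR disassembly and compares it with a hypothetical variant that uses the 2-byte sub al, 1 (2c 01). It only compares the sub instruction plus the rel8 byte; the fix-up add after the loop is not included. `rel8_byte` is an illustrative helper.

```python
def rel8_byte(target: int, next_ip: int) -> int:
    """Encoded rel8 displacement: target minus the address after the jcc."""
    disp = target - next_ip
    assert -128 <= disp <= 127, "out of range for a short jump"
    return disp & 0xFF

LOOP = 0x40008C

# As assembled: sub eax, imm32 (5 bytes), jcc ends at 0x400097.
long_rel8 = rel8_byte(LOOP, 0x400097)
long_form = sum(bytes.fromhex("2d 01 00 00 00")) + long_rel8

# Hypothetical: xor (4 bytes) + sub al, 1 (2 bytes) + jcc (2 bytes).
short_rel8 = rel8_byte(LOOP, LOOP + 4 + 2 + 2)
short_form = sum(bytes.fromhex("2c 01")) + short_rel8

print(hex(long_rel8), long_form)     # prints "0xf5 291"
print(hex(short_rel8), short_form)   # prints "0xf8 293"
```

The three extra zero bytes of the imm32 are free and move the displacement from 0xf8 to 0xf5, which more than pays for the opcode being one higher.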
(I considered using a linker script to set the absolute virtual address to something even lower, but LEA's opcode is lower than mov r32, imm32. Hrm, for EAX there is 05 add eax, imm32 with an absolute address. So looping another reg with inc or dec might be viable. Or looping EAX with a pointer-compare, especially if I can make the absolute data address something like 00 00 01 00 just outside the low 64k where Linux disallows memory-mapping by default. But then I'd feel like I had to count the linker script or its resulting metadata. Or if I'm messing around with the metadata (ELF headers) in the executable, maybe have to count the whole thing instead of just the .text section.)
32-bit code needs int 0x80 for system calls, not 0f 05 syscall.