I'm trying to write a simple I/O library in x64 assembly using Linux syscalls.

    section .text

    strlen:
        xor rdx, rdx
    .loop:
        cmp [rsi + rdx], 0
        je .exit
        inc rdx
        jmp .loop
    .exit:
        ret             ; value in rdx

    puts:               ; string passed through rsi
        mov rax, 1
        call strlen
        syscall

Your loop uses 2 jumps (je/jmp) on every iteration! Jumping is expensive, so a solution that needs only 1 jump (jne) will be more effective.

    strlen:
        xor rdx, rdx
        dec rdx         ; This compensates for the INC that is happening first.
    .next:
        inc rdx
        cmp byte [rsi + rdx], 0
        jne .next
        ret

Do keep things logically together. There's no point in setting RAX before the call to strlen.

    puts:               ; string passed through rsi
        call strlen     ; Result is in RDX
        mov rax, 1
        syscall

fputs then have puts just mov rdi, 1 and call puts?

The other review hit the most important parts, but there are a few more things to consider.
If the code is instead written like this:

    ; IN: rdi points to NUL-terminated string
    ; OUT: rax contains string length
    strlen:
        xor rax, rax
        dec rax
    .top:
        inc rax
        cmp byte [rdi + rax], 0
        jne .top
        ret

This would have the advantage of being callable from C, since it follows the System V AMD64 calling convention (first argument in rdi, return value in rax).
Instead of having "magic numbers" littering the code, it's better to define named constants. For example the number 1 is used in two different ways; once for the WRITE syscall, and once for the stdout file handle. I'd recommend defining and using one named constant for each.
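As a rough sketch of what that could look like in NASM (the names SYS_WRITE and STDOUT are my own picks, not anything standard):

```nasm
; Hypothetical constant names, chosen for illustration only.
SYS_WRITE equ 1             ; syscall number for write on x86-64 Linux
STDOUT    equ 1             ; file descriptor for standard output

    mov rax, SYS_WRITE      ; which syscall to make
    mov rdi, STDOUT         ; which file descriptor to write to
```

Both constants still assemble to the value 1, but a reader can now tell the two uses apart at a glance.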
As you mention in a comment, the only difference between puts and fputs is the file handle. In this case, one could get both puts and fputs very cheaply like this:

    puts:
        mov rdi, 1      ; fd for stdout
    fputs:
        call strlen
        mov rax, 1      ; WRITE syscall
        syscall
        ret

Note that this uses your existing calling convention rather than the C calling convention.
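To try the fall-through version end to end, something like the following minimal program should work. Treat it as a sketch under a few assumptions of mine: the constant names, the msg label, the _start entry point, and the exit syscall at the end are all additions for the demo, and the strlen is the rsi/rdx variant from the first review.

```nasm
; Assumed build steps:
;   nasm -f elf64 demo.asm -o demo.o
;   ld demo.o -o demo

SYS_WRITE equ 1             ; hypothetical names for the magic numbers
SYS_EXIT  equ 60            ; exit syscall number on x86-64 Linux
STDOUT    equ 1

section .data
msg: db "Hello, world!", 10, 0      ; NUL-terminated, with a trailing newline

section .text
global _start

; IN: rsi points to NUL-terminated string
; OUT: rdx contains string length
strlen:
    xor rdx, rdx
    dec rdx                 ; compensates for the INC that happens first
.next:
    inc rdx
    cmp byte [rsi + rdx], 0
    jne .next
    ret

; IN: rsi points to NUL-terminated string
puts:
    mov rdi, STDOUT         ; falls through into fputs
; IN: rdi = file descriptor, rsi points to NUL-terminated string
fputs:
    call strlen             ; length ends up in rdx; rdi and rsi are untouched
    mov rax, SYS_WRITE
    syscall                 ; write(rdi, rsi, rdx)
    ret

_start:
    lea rsi, [rel msg]
    call puts
    xor edi, edi            ; exit status 0
    mov rax, SYS_EXIT
    syscall
```

Keep in mind that syscall overwrites rcx and r11 and returns its result in rax, so callers of puts and fputs shouldn't expect those registers to survive.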
You may find it useful to define some macros for common things like this:

    %macro SYSTEM 1
        mov rax, %1
        syscall
    %endmacro

    WRITE: equ 1

    SYSTEM WRITE

This is a minor point, but you can save code space and a little time (indirectly, by taking less space) by replacing loads like these

    mov rax, 1      ; WRITE syscall
    mov rdi, 1      ; fd for stdout

with 32-bit mov:

    mov eax, 1      ; WRITE syscall
    mov edi, 1      ; fd for stdout

Writes to 32-bit registers are zero-extended to the corresponding 64-bit register, so they are equivalent.
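A small fragment makes the zero-extension rule easier to see. The register values in the comments follow from the x86-64 rules; the choice of -1 as the starting value is just mine, to make the upper bits visible. Note that the rule applies only to 32-bit writes, not to 8-bit or 16-bit ones:

```nasm
    mov rax, -1     ; rax = 0xFFFFFFFFFFFFFFFF
    mov eax, 1      ; rax = 0x0000000000000001 (upper 32 bits cleared)

    mov rax, -1     ; rax = 0xFFFFFFFFFFFFFFFF
    mov ax, 1       ; rax = 0xFFFFFFFFFFFF0001 (upper 48 bits kept)

    mov rax, -1     ; rax = 0xFFFFFFFFFFFFFFFF
    mov al, 1       ; rax = 0xFFFFFFFFFFFFFF01 (upper 56 bits kept)
```

So `mov eax, 1` is a safe replacement for `mov rax, 1`, while `mov al, 1` would not be unless the rest of rax is already known to be zero.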
For example mov rax, 1 might be encoded (depending on the assembler) as

    48 c7 c0 01 00 00 00

while mov eax, 1 may be encoded as

    b8 01 00 00 00

The b8+ form of mov, in its 64-bit version, takes an imm64, which would need even more bytes. If the constant is small enough, the assembler can choose the c7 form to avoid encoding a whole imm64. Unlike the b8+ form, though, the c7 form needs a ModRM byte (the c0 byte) to encode the destination, and a REX.W prefix (the 48 byte) is still needed to encode the 64-bitness of the instruction, at least if the assembler is faithful to the form as written.
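If you want to see the different forms side by side, here is a sketch assuming NASM, whose `strict` keyword asks for the immediate size exactly as written. The byte sequences in the comments are what I would expect rather than something guaranteed for every assembler or version, so it's worth checking them against a listing file (for example `nasm -f elf64 -l enc.lst enc.asm`, where the file names are placeholders):

```nasm
bits 64

    mov eax, 1                  ; b8 01 00 00 00                 (5 bytes, b8+ form, imm32)
    mov rax, strict dword 1     ; 48 c7 c0 01 00 00 00           (7 bytes, c7 form, sign-extended imm32)
    mov rax, strict qword 1     ; 48 b8 01 00 00 00 00 00 00 00  (10 bytes, b8+ form, imm64)
    mov rax, 1                  ; left to the assembler: it may pick any equivalent form
```

Writing `mov eax, 1` yourself gives the short encoding no matter how the assembler treats the 64-bit spelling.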
rax, so if there was some big value in it then it would stay big

rax does turn out to have a larger value, it can also cause some odd partial register write performance impacts