6
\$\begingroup\$

This program counts down from 100 to 1 and:

  • If the current number is a multiple of 3 it prints "Fizz" instead of the number
  • If it is a multiple of 5 it prints "Buzz" instead of the number
  • If it is a multiple of 3 and 5 it prints "FizzBuzz" instead of the number

format PE console entry main include 'macro/import32.inc' section '.rdata' data readable msg db '%d',13,10, 0 fizz db 'Fizz', 13, 10, 0 buzz db 'Buzz', 13, 10, 0 p db 'pause>nul', 0 fizzbuzz db 'FizzBuzz', 13, 10, 0 section '.data' data readable writeable vdiv_by_3 dd 0 vdiv_by_5 dd 0 main: push ebp mov ebp, esp mov ecx, 100 load_3: mov eax, ecx mov ebx, 3 xor edx, edx div ebx mov ebx, edx mov [vdiv_by_3], ebx ; Store the remainder in div_by_3 load_5: ; Now check if its divisible by 5 mov eax, ecx mov ebx, 5 xor edx, edx div ebx mov [vdiv_by_5], edx ; Remainder in div_by_5 cmp edx, 0 ; Checking 5 jne check_3_not_5 check_3: mov eax, [vdiv_by_3] cmp eax, 0 je print_fizzbuzz check_3_not_5: mov eax, [vdiv_by_3] cmp eax, 0 je print_fizz check_5_not_3: mov eax, [vdiv_by_5] cmp eax, 0 je print_buzz print_num: ; Problem: This is printing 101 first, we need to start at 1 push ecx push msg call [printf] ; This call will mess with ecx so we have to store it add esp, 4 pop ecx ; Get the counter back into ecx jmp endme print_fizz: push ecx push fizz call [printf] add esp, 4 pop ecx jmp endme print_buzz: push ecx push buzz call [printf] add esp, 4 pop ecx jmp endme print_fizzbuzz: push ecx push fizzbuzz call [printf] add esp, 4 pop ecx jmp endme print_number: push ecx push msg call [printf] add esp, 4 pop ecx endme: dec ecx cmp ecx, 0 jne load_3 push p call [system] add esp, 4 push 0 call [exit] section '.idata' import data readable library msvcrt, 'msvcrt.dll' import msvcrt, \ printf, 'printf', \ system, 'system', \ exit, 'exit' 
\$\endgroup\$

1 Answer 1

6
\$\begingroup\$

Since you are writing FizzBuzz in assembler, you are obviously concerned about performance and code size.

For performance, div is one of the worst instructions. Since the divisibility repeats modulo 15, you could define some constants:

const divisible_by_3 = 0b1001001001001001 const divisible_by_5 = 0b1000010000100001 

To test the divisibility, have an extra register that is initialized with i mod 15 and adjusted whenever i changes. The basic idea is:

dec i dec i_mod_15 cmovs i_mod_15, 14 ; the maximum value mod 15 

You can also combine divisible_by_3 and divisible_by_5 into a bit vector of two-bit entries (divisible) and define a jump table based on that. To do the actual testing, use bit-shifting.

const divisible_by_3 = 0b_1_0_0_1_0_0_1_0_0_1_0_0_1_0_0_1 const divisible_by_5 = 0b1_0_0_0_0_1_0_0_0_0_1_0_0_0_0_1_ const divisible = 0b11000001001001000001100001000011 

Another idea is to use unroll the loop by using Duff's Device.

Right now, your code is a really boring, straight-forward translation of some higher-level language, probably C. In assembler, you have much more potential to compress the code (DRY principle). For example, you can jmp do_printf instead of writing push ecx; call printf; pop ecx several times.


… 90 minutes later …


Based on the above ideas, the code might look like this, tested and works.

format PE console entry main include 'macro/import32.inc' section '.rdata' data readable ; Each of these messages takes exactly 8 bytes, ; except for the last one. ; This is important for addressing them efficiently. messages db '%d', 13, 10, 0, 0, 0, 0 db 'Fizz', 13, 10, 0, 0 db 'Buzz', 13, 10, 0, 0 db 'FizzBuzz', 13, 10, 0 ; This bit mask selects one of the above messages ; to be printed. The counter is always passed to ; printf, and in 8/15 cases it is ignored. ; The uu entry at the end is unused. ; ; 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ; div3 mask = 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 u ; div5 mask = 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 u div_mask = 11_00_00_01_00_10_01_00_00_01_10_00_01_00_00_00b section '.text' code readable main: push ebp mov ebp, esp mov ecx, 100 ; ecx = the counter mov eax, ecx xor edx, edx mov ebx, 15 div ebx ; edx = counter mod 15 mov ebx, div_mask ; ebx = bit mask for selecting the message rol ebx, 2 xchg edx, ecx ; Variable-width rotation is only possible rol ebx, cl ; with cl, therefore the temporary swap rol ebx, cl ; of edx and ecx. xchg edx, ecx ; .again: mov eax, ebx ; Select the message format for printf. and eax, 11b shl eax, 3 add eax, messages push ebx ; save registers before printf push ecx push edx push ecx ; actually call printf push eax call [printf] add esp, 8 pop edx ; restore registers after printf pop ecx pop ebx dec edx ; Adjust counter, counter mod 15 jns .normal ; and bit mask. add edx, 15 ror ebx, 2 ; Rotate one extra time since .normal: ; the bit mask has 16 entries. ror ebx, 2 dec ecx jnz .again xor eax, eax pop ebp ret section '.idata' import data readable library msvcrt, 'msvcrt.dll' import msvcrt, printf, 'printf', system, 'system' 

Some more things I took care of:

  • The code must not be in a writeable section.
  • Since the program can return from main, it should do so. To make that work, I had to add the pop ebp that corresponds to the push ebp at the very top; in your code that was missing.
  • I carefully avoided many branching statements, since they are poisonous to performance. See Why is it faster to process a sorted array.
  • The one remaining conditional behaves well in that it follows the jump in 14/15 cases, which is easily predictable.
  • Of course, using printf and the C stdio for output ruins all performance effects. But that's outside the scope for this little fun experiment.
  • All arguments to printf may be modified by printf. There's no guarantee that you get your ecx back at the point where you commented ; Get the counter back into ecx. To hide the counter from printf, you must push it once more to the stack. That's why I explicitly commented on this saving/calling/restoring in my code.

A nice thing is that you can play with the bit mask, which feels just like an ordinary configuration file.

\$\endgroup\$
4
  • \$\begingroup\$ Roland thanks for this very high quality answer. It is exactly what I am looking for - how a true “assembly programmer” can approach a problem. Can you provide a link about how I could do this without the c library which you says negates performance?? And any other resource you recommend for “thinking like an assembly programmer?” Thanks again. \$\endgroup\$ Commented Mar 11, 2018 at 22:30
  • 1
    \$\begingroup\$ Replacing printf with something more efficient is not trivial. The main takeaway here is that in I/O-bound code, there's no point in optimizing the remaining code. For further reading and thinking I suggest the book "Hacker's Delight", an essential piece of artwork that every assembler programmer should know. — By the way, thanks for writing your code so well and simple, it was a great foundation for putting my experiment on top of. I fear that my code is not as easy to modify or adapt. \$\endgroup\$ Commented Mar 12, 2018 at 2:13
  • 1
    \$\begingroup\$ The point is: if you replace printf, you have to call two different functions for the "number" and the "word" case, plus you have to do the I/O buffering yourself since system calls are slow. You could certainly call WriteFile instead of printf, but then you also have to implement the number formatting yourself. \$\endgroup\$ Commented Mar 12, 2018 at 2:16
  • \$\begingroup\$ @RolandIllig valuable answer :) \$\endgroup\$ Commented Mar 14, 2018 at 14:28

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.