Using a 16-bit counter is at least a waste of bytes (operand-size prefixes), and can cause slowdowns. Using mov reg,0 is bad: xor-zeroing (xor reg,reg) is smaller and at least as fast. Put the conditional branch at the bottom of the loop whenever possible.
There's no compiler, so if you want good code, you have to write it yourself. You can't just hope the compiler will do something good with i%3 in a loop (which it doesn't anyway, for most compilers).
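To show what "writing it yourself" means for i%3, here is a minimal C sketch (not the asm from this answer) that replaces the modulo with down-counters that get reset when they hit zero. The function name fizzbuzz_counters and the buffer-based interface are made up for illustration, and the 12-byte margin assumes a 32-bit unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): writes the FizzBuzz lines for
 * 1..last into out and returns the number of bytes written (without the NUL). */
size_t fizzbuzz_counters(unsigned last, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    unsigned fizz = 3, buzz = 5;   /* down-counters stand in for i%3 and i%5 */
    size_t len = 0;

    /* Longest line is 10 digits + '\n', plus the final NUL: 12 bytes
     * (assuming 32-bit unsigned). Stop early if that might not fit. */
    for (unsigned i = 1; i <= last && cap - len >= 12; i++) {
        int is_fizz = (--fizz == 0);
        int is_buzz = (--buzz == 0);
        if (is_fizz) { memcpy(out + len, "fizz", 4); len += 4; fizz = 3; }
        if (is_buzz) { memcpy(out + len, "buzz", 4); len += 4; buzz = 5; }
        if (!is_fizz && !is_buzz)
            len += (size_t)snprintf(out + len, cap - len, "%u", i);
        out[len++] = '\n';
    }
    out[len] = '\0';
    return len;
}
```

No division or modulo runs per iteration; each counter costs one decrement and one compare, which is the kind of thing you'd write by hand in asm.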
Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations).
Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.
Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.
I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).
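For reference, the multiplicative-inverse idea looks roughly like this in C. This is a sketch of the general technique for 32-bit unsigned values, not the code from this answer; div10 and mod10 are illustrative names, and the assert is only a debug-build cross-check against the compiler's own division.

```c
#include <assert.h>
#include <stdint.h>

/* n / 10 without a div instruction: multiply by a fixed-point reciprocal.
 * 0xCCCCCCCD == ceil(2^35 / 10), so the high bits of the 64-bit product,
 * shifted right by 35, give the exact quotient for every 32-bit n. */
uint32_t div10(uint32_t n)
{
    uint32_t q = (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
    assert(q == n / 10);           /* debug-only sanity check */
    return q;
}

/* Remainder from the quotient: n - q*10. */
uint32_t mod10(uint32_t n)
{
    return n - div10(n) * 10;
}
```

One multiply and one shift replace the div, which is much slower on most x86 CPUs.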
My first version of this almost worked the first time (after fixing a couple syntax errors so it would assemble, and a silly crash that took a couple minutes to debug IIRC): It printed fizz\nbuzz\n in the fizzbuzz\n case, and it reversed the digits. I keep forgetting that digit-strings need to be stored with the most-significant digit first, not like the bytes in a little-endian binary integer.
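The usual way to avoid the reversed-digits bug is to fill the buffer from the end backwards, since repeated division by 10 produces the least-significant digit first. A minimal C sketch, with a made-up helper name u32_to_dec (again, not the asm from this answer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n to decimal text inside buf and returns a
 * pointer to the first (most-significant) digit. The string ends at the
 * last byte of buf, so the result is usually not at buf itself. */
char *u32_to_dec(uint32_t n, char *buf, size_t size)
{
    assert(buf != NULL && size >= 11);   /* 10 digits + NUL for 32 bits */
    char *p = buf + size - 1;
    *p = '\0';
    do {                                 /* conditional branch at the bottom */
        *--p = (char)('0' + n % 10);     /* low digit goes to the highest address */
        n /= 10;
    } while (n != 0);
    return p;
}
```

Storing with a decrementing pointer puts the digits in printing order, and the do/while form also handles n == 0 without a special case.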