Skip to main content
replaced http://stackoverflow.com/ with https://stackoverflow.com/
Source Link
URL Rewriter Bot
URL Rewriter Bot

Using a 16bit counter is at least a waste of bytes (operand-size prefixes), and can cause slowdowns. Using mov reg,0 is badmov reg,0 is bad. Put the the conditional branch at the bottom of the loop whenever possible.

There's no compiler, so if you want good code, you have to write it yourself. You can't just hope the compiler will do something good with i%3 in a loop (which it doesn't anyway, for most compilersfor most compilers).

  • Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

My first version of this almost worked the first time (after fixing a couple syntax errors so it would assemble, and a silly crash that took a couple minutes to debug IIRC): It printed fizz\nbuzz\n in the fizzbuzz\n case, and it reversed the digits. I keep forgettingI keep forgetting that digit-strings need to be stored with the most-significant digit first, not like the bytes in a little-endian binary integer.

Using a 16bit counter is at least a waste of bytes (operand-size prefixes), and can cause slowdowns. Using mov reg,0 is bad. Put the the conditional branch at the bottom of the loop whenever possible.

There's no compiler, so if you want good code, you have to write it yourself. You can't just hope the compiler will do something good with i%3 in a loop (which it doesn't anyway, for most compilers).

  • Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

My first version of this almost worked the first time (after fixing a couple syntax errors so it would assemble, and a silly crash that took a couple minutes to debug IIRC): It printed fizz\nbuzz\n in the fizzbuzz\n case, and it reversed the digits. I keep forgetting that digit-strings need to be stored with the most-significant digit first, not like the bytes in a little-endian binary integer.

Using a 16bit counter is at least a waste of bytes (operand-size prefixes), and can cause slowdowns. Using mov reg,0 is bad. Put the the conditional branch at the bottom of the loop whenever possible.

There's no compiler, so if you want good code, you have to write it yourself. You can't just hope the compiler will do something good with i%3 in a loop (which it doesn't anyway, for most compilers).

  • Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

My first version of this almost worked the first time (after fixing a couple syntax errors so it would assemble, and a silly crash that took a couple minutes to debug IIRC): It printed fizz\nbuzz\n in the fizzbuzz\n case, and it reversed the digits. I keep forgetting that digit-strings need to be stored with the most-significant digit first, not like the bytes in a little-endian binary integer.

replaced http://codegolf.stackexchange.com/ with https://codegolf.stackexchange.com/
Source Link
URL Rewriter Bot
URL Rewriter Bot

You can avoid div without unrolling by using down-counters to detect count%5==0 and count%3==0. @anatolyg's 16bit DOS code-golf FizzBuzz does that@anatolyg's 16bit DOS code-golf FizzBuzz does that. This is a really common technique to do something every N times something else happens. e.g. performance counter events work this way.

Correctness checkCorrectness check:

Anatolyg's golfed FizzBuzzAnatolyg's golfed FizzBuzz uses AAM in a loop with shr ax, 8 to generate one digit at a time in reverse order, storing and decrementing a pointer.

You can avoid div without unrolling by using down-counters to detect count%5==0 and count%3==0. @anatolyg's 16bit DOS code-golf FizzBuzz does that. This is a really common technique to do something every N times something else happens. e.g. performance counter events work this way.

Correctness check:

Anatolyg's golfed FizzBuzz uses AAM in a loop with shr ax, 8 to generate one digit at a time in reverse order, storing and decrementing a pointer.

You can avoid div without unrolling by using down-counters to detect count%5==0 and count%3==0. @anatolyg's 16bit DOS code-golf FizzBuzz does that. This is a really common technique to do something every N times something else happens. e.g. performance counter events work this way.

Correctness check:

Anatolyg's golfed FizzBuzz uses AAM in a loop with shr ax, 8 to generate one digit at a time in reverse order, storing and decrementing a pointer.

highlight sections / keywords for skimmability
Source Link
Peter Cordes
  • 376.9k
  • 50
  • 742
  • 1k

Here's my attempt at an efficient FizzBuzz (for AMD64 Linux), using no libraries. only write(2)write(2) and exit_group(2)exit_group(2)

./fizzbuzz | diff - <(perl -E'say((fizz)[$_%3].(buzz)[$_%5]or$_)for+1..100') # no output = no diffdifference 

Further###Further improvements, left as an exercise for the reader:

  • Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

Alternate ways to do things

I decided not to use a branchless version of the 1 digit vs. 2 digit ASCII conversion codebranchless version of the 1 digit vs. 2 digit ASCII conversion code, since it took a lot of instructions. Also, the branch should predict very well.

 ;; Untested buzz_or_number: ... .no_buzz:  ... div r10b DECIMAL_TO_ASCII_NEWLINE_2DIGIT equ 0x0a3030 ; add '0' to two unpacked decimal digits, and a newline DECIMAL_TO_ASCII_NEWLINE_1DIGIT equ 0x000a30 ;; hoist this out of the loop: mov r15d, DECIMAL_TO_ASCII_NEWLINE_2DIGIT - DECIMAL_TO_ASCII_NEWLINE_1DIGIT xor ecx,ecx cmp ah, 1 ; set CF if ah=0 (1 digit number), otherwise clear it. This allows sbb for a conditional add, instead of setcc cmovae ecx, r15d ; 0 or the difference from 1digit to 2digit lea eax, [rax+rcx + DECIMAL_TO_ASCII_NEWLINE_1DIGIT] ; rax+=0x0a3030 or 0x000a30, without clobbering flags mov [rdx], eax sbb edx, -3 ; add 2 (-(-3) - 1) or 3. ret 

In 32bit (and 16bit) mode, there's a div instruction that takes an immediate operanda div instruction that takes an immediate operand, and uses AL as the dividend, not AX. It's called AAMcalled AAM, and was removed for AMD64 along with the other BCD/ASCII instructions. It's convenient for testing divisibility by 5 without tying up a register for the divisor or wasting an instruction inside the loop. It's slightly faster than div r/m8, and sets flags according to the remainder (in al: it has its outputs reversed, compared to div).

This version is way more complicated because it's using AAM to check for count%5 and then processing that into count%10, instead of doing a separate divisionusing AAM to check for count%5 and then processing that into count%10, instead of doing a separate division to get ASCII digits.

Here's my attempt at an efficient FizzBuzz (for AMD64 Linux), using no libraries. only write(2) and exit_group(2)

./fizzbuzz | diff - <(perl -E'say((fizz)[$_%3].(buzz)[$_%5]or$_)for+1..100') # no output = no diff 

Further improvements, left as an exercise for the reader:

  • Inline buzz_or_number, and turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

I decided not to use a branchless version of the 1 digit vs. 2 digit ASCII conversion code, since it took a lot of instructions. Also, the branch should predict very well.

 ;; Untested ... div r10b DECIMAL_TO_ASCII_NEWLINE_2DIGIT equ 0x0a3030 ; add '0' to two unpacked decimal digits, and a newline DECIMAL_TO_ASCII_NEWLINE_1DIGIT equ 0x000a30 ;; hoist this out of the loop: mov r15d, DECIMAL_TO_ASCII_NEWLINE_2DIGIT - DECIMAL_TO_ASCII_NEWLINE_1DIGIT xor ecx,ecx cmp ah, 1 ; set CF if ah=0 (1 digit number), otherwise clear it. This allows sbb for a conditional add, instead of setcc cmovae ecx, r15d ; 0 or the difference from 1digit to 2digit lea eax, [rax+rcx + DECIMAL_TO_ASCII_NEWLINE_1DIGIT] ; rax+=0x0a3030 or 0x000a30, without clobbering flags mov [rdx], eax sbb edx, -3 ; add 2 (-(-3) - 1) or 3. ret 

In 32bit (and 16bit) mode, there's a div instruction that takes an immediate operand, and uses AL as the dividend, not AX. It's called AAM, and was removed for AMD64 along with the other BCD/ASCII instructions. It's convenient for testing divisibility by 5 without tying up a register for the divisor or wasting an instruction inside the loop. It's slightly faster than div r/m8, and sets flags according to the remainder (in al: it has its outputs reversed, compared to div).

This version is way more complicated because it's using AAM to check for count%5 and then processing that into count%10, instead of doing a separate division to get ASCII digits.

Here's my attempt at an efficient FizzBuzz (for AMD64 Linux), using no libraries. only write(2) and exit_group(2)

./fizzbuzz | diff - <(perl -E'say((fizz)[$_%3].(buzz)[$_%5]or$_)for+1..100') # no output = no difference 

###Further improvements, left as an exercise for the reader:

  • Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations)

  • Besides that, it could probably branch less, and other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but not actually improving the code much from what I initially decided was good enough to see if it worked.

  • Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside to unrolling.

  • I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).

Alternate ways to do things

I decided not to use a branchless version of the 1 digit vs. 2 digit ASCII conversion code, since it took a lot of instructions. Also, the branch should predict very well.

 ;; Untested buzz_or_number: ... .no_buzz:  ... div r10b DECIMAL_TO_ASCII_NEWLINE_2DIGIT equ 0x0a3030 ; add '0' to two unpacked decimal digits, and a newline DECIMAL_TO_ASCII_NEWLINE_1DIGIT equ 0x000a30 ;; hoist this out of the loop: mov r15d, DECIMAL_TO_ASCII_NEWLINE_2DIGIT - DECIMAL_TO_ASCII_NEWLINE_1DIGIT xor ecx,ecx cmp ah, 1 ; set CF if ah=0 (1 digit number), otherwise clear it. This allows sbb for a conditional add, instead of setcc cmovae ecx, r15d ; 0 or the difference from 1digit to 2digit lea eax, [rax+rcx + DECIMAL_TO_ASCII_NEWLINE_1DIGIT] ; rax+=0x0a3030 or 0x000a30, without clobbering flags mov [rdx], eax sbb edx, -3 ; add 2 (-(-3) - 1) or 3. ret 

In 32bit (and 16bit) mode, there's a div instruction that takes an immediate operand, and uses AL as the dividend, not AX. It's called AAM, and was removed for AMD64 along with the other BCD/ASCII instructions. It's convenient for testing divisibility by 5 without tying up a register for the divisor or wasting an instruction inside the loop. It's slightly faster than div r/m8, and sets flags according to the remainder (in al: it has its outputs reversed, compared to div).

This version is way more complicated because it's using AAM to check for count%5 and then processing that into count%10, instead of doing a separate division to get ASCII digits.

added 311 characters in body
Source Link
Peter Cordes
  • 376.9k
  • 50
  • 742
  • 1k
Loading
added 16 characters in body
Source Link
Peter Cordes
  • 376.9k
  • 50
  • 742
  • 1k
Loading
Source Link
Peter Cordes
  • 376.9k
  • 50
  • 742
  • 1k
Loading