13
\$\begingroup\$

If a hardware doesn't support modulus or division operations, it takes many more CPU cycles to simulate modulus/division by software. Is there any faster way to calculate division and modulus if the operand is 10?

In my project I frequently need to calculate integer modulus 10. In particular, I'm working on PIC16F and need to show a number on an LCD. There are 4 digits to support, so there are 4 calls to modulus and division function (software implementation). That is, like the following:

digit = number % 10; // call to an expensive function number /= 10; // call to an expensive function somehow_lit_segments(); digit = number % 10; // call to an expensive function number /= 10; // call to an expensive function somehow_lit_segments(); digit = number % 10; // call to an expensive function number /= 10; // call to an expensive function somehow_lit_segments(); digit = number % 10; // call to an expensive function number /= 10; // call to an expensive function somehow_lit_segments(); 

There are other areas that uses similar code.

\$\endgroup\$
7
  • \$\begingroup\$ Why are a few dozen calls/sec a problem? I wouldn't bother unless the project is fully functional and bug-free. \$\endgroup\$ Commented Apr 5, 2011 at 21:53
  • \$\begingroup\$ I've noticed that if I continuously displaying some number in the main busy loop, the button response becomes slow. I.e., to detect that a button has been pressed, I have to press that button a bit longer. This happens when system clock is running 32768 Hz. \$\endgroup\$ Commented Apr 5, 2011 at 23:10
  • \$\begingroup\$ Are you using interrupts? Why are you using a 32kHz xtal; usually you can get lower power performance if you operate faster and go to sleep when idle. \$\endgroup\$ Commented Apr 5, 2011 at 23:13
  • \$\begingroup\$ i'm using interrupts. but just to update display it doesn't worth to switch to high speed oscillation. power-wise. for my project. it has to be run low speed clock nearly 90% of its life time. \$\endgroup\$ Commented Apr 5, 2011 at 23:20
  • 2
    \$\begingroup\$ As general note, the book Hacker's Delight by Henry S. Warren, Jr. is the source for clever bit-twiddling trickery. I looked for division suggestions, and it doesn't have anything for divide by 10 that is superior to any of the answers below. \$\endgroup\$ Commented Apr 6, 2011 at 1:39

10 Answers 10

12
\$\begingroup\$

Heres a binary to BCD algorithm I used several years ago based on one found here. I was using an external BCD to 7 seg display driver so the result could be written to the proper ports directly as packed BCD for output.

This is fairly fast if you have a hardware multiplier in the PIC, I was using a PIC18F97J60. If you don't have a hardware multiplier on your PIC, consider using shift + add for the multiplication.

This takes in an unsigned 16-bit int and returns packed BCD with 5 digits, it could be modified and made faster for 4 digits. It uses shift + additions to approximate division by 10 but given the limited input range it is exact for this use. You may want to pack the result differently as well to line up with how your using the result.

void intToPackedBCD( uint16_t n, uint8_t *digits ) { uint8_t d4, d3, d2, d1, d0, q; //d4 MSD, d0 LSD d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12) & 0xF; d0 = 6*(d3 + d2 + d1) + (n & 0xF); q = (d0 * 0xCD) >> 11; d0 = d0 - 10*q; d1 = q + 9*d3 + 5*d2 + d1; q = (d1 * 0xCD) >> 11; d1 = d1 - 10*q; d2 = q + 2*d2; q = (d2 * 0x1A) >> 8; d2 = d2 - 10*q; d3 = q + 4*d3; d4 = (d3 * 0x1A) >> 8; d3 = d3 - 10*d4; digits[0] = (d4<<4) | (d3); digits[1] = (d2<<4) | (d1); digits[2] = (d0<<4); } 
\$\endgroup\$
1
  • \$\begingroup\$ great link, thanks! it doesn't only optimizes the speed, it also decreases the code size. I've implemented "12 bit binary to 4 ASCII Decimal Digits" from your link because that doesn't involve any multiplication. \$\endgroup\$ Commented Apr 6, 2011 at 18:13
8
\$\begingroup\$

Assuming unsigned integers, division and multiplication can be formed from bit shifts. And from (integer) division and multiplication, modulo can be derived.

To multiply by 10:

y = (x << 3) + (x << 1); 

To divide by 10 is more difficult. I know of several division algorithms. If I recall correctly, there is a way to divide by 10 quickly using bit shifts and subtraction, but I can't remember the exact method. If that's not true, then this is a divide algorithm which manages <130 cycles. I'm not sure what micro you're using, but you can use that in some way, even if you have to port it.

EDIT: Somebody says over at Stack Overflow, if you can tolerate a bit of error and have a large temporary register, this will work:

temp = (ms * 205) >> 11; // 205/2048 is nearly the same as /10 

Assuming you have division and multiplication, modulo is simple:

mod = x - ((x / z) * z) 
\$\endgroup\$
8
\$\begingroup\$

You can convert from binary to packed BCD without any division using double dabble algorithm. It uses only shift and add 3.

For example convert 24310 = 111100112 to binary

0000 0000 0000 11110011 Initialization 0000 0000 0001 11100110 Shift 0000 0000 0011 11001100 Shift 0000 0000 0111 10011000 Shift 0000 0000 1010 10011000 Add 3 to ONES, since it was 7 0000 0001 0101 00110000 Shift 0000 0001 1000 00110000 Add 3 to ONES, since it was 5 0000 0011 0000 01100000 Shift 0000 0110 0000 11000000 Shift 0000 1001 0000 11000000 Add 3 to TENS, since it was 6 0001 0010 0001 10000000 Shift 0010 0100 0011 00000000 Shift 2 4 3 BCD 

This algorithm is very efficient when there's no hardware divisor available. More over only left shift by 1 is used, so it's fast even when a barrel shifter is not available

\$\endgroup\$
4
\$\begingroup\$

Depending on the amount of digits you need you may be able to use the brute force method (d - input number, t - output ASCII string):

t--; if (d >= 1000) t++; *t = '0'; while (d >= 1000) { d -= 1000; *t += 1; } if (d >= 100) t++; *t = '0'; while (d >= 100) { d -= 100; *t += 1;} if (d >= 10) t++; *t = '0'; while (d >= 10) { d -= 10; *t += 1;} t++; *t = '0' + d; 

You can also change the multiple ifs into a loop, with powers of ten obtained by multiplication or a lookup table.

\$\endgroup\$
2
\$\begingroup\$

Application note AVR204 describes algorithms for BCD arithmetic, including conversion from binary to BCD and vice versa. The appnote is by Atmel, which is AVR, but the described algorithms are processor-independent.

\$\endgroup\$
1
  • \$\begingroup\$ Assembly: AVR204.zip. \$\endgroup\$ Commented Mar 30, 2023 at 16:27
1
\$\begingroup\$

I don't have a good answer, but there's a great discussion on our sister site Stack Overflow on the exact same subject of division and modulo optimization.

Do you have enough memory to implement a lookup table?

Hackers Delight has a paper on optimal division algorithms.

\$\endgroup\$
1
  • \$\begingroup\$ no, don't have enough memory. I want to do that using addition, subtraction and bit shift. \$\endgroup\$ Commented Apr 5, 2011 at 23:15
1
\$\begingroup\$

Have you considered holding that value as BCD all the time (using simple special "BCD increment" and "BCD add" subroutines), rather than holding that value in binary form and converting to BCD as needed (using a more difficult to understand "convert from binary to BCD" subroutine)?

At one time, all computers stored all data as decimal digits (ten-position gears, two-out-of-five code vacuum tubes, BCD, etc.), and that legacy still lingers on today. (see Why do real-time clock chips use BCD ).

\$\endgroup\$
1
  • \$\begingroup\$ The number to be displayed on the LCD is a variable, ranging from -1999 to 1999. It indicates a temperature and is calculated in binary format. \$\endgroup\$ Commented Apr 6, 2011 at 14:06
1
\$\begingroup\$

The PICList is an amazing resource for people programming PIC processors.

BCD conversion

Have you considered using an off-the-shelf tried-and-tested binary-to-BCD subroutine specifically optimized for the PIC16F?

In particular, people on the PICList have spent a lot of time optimizing binary-to-BCD conversions on a PIC16F. Those routines (each one hand-optimized for a specific size) are summarized at "PIC Microcontoller Radix Conversion Math Methods" http://www.piclist.com/techref/microchip/math/radix/index.htm

integer division and mod

On a CPU like the PIC16F, a subroutine specialized to divide by a constant is often much faster than a general-purpose "divide variable A by variable B" routine. You may want to put your constant (in this case, "0.1") in the "Code Generation for Constant Multiplication/Division" http://www.piclist.com/techref/piclist/codegen/constdivmul.htm or check out the canned routines near http://www.piclist.com/techref/microchip/math/basic.htm .

\$\endgroup\$
0
1
\$\begingroup\$

Given an 8x8 hardware multiply, one can compute a divmod-10 of an arbitrary-size number by using a routine which computes it for 12-bit number in the range 0-2559 via the procedure:

  1. Assume original number in OrigH:OrigL
  2. Divide the original number by two and store that in TempH:TempL
  3. Add the MSB of TempL*51 to the LSB of TempH*51. That is the approximate quotient
  4. Multiply the approximate quotient by 10, discarding the MSB of the value.
  5. Subtract the LSB of that result from the LSB of the original number.
  6. If that value is 10 or greater (max will be 19), subtract 10 and add 1 to the approximate quotient

I would suggest writing a divmod routine which the MSB of the number will be in W, and the LSB pointed to by FSR; the routine should store the quotient in FSR with post-decrement and leave the remainder in W. To divide a 32-bit long by 10, one would then use something like:

 movlw 0 lfsr 0,_number + 3 ; Point to MSB call _divmod10_step call _divmod10_step call _divmod10_step call _divmod10_step 

A divmod-6 step would be very similar, except using constants of 85 and 6 rather than 51 and 10. In either case, I would expect the divmod10_step would be 20 cycles (plus four for the call/return) so a short divmod10 would be about 50 cycles and a long divmod10 would be about 100 (if one special-cases the first step, one could save a few cycles).

\$\endgroup\$
1
\$\begingroup\$

this may not be the fastest but is a simple way.

 a = 65535; l = 0; m = 0; n = 0; o = 0; p = 0; while (a >= 10000) { a -= 10000; l += 1; } while (a >= 1000) { a -= 1000; m += 1; } while (a >= 100) { a -= 100; n += 1; } while (a >= 10) { a -= 10; o += 1; } while (a > 0) { a -= 1; p += 1; } 
\$\endgroup\$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.