Reading articles about calculating square root of an integer I stumbled across the pseudocode attached below.
Original paper:
T-count and Qubit Optimized Quantum Circuit Design of the Non-Restoring Square Root Algorithm by Edgard Muñoz-Coreas and Himanshu Thapliyal.
ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 14, Issue 3. Article No.: 36, Pages 1 - 15.
https://arxiv.org/pdf/1712.08254 
It seemed interesting, so I started testing it for different values a, but in some cases my calculations were wrong. For example if a = 73
$a = 73_{10} = 0100 1001_2$
$n = 8$
$\text{Step 1: initialization of R and F (line 2 -> 4)}$
$R = 0000 0001$
$F = 0000 0001$
$R = R - F = 0000 0000$
$\text{Step 2: first iteration of the loop (line 15 -> 20)}$
$i = n/2 - 1 = 3$
$R \ge 0 \Rightarrow Y_{3} = 1$
$R = 00000000$ $\text{(shifting R by two bits and appending } a_5 \text{ and } a_4)$
$F = 0000 0101$ $\text{(000... + } Y_3 \text{ + 01)}$
$R = R - F = 0000 0000 - 0000 0101 = 1111 1011$
$\text{Step 3: second iteration of the loop (line 8 -> 13)}$
$ i = 2 $
$ R \le 0 \Rightarrow Y_2 = 0$
$ R = 11101110 $ $\text{(shifting R by two bits and appending } a_3 \text{ and } a_2)$
$ F = 0000 1011 $ $\text{(000... + } Y_3 + Y_2 \text{ + 11)}$
$ R = R + F = 1110 1110 + 0000 1011 = 1111 1001$
$\text{Step 4: third iteration of the loop (line 8 -> 13)}$
$ i = 1 $
$ R \le 0 \Rightarrow Y_1 = 0 $
$ R = 1110 0101 $ $\text{(shifting R by two bits and appending } a_1 \text{ and } a_0)$
$ F = 0001 0011 $ $\text{(000... + } Y_3 + Y_2 + Y_1 \text{ + 11)}$
$ R = R + F = 1110 0101 + 0001 0011 = 1111 1000 $
$\text{Step 5: final remainder restoration (line 25 -> 28)} $
$ R \le 0 \Rightarrow Y_0 = 0 $
$ F = 0100 0001 $ $(0 + Y_3 + Y_2 + Y_1 + Y_0 + 01)$
$ R = R + F = 1111 1000 + 0100 0001 = 0011 1001 $
The problem is in the final step, Y contains 1000 which is 8 which is correct, but R should contain 9 because 73 = 8 * 8 + 9, but the value is 57.
I noticed that if in the final step we shift F by one bit and append 1 instead of 01, the result will be correct, but the document says otherwise.
Another thing that I don't understand is shifting R in a loop. Pseudocode says we should add zeros at the beginning and then shift R but when I do that the result is messing up when R < 0, so instead I shift the whole R without adding leading zeros.
I hope somebody will be able to help me understand what I'm doing wrong and why the results of the algorithm are wrong.
Updated: For any given a algorithm works if
- shift
Rwithout adding leading zeros - in the last step when
R < 0shiftingFby 1 bit (instead of 2) and appending 1 (instead of 01)