I'm working on a Reed-Solomon encoder and decoder. For the decoder, I use the reformulated inversionless Berlekamp-Massey (RiBM) algorithm to obtain $\Lambda$ and $\Omega$.

However, the original RiBM algorithm needs $2t$ iterations (here, $n-k=2t$).

I have heard that there is a method that reduces the number of RiBM iterations from $2t$ to $t$, but the details are not clear to me. Can anyone explain them?

Update: I found this modified algorithm in a master's thesis from Zhejiang University. The algorithm is described as follows:

[image of the modified algorithm from the thesis; not reproduced here]
  • Can you provide some details about this claim? Who wrote this and where? Or who said this (in a conference presentation? Seminar? Job interview?) etc. I am one of the authors of the original RiBM paper, keep up with work on this topic, and haven't heard anything about this matter. Commented Jun 17 at 11:52
  • @DilipSarwate I've added the modified version of RiBM, but I don't know whether this modification is right. Commented Jun 18 at 3:09

1 Answer

TL;DR version: What is written below is not an answer to the OP's request for an explanation of the algorithm that the OP found in a master's thesis from a Chinese university. It does, however, show that while the number of iterations is reduced by a factor of $2$, the execution time of each iteration is almost doubled and the number of finite-field multipliers is more than tripled.

I have not studied the algorithm given in the OP's question in detail, and so do not make any statement regarding whether it is correct. Nor can I provide an explanation of the algorithm. I do, however, wish to make the following observations.

  • The (inversionless) Berlekamp-Massey (iBM) algorithm as implemented in VLSI by many people is an iterative algorithm with $2t$ iterations. Each iteration consists of two steps: (i) compute the discrepancy (a sum of products), and (ii) use the discrepancy to update the polynomials. Thus, the two steps must be performed sequentially: Step (i) and then Step (ii). With parallel computation in VLSI, Step (i) has an execution time of one finite-field multiplication ($\operatorname{FFM}$) plus $\lceil\log_2(t+1)\rceil$ additions ($\operatorname{XOR}$ for fields of characteristic $2$), while Step (ii) has execution time $\operatorname{FFM} + \operatorname{XOR}$, for a total execution time of $2\operatorname{FFM} + (1+\lceil\log_2(t+1)\rceil)\operatorname{XOR}$ per iteration and $4t\operatorname{FFM} + 2t(1+\lceil\log_2(t+1)\rceil)\operatorname{XOR}$ for the entire KES process. The storage requirements are the syndrome register of $2t$ symbols that remains unchanged during the entire KES process, $2$ registers of $t+1$ symbols to store the error-locator polynomial and its auxiliary polynomial (maximum degree $t$), and $2$ registers of $t$ symbols to store the error-evaluator polynomial and its auxiliary polynomial (maximum degree $t-1$).
  • The key idea of the reformulated iBM algorithm devised by Naresh Shanbhag and myself in [1] uses the fact that the first of the $2t$ iterations of the Berlekamp-Massey algorithm does not require any discrepancy computation at all: the discrepancy is just the first entry of the syndrome register. We devised a scheme to modify the syndrome register so that the first entry of the syndrome register would contain the discrepancy in succeeding iterations as well (the fixed location is very helpful in VLSI implementation) and then realized that doing so meant that
    (a) Step 1(i) is not needed,
    (b) Step 1(ii) and Step 2(i) can be executed simultaneously instead of sequentially,
    (c) More generally, Step $m$(ii) and Step $(m+1)$(i) can be executed simultaneously instead of sequentially.
    Thus, the KES process using the reformulated iBM algorithm has execution time of $2t(\operatorname{FFM} + \operatorname{XOR})$ which is less than half the execution time of the KES process using the iBM algorithm.
  • The RiBM architecture implements the reformulated iBM algorithm in a single systolic array of $3t+1$ identical cells. Each cell has two finite-field multipliers, one finite-field adder and one multiplexer. The array executes $2t$ iterations with total execution time $2t(\operatorname{FFM} + \operatorname{XOR})$. At the end of the KES process, the error-locator polynomial is found at the high end of the array and the error-evaluator polynomial is at the low end of the array.
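For concreteness, here is a small behavioral model (in Python) of the $2t$-iteration RiBM key-equation solver, transcribed from the pseudocode in [1] as I remember it. It emulates the arithmetic of the $3t+1$-cell array serially, so it shows the data flow but none of the parallelism; the choice of $\operatorname{GF}(2^8)$ with primitive polynomial $x^8+x^4+x^3+x^2+1$ and the syndrome indexing convention are my own assumptions for illustration.

```python
# Behavioral model of the RiBM key-equation solver (KES) over GF(2^8).
# Serial software sketch of the 3t+1-cell array; not the systolic hardware.

PRIM = 0x11D  # primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (an assumption)

def gf_mul(a, b):
    """Finite-field multiplication in GF(2^8), shift-and-add with reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r

def ribm(syndromes, t):
    """Run 2t RiBM iterations on [s_0, ..., s_{2t-1}].

    Returns (locator, low_end): the scaled error-locator polynomial
    (positions t..2t of the array, high end) and the t low-end
    coefficients (the evaluator, in the sense of [1])."""
    assert len(syndromes) == 2 * t
    # delta and theta each hold the 3t+1 cell contents; the top cell
    # (position 3t) is initialized to 1 (the lambda_0 / b_0 slot).
    delta = list(syndromes) + [0] * t + [1]
    theta = list(delta)
    gamma, k = 1, 0
    for _ in range(2 * t):
        d0 = delta[0]  # the discrepancy, always in the fixed position 0
        # Step RiBM.1: delta_i <- gamma*delta_{i+1} - d0*theta_i
        # (subtraction is XOR in characteristic 2)
        nxt = [gf_mul(gamma, delta[i + 1] if i < 3 * t else 0) ^
               gf_mul(d0, theta[i]) for i in range(3 * t + 1)]
        # Step RiBM.2: control update
        if d0 != 0 and k >= 0:
            theta = delta[1:] + [0]  # theta <- shifted old delta
            gamma, k = d0, -k - 1
        else:
            k += 1
        delta = nxt
    return delta[t:2 * t + 1], delta[:t]
```

After $2t$ iterations the (scaled) error-locator polynomial can be read off the high end of the array and the low-order positions hold the evaluator, exactly as described in the last bullet above.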

With all that as preliminaries, let us examine the algorithm stated in the OP's question and consider its implementation in an array of $3t+1$ cells that iterates only $t$ times instead of the $2t$ iterations required by the RiBM architecture. The computation in each iteration seems to have an execution time of $3\operatorname{FFM} + 2\operatorname{XOR}$ for Step 1, plus an additional multiplication $\big(\theta_i(r+1)=\gamma(r)\cdot \delta_i(r+2)\big)$ in Step 2, which gives a total execution time of $t(4\operatorname{FFM} + 2\operatorname{XOR})$; this is about twice the $t(2\operatorname{FFM} + 2\operatorname{XOR})$ achieved by the RiBM architecture in [1]. Each cell has $7$ finite-field multipliers, $3$ finite-field adders, and many multiplexers (I didn't bother to count how many). There are also issues with regard to wiring delays. In the RiBM architecture, all connections between the cells of the array are local: each cell #$i$ connects only to the neighboring cells #$(i+1)$ and #$(i-1)$, whereas in the proposed architecture, cell #$i$ also connects to cell #$(i+2)$.
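To make the "about twice" comparison concrete, here is a toy tally of KES latency for the three variants, using the per-iteration counts given above. The unit delays are placeholders of my own choosing (one $\operatorname{FFM}$ taken as ten $\operatorname{XOR}$ delays); they are illustrative, not measured gate data.

```python
from math import ceil, log2

# Back-of-the-envelope KES latency tallies, following the per-iteration
# counts in the text.  T_MULT and T_XOR are illustrative unit delays
# (one finite-field multiply assumed ~ 10 XOR delays), not gate data.
T_MULT, T_XOR = 10.0, 1.0

def ibm_latency(t):
    # iBM: 2t iterations, each 2*FFM + (1 + ceil(log2(t+1)))*XOR
    return 2 * t * (2 * T_MULT + (1 + ceil(log2(t + 1))) * T_XOR)

def ribm_latency(t):
    # RiBM [1]: 2t iterations of FFM + XOR (steps overlapped)
    return 2 * t * (T_MULT + T_XOR)

def proposed_latency(t):
    # Thesis algorithm: t iterations of 4*FFM + 2*XOR, as counted above
    return t * (4 * T_MULT + 2 * T_XOR)

for t in (4, 8, 16):
    print(t, ibm_latency(t), ribm_latency(t),
          proposed_latency(t) / ribm_latency(t))
```

Note that $t(4\operatorname{FFM}+2\operatorname{XOR})$ exceeds $2t(\operatorname{FFM}+\operatorname{XOR})$ by $2t\operatorname{FFM}$ for every $t$, and the ratio approaches $2$ as the multiplier delay dominates the adder delay.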

For these reasons, I think that the master's thesis found by the OP is not as significant an advance in the state of the art as the OP believes it to be.

[1] Dilip V. Sarwate and Naresh R. Shanbhag, "High-Speed Architectures for Reed-Solomon Decoders," IEEE Trans. VLSI Systems, vol. 9, pp. 641-655, October 2001.

