International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 DOI: 10.5121/ijcnc.2016.8309 123 PERFORMANCE AND COMPLEXITY ANALYSIS OF A REDUCED ITERATIONS LLL ALGORITHM Nizar OUNI1 and Ridha BOUALLEGUE2 1 National Engineering School of Tunis, SUP’COM, InnovCom laboratory, Tunisia 2 SUP’COM, InnovCom laboratory, Tunisia ABSTRACT Multiple-input multiple-output (MIMO) systems are playing an increasing and interesting role in the recent wireless communication. The complexity and the performance of the systems are driving the different studies and researches. Lattices Reduction techniques bring more resources to investigate the complexity and performances of such systems. In this paper, we look to modify a fixed complexity verity of the LLL algorithm to reduce the computation operations by reducing the number of iterations without important performance degradation. Our proposal shows that we can achieve a good performance results while avoiding extra iteration that doesn’t bring much performance. KEYWORDS MIMO systems, LR-aided, Lattice, LLL, BER, Complexity. 1. INTRODUCTION MIMO communication systems are introduced to combat fading and provide high data rate. The MIMO system consists of transmitting multiple independent data symbols via multiple antennas. For the reception, a MIMO decoder needs to be used to detect, estimate, and reconstruct the received symbols. Multiple detection schemes can be used, such as the zero-forcing (ZF) or the minimum mean square error (MMSE) criterion. Also, the maximum likelihood decoder (ML) is considered as the optimal solution for the MIMO detection in term of Bit Error Rate (BER). But, unfortunately the ML algorithm seems to be complex for hardware implementations. Therefore, linear MIMO detection techniques like ZF and MMSE are better in term of complexity, but suffer from BER performance degradation. The lattice-reduction (LR) preprocessing technique has been proposed to be used with linear detection in order to transform the system model into an equivalent system with better channel matrix’s effect and so to reduce the complexity of the system. It was shown in previous studies that LR techniques improve the BER performances significantly. The populated LR algorithm is called Lenstra-Lenstra-Lovàsz (LLL) algorithm is the most used one. It was called according to the name of the inventors [1]. But, the LLL algorithm brings many challenges due to higher processing complexity and the undeterministic execution time [2]. LLL algorithm has a major limit which is the varying complexity that could be large and limits the decoding speed of the communication system. But, it is always presenting the best performance in term BER. The complex Lenstra-Lenstra-Lovàsz algorithm (CLLL) [3]is applying
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 124 the basis reduction for complex field, while the LLL is targetinga real valued matrix. The different studies and simulation results show that CLLL requires less processing operations [4]. Effective LLL algorithm (ELLL) [5], come with a new idea that consists to change the Lovàsz reduction condition in order to relax the related equations. Also, the FcLLL prposed by Wen [2] reduces the number of iterations for the algorithm to fix iteration number instead of infinite iterations. This technique improves the complexity but remains worse than LLL in term of BER performance. In this paper we, will focus on the FcLLL algorithm using ZF decoding technique and we propose some modifications to the original FcLLL algorithm to keep a reduced number of loops and targeting a good BER. 2. SYSTEM MODEL DESCRIPTION During this paper we will consider that (. ) and (. ) denote respectively the hermission transpose and the transpose of a matrix. We consider the spatial multiplexing MIMO system with transmit and receive antennas with a Rayleigh channel non variant in the time. = . + (1) Where = [ , , … , ] ; (s	∊ s) is the information vector with being a constellation set of square QAM with [ ] = . and the real and imaginary parts are {−#$% + 1, … , −1, 1, … , #$% − 1} with M) being the constellation size, . We will suppose that the average transmit power of each antenna is normalized to one, so [ ] = . With I+ is the m × m identity matrix. is an	× ; (N/ ≥ N1)complex channel matrix, x= [ , , … , x] is the received signal vector, and = [ , , … , 3 ]4 is the complex additive white Gaussian noise (AWGN) vector with zero mean and covariance 5. 3 . On the receiver side, = [ , , … , 3 ]4 are the symbols at receiver’s respective antennas which will be used to estimate transmitted e the symbols [4]. The receiver will analyze all received information to compute the transmitted data. So, a detection, computation, equalization and estimation of the received data will happen. At receiver side, the linear zero forcing (ZF) detector compute the inverse of the channel matrix to estimate the transmitted symbols which can be expressed by, s67 = ( . )8 .9:::;:::< =>>/?8@?A/>B?	CB?DE>8 AF?/B? . x (2) The channel matrix is QR decomposed into two parts as =	GH.
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 125 Figure 1. MIMO system with Transmitter and Receiver antennas. 3. LATTICE REDUCTION TECHNIQUE We can interpret the columns ℎJof the channel matrix as the basis of a lattice and assume that the possible transmit vectors are given by ℤ+ , the m dimensional infinite integer space. Consequently, the set of all possible undisturbed received signals is given by the lattice. L( ) =	L(ℎ , … , ℎM): =	{∑ ℎJ M JP |ℎJ ∈ ℤ} (3) The LR algorithm generates a lattices reduced and near-orthogonal channel matrixS	= . T. With matrix S	= . T generates the same lattice as , if and only if the m × m matrix T is unimodular [6], i.e. T contains only integer entries and VWX(T)	=	±1: L( S	)	=	L( )	⇔ S	= T[ VT ] ^_V]`[a (4) Also, S. T8	= (5) We can find multiple bases that can be included in the space L, and the goal of the LR algorithm is to find a set of least correlated base with the shortest basis vectors [2].Initially, an efficient (but supposed not optimal) way to determine a reduced basis was proposed by Lenstra, Lenstra and Lovàsz [1].Where they defined (LLL-Reduced): A basis S with QR decomposition S = Gb. Hb is called LLL-reduced with parameter δ	with	(1/4 <	k ≤	1), if mHbn,om ≤ . mHbn,nm	p_a	1 ≤ < q ≤ ^ (6) And kmHbo8 ,o8 m ≤ mHbo,om + mHbo8 ,om	p_a	q = 2, … , ^ (7) The first condition is called, size-reduced and the second one is called Lovàsz condition. The parameter δ plays an important role to the quality of the reduced basis. We will assume δ = 3 4⁄
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 126 as proposed in [1]. After applying the QR decomposition of H, doing successive size-reduces if the condition is fulfilled, the algorithm exchanges two vectors if Lovàsz condition is not fulfilled to generateT, compute RS andQS. And so, the LLL algorithm will output QS, RS andT. Looking to the LLL algorithm [1], one important element of its complexity is related to the fact that the LLL algorithm is applied for the real integer vectors, it is mandatory to reformulate the different matrices to their real-valued form, so we got: H/?xy = z Real(H) −Im(H) Im(H) real(H) • (8) x = z Real(x) Im(x) • (9) s = z Real(s) Im(s) •	and	n = z Real(n) Im(n) • (10) This kind of reformulation increases the number of operations and adds more latency for the system. The idea behind LR-aided linear detection is to consider the equivalent system model and perform the nonlinear quantisation on it [7]. In fact, if we combine equations (1) and (5), we can get: = S. T8 .9;< ‚ + (11) With ƒ = T8 . the equivalent model and in this case S will represent a better channel quality. And so, the detector can be represented with an equivalent model with better performance due to the less noise enhancement increased by S. Thus, the basic idea behind approximate lattice decoding (LD) is to use LR in conjunction with traditional low-complexity decoders. With LR, the basis B is transformed into a new basis consisting of roughly orthogonal vectors [8]. After processing the Zero Forcing lattice reduction (ZF-LR) mechanism and by combining equations (2) and (11), we can generate: z…678†‡ = T8 . s…67 = S. = ƒ + S. (12) The complex form of this algorithm was presented by Gan and Mow in [3]. But we can clearly identify that this extension keeps the excessive number of iteration and also add more computation latency by introducing the real and imaginary elements in the different conditions of the algorithm. For this reasons, Vetter proposed another variety of the complex LLL, than Ling [7] proposed a fixed complexity LLL (FCLLL). As recapitulation the modifications for the LLL algorithm was for three points: • Avoid the complex to real vector transformations (reduce the number of loops). • The reduction of the number of the LLL iteration to a fixed number. • The use of a flag to track column exchanges. When no column swap happens, the FCLLL ends with an LLL reduced basis. The different enhancements for the original algorithm where looking for limited iterations in term of stopping criteria, like in [2] and [5].
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 127 In next section we will consider the proposal in [2] and start form the Wen’s algorithm as described in table 1 where, Wen proposed an enhanced form of Vetter’s algorithm. The proposal is based on an improved column traverse strategy and an enhanced termination criterion for practical LR-aided SIC MIMO detection. Table 1. The Fixed Complex LLL algorithm [2] Input: H: the channel complex matrix Output: Hb, Gb, T 1 Initialization: T = ; ;	‰ Š`[‹ = _ W (1, + 1); Œ W•; Ž••; 2 [G, H] ∶= •a( ); 3 k = 3 4⁄ 4 n ’ = 1 5 Œ W•n“•=1 6 ”ℎ`W	( n ’ ≤ Ž••)&&	( ]^(‰ p`[‹(2: 1: )) ≠ 0 7 Œ = Œ W•(Œ W•n“•) ; 8 ‰ p`[‹(Œ) = 0; 9 p_a	Œ = 2: 10 ˜ ∶= (Hb(`, Œ) Hb(`, Œ)⁄ ) 11 p	˜ ≠	0 12 Hb(1: `, Œ) ∶= Hb(1: `, Œ) − ˜	. Hb(1:`, `) 13 T(: , Œ) ∶= T(: , Œ) − ˜	. T(: , `) 14 W V 15 W V 16 p	k. Hb(Œ − 1, Œ − 1) > Hb(Œ, Œ) + Hb(Œ − 1, Œ) 17	Œ	X_	Œ − 1	š_`]^ ”[›	p_a	Hb	[ V	T 18 ‰_^›]X ‹	XℎW	œ	$[Xa :	œ = z •ž Ÿ̅ −Ÿ • •	”Xℎ • = Hb(Œ − 1, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ Ÿ = Hb(Œ, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ 19 Hb(Œ − 1: Œ, Œ − 1: ^) ∶= œ. Hb(Œ − 1: Œ, Œ − 1: ^) 20 Gb(: , Œ − 1: Œ) ∶= Gb(: , Œ − 1: Œ). œ4 21 ‰ p`[‹(Œ − 1: 1: Œ + 1) = 1; 22	W V 23 W V 24 n ’ = n ’ + 1 25 Œ W•n“•=Œ W•n“• +1 26 W V
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 128 4.MODIFIED ALGORITHM AND STUDY OF THE EFFECT OF MAX ITERATION ON THE FCLL ALGORITHM In this section we will start from the FCLLL algorithm proposed by Wen [2] and we will try to do modify it. In fact, for line 21 there is the table “ ‰ p`[‹” which is a condition for the loop as mentioned in line 6. The summation of the elements of this “table” seems to add − 1	more addition operations that need to be computed for each loop. So, for us, it will be better to come back to the single element condition as mentioned in [9]. A second remark, the Lovàsz condition such as described in line 16, is representing four complex multiplication, one addition operation and one subtraction operation (which can be considered as addition operation). All of them are complex and being running in loop. It will be better to use the Siegel condition which is always fulfilling the Lovàsz condition and we can go more to show that it reduce the computing operations [10] & [11]. The representation is below: mHbo8 ,o8 m ≤ ¢mHbo,om	(13) Where ζ is chosen from[2, 4], To have the Siegel condition fulfilled, we have to check if: ¤ . mHbo8 ,o8 m > mHbo,om	(14) Whatever the value of ζ, 2 or 4, we can get: k. mHbo8 ,o8 m > mHbo,om	(15) Since	δ > ¥ , So, we can modify the algorithm in table 1 to get the below one as described in table 2. Table 2: The proposed Modified Complex LLL algorithm Input: H: the channel complex matrix Outpu t: Hb, Gb, T 1 Initialization: T = ; = ƒW( , 2); ‰ Š`[‹ = 0; XWa$[ = 6; 2 [G, H] ∶= •a( ); 3 k = 3 4⁄ 4 XWa = 0 5 ”ℎ`W	(‰ p`[‹ == 0)	&&	( XWa ≤ XWa$[ ) 6 ‰ p`[‹ =1; 7 XWa = XWa + 1; 8 p_a	Œ = 2: 9 ˜ ∶= a_] V(Hb(`, Œ) Hb(`, Œ)⁄ ) 10 p	˜ ≥ 1 11 Hb(1: `, Œ) ∶= Hb(1: `, Œ) − ˜	. Hb(1: `, `)
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 129 12 T(: , Œ) ∶= T(: , Œ) − ˜	. T(: , `) 13 W V 14 W V 15 p	k. Hb(Œ − 1, Œ − 1) > Hb(Œ, Œ) 16	Œ	X_	Œ − 1	š_`]^ ”[›	p_a	Hb	[ V	T 17 ‰_^›]X ‹	XℎW	œ	$[Xa :	œ = z •ž Ÿ̅ −Ÿ • •	”Xℎ • = Hb(Œ − 1, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ Ÿ = Hb(Œ, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ 18 Hb(Œ − 1: Œ, Œ − 1: ^) ∶= œ. Hb(Œ − 1: Œ, Œ − 1: ^) 19 Gb(: , Œ − 1: Œ) ∶= Gb(: , Œ − 1: Œ). œ4 20 ‰ p`[‹ =0; 21	W V 22 Œ ∶= Œ + 1 23 W V 24 W V In the proposed algorithm we have modified the line 5 by avoiding “CSflag” table summation presented in table 1 and proposed in [2]. This will help to reduce additional processing operations which will help to “relax” the algorithm in term of complexity and decoding timing. In fact, and as described in the previous section, the contribution of the elementof this “table” in the algorithm doesn’t exceed the termination condition. The importance of this modification can be observed in the next sections, especially of the gain in terms of complexity. We can clearly observe that the max iteration number ( Ž••) is a condition to exit from the loop and so, it can increase the number of computation operations. This means, there is an ideal max iteration value that above it the system becomes exponential complex without large BER enhancement. Also, the modifications in line 10 and line 15 will help to reduce the processing operations because the algorithm will converge quicker than the initial version. In fact, the Siegel condition helps to relax the processing operations like presented in [10]. So, it’s interesting to evaluate the effect of the Max iteration on the BER performance and also the system complexity. For this, we tried to do the simulation of the algorithm with varying the value of the Max iteration. 5.SIMULATION RESULTS AND EFFECT OF THE MAX ITERATION ON THE FCLLL ALGORITHM For our simulation, we will consider the 16QAM constellation, ZF equalization will be checked. The MIMO model will be 4 × 4 ; means a 4 antennas at both transmitter and receiver side. We used a frame size of 10§ . We will indicate inline any changes to the above configuration. In the flowing figures, we tried to increase the max iteration number from 4 to 18.
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 130 Figure 2: 4 × 4 Modified Complex LLL with 16QAM and ZF Bit error rate results. 5, 8 and 10 IterMax results compared to ML and ordinary LLL. Observing figure 2, the ML curve is outperforming all different curves. But we should note that the ML scheme is extremely complex to implement. So, we are indicating it just for reference and comparison reasons.Another quick remark is that comparing the FCLLL curves and the ordinary LLL algorithm we can see that for	IterMax ≤ 5, the LLL is better comparing FCLLL. But for MaxIter equal to 5, the two curves are overlapping till SNR equal to 24dB and after the deviation is minimal. Which means that in terms of performances we are still in an acceptable range and so it will be interesting to push the analysis and also evaluate the gain in complexity and processing operations. In the figure 3, we increase the max iteration value from 4 to 18 to observe if any threshold value for this parameter; that allow to reach a better result with lower iterations. 0 5 10 15 20 25 30 35 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 SNR per receiver antenna [average in dB] BER ML LLL-ZF Modif fCLLL-ZF (5Iter) Modif fCLLL-ZF (8Iter) Modif fCLLL-ZF (10Iter) LLL beter than 5 Iters 8 and 10 Iter are same
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 131 Figure 3: BER results by varying the	IterMax: curves for IterMax equal to 4,5,6,7, 8,9 and 18. Looking to figure 3 we can observe that, staring from XWa$[ ≥ 6, the FCLLL become similar or better than the LLL. But, starting from 8 iteration we can observe that no improvement for the BER. Means, the curves remain overlapping each other. This leads us to conclude that no need to increase the XWa$[ parameter above 8 iterations. Else, the system became costly comparing to its performance. For XWa$[ between 5 and 8, we can push the analysis. In fact, the LLL algorithm as described in [1], will do a minimum of 2. loops; taking in consideration the fact that the size of the channel real-valued matrix H used for the LLL algorithm is double of the complex matrix H. Thus, for this case and with 8 IterMax we are exactly in the same condition as the LLL algorithm. From another point of view, a XWa$[ ≤ 4 will show a BER degradation. This is related to the fact that the algorithm will do a column swap for only half of the possible columns of the matrix. If we consider 5, 6 and 7 as IterMax we can see that we more or less close to the LLL algorithm, since the difference is observed only for the high SNR and the deviation is minimal. In the case of XWa$[ = 6 the BER curve is almost overlapping the ordinary LLL curve. From our point of view, using the XWa$[ equal to 6 seems to be the recommended value, since it has a good complexity to performance balance. Figures 4 and 5 show a zoom on the different curves to illustrate our analysis. 0 5 10 15 20 25 30 35 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 SNR per receiver antenna [average in dB] BER Modif fCLLL-ZF(4Iters) Modif fCLLL-ZF(5Iters) Modif fCLLL-ZF(6Iters) Modif fCLLL-ZF(7Iters) Modif fCLLL-ZF(8Iters) Modif fCLLL-ZF(9Iters) Modif LLL-ZF(18Iters) 8/9/18Iters curves are overlapping and 7Iters one is close to them (Red curve)
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 132 Figure 4: 4 × 4 Modified Complex LLL with 16QAM and ZF Bit error rate results. 5, 8 and 10 IterMax Zoomed curves compared to ML and ordinary LLL. Figure 5: BER results by varying the	IterMax: Zoomed curves for IterMax equal to 4,5,6,7, 8,9 and 18. We remark that starting from XWa$[ = 8 the system BER performance reach the saturation but also the system computing operation are increasing according to the XWa$[ . Means, the complexity continue increasing function of XWa$[ but the BER performance will saturate. Just looking to numbers, the BER saturation is reached for 8 XWa$[ and the BER performance for 6 XWa$[ is same as the ordinary LLL. So, we got same performances as ordinary LLL with a gain of ¼ of operations. It’s a good performance vs complexity balance to be considered… 18 20 22 24 26 28 30 32 34 10 -4 10 -3 SNR per receiver antenna [average in dB] BER ML LLL-ZF Modif fCLLL-ZF (5Iter) Modif fCLLL-ZF (8Iter) Modif fCLLL-ZF (10Iter) LLL beter than 5 Iters 8 and 10 Iter are same 18 20 22 24 26 28 30 32 10 -4 10 -3 SNR per receiver antenna [average in dB] BER Modif fCLLL-ZF(4Iters) Modif fCLLL-ZF(5Iters) Modif fCLLL-ZF(6Iters) Modif fCLLL-ZF(7Iters) Modif fCLLL-ZF(8Iters) Modif fCLLL-ZF(9Iters) Modif LLL-ZF(18Iters) 8/9/18Iters curves are overlapping and 7Iters one is close to them (Red curve)
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 133 6. COMPLEXITY ANALYSIS In this section we will discuss the complexity aspect of our proposal and show the profits and benefits of our proposal. First we will give some details about the operation done by the algorithm. In [12] it was presented that a real matrix multiplication of A ( × $) and B (ª × $) leads to matrix C ( × $) and the overall operations are (ª − 1)$ addition operation and ª$ multiplication operations. It is also known that a complex addition is equivalent to two real addition operations. In fact, for the complex case we will add the real and imaginary parts separately. For the complex multiplication it is different and the operation can be written as below: ([ + «) ∗ (š + V) =	([š − «V) + («š + [V) =	([š − «V) + q-([ + «)(š + V) − [š − «V®	(16) The first options can be done in four multiplications and two additions (assuming that a subtraction can done via an addition operation). The second option can be done in three multiplications and five additions. But the first option is almost used. So, we will consider it. Also, in [13] it was shown that the different arithmetical operation requires different FLOPS. In the table below we present the number of FLOPS needs for each operation (for real values) [13]. Table 3: FLOPS vs operations Operation Add Mult Sqrt Div Nombre of FLOPS 1 1 8 8 • The size reduction require ( Ž•• − 2) × {1 × (¯°) + 2 × ($]`X + ±VV)} • The lovàsz condition require {4 × ($]`X) + 2 × (±VV)} • Colum swap require	× {3 × (±VV)} • The Givens rotation matrix computation require {2 × (¯°) + 2 × ($]`X) + 1 × (±VV) + 1 × ( •]aX)} • The rotation operation for R (matrix multiplication) require 2 × {2 × ($]`X) + 1 × (±VV)} × • The rotation operation for Q (matrix multiplication) require 2 × {2 × ($]`X) + 1 × (±VV)} × 2 • The CSflag condition sum require (±VV) Also, the complex division and square root operations consists of many real operations. • A square root of complex value require {1 × (¯°) + 3 × ($]`X) + 2 × (±VV) + 3 × ( •aX)} of real values. • A complex value division require {1 × (¯°) + 8 × ($]`X) + 4 × (±VV)} All these operation will be running in loop for Ž•• iterations for MIMO	8 × 8 scheme.
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 134 Table 4: Complexity gain FLOPS Performance Comments LLL algorithm Ž•• = ∞ 12200 < Wen’s algorithm [5] Ž•• = 18 < 9300 Outperform LLL (11dB at 108³ ) Our algorithm Ž•• = 18 < 9100 Gain 2dB at 108³ vs LLL The most important point is that we reach same performance with ~31% of FLOPS gainOur algorithm Ž•• = 8 < 6200 Gain 2dB at 108³ vs LLL Our algorithm Ž•• = 6 < 5800 Loose 2dB at 108³ vs LLL Gain ~36% of FLOPS and the performance degradation is minimal (2dB) The table above shows that with our proposal we can reach approximately the same performances as the LLL algorithm with reducing 36% of FLOPS. This is important in term of decoding delay. In fact, we can avoid some decoding delay and achieve the same performance with limited iteration number ( Ž•• = 8). 7. CONCLUSIONS In this paper we proposed some modifications to the FcLLL algorithm proposed by Wen [2]. Simulation results show that for 4 × 4 MIMO system, there is min and max values for the XWa$[ (5 to 8) where the BER performances seems to be good (more or less near to the original LLL results) and also the system complexity remains reasonable. Outside these limits the complexity vs performance balance become undesirable. And the extra iterations don't enhance the performance. Thus, to implement this algorithm we recommend an ideal value of XWa$[ = 6 which allows having a BER quite same as the original LLL and limits the iterations loop. In fact, with this recommended value we can gain ~36% of operations and the BER degradation will be ~2dB at 108³ . The challenge of our proposal was to not bring many changes to the original algorithm, but to identify the possible points that we can enhance in order to relax the processing operations and complexity while keeping good performance results (nearest to the original algorithm). Such study and the presented results aim to help the industry using a low complexity, low cost and high performance solution based on the LLL decoding technique. REFERENCES [1] A. K. Lenstra, H. W. Lenstra, and L. Lovàsz, (1982), ”Factoring polynomials with rational coefficients,” in Math. Ann, vol. 261, pp. 515 - 534. [2] Qingsong Wen, Qi Zhou, and Xiaoli Ma, (2014), “An Enhanced Fixed-Complexity LLL Algorithm for MIMO Detection”, Globecom 2014 - Signal Processing for Communications Symposium. [3] Y. H. Gan and W. H. Mow, (Dec. 2005) “Complex lattice reduction algorithms for low- complexityMIMO detection,” in Proc. IEEE Global Telecommun. Conf., St. Louis, MO, vol. 5, pp. 2953–2957. [4] C. P. Schnorr and M. Euchner, (1994) “Lattice Basis Reduction: Improved Practical Alorithms and Solving Subset Sum Problems”. Mathematical Programming, vol. 66, pp. 181.191. [5] Z. Ma, B. Honary, P. Fan, and E. Larsson, (Jun. 2009), “Stopping criterion for complexity reduction of sphere decoding,” IEEE Commun. Lett., vol. 13 , no. 6, pp. 402–404. [6] D. Wubben, D. Seethaler, J. Jalden, and G. Matz, (April 2011), ”Lattice reduction,” in IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 70 - 91.
International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 135 [7] C. Ling, W. H. Mow, and N. Howgrave-Graham, (Mar. 2013), “Reduced and Fixed- Complexity Variants of the LLL Algorithm for Communications,” IEEE Trans. Commun., vol. 61, no. 3, pp. 1040–1050. [8] Md Hashem Ali Khan, Jin-Gyun Chung and Moon Ho Lee, (2015), “Lattice reduction aided with block diagonalization for multiuser MIMO systems”, EURASIP Journal on Wireless Communications and Networking 2015:254, DOI 10.1186/s13638-015-0476-1 [9] H. Vetter, V. Ponnampalam, M. Sandell, and P. A. Hoeher, (Apr. 2009), “Fixed complexity LLL algorithm,” IEEE Trans. Signal Process., vol. 57, no. 4, pp. 1634–1637. [10] B. Gestner, W. Zhang, X. Ma, and D. V.Anderson, (Apr. 2011) “. Lattice Reduction for MIMO Detection: FromTheoretical Analysis to Hardware Realization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 4, pp. 1549-8328. [11] L. G. Barbero, D. L. Milliner, T. Ratnarajah, J. R. Barry, and C. F. N. Cowan, (Jun. 2009), “Rapid prototyping of Clarkson’s lattice reduction for MIMO detection,” in Proc. IEEE Int. Conf. Commun., Dresden, Germany, pp. 1–5. [12] Markus Bläser, (2013), “Fast Matrix Multiplication”, Theory of Computing, Graduate Surveys , vol. 5, p. 1-60 [13] Ameer Youssef , Mahdi Shabany , P. Glenn Gulak, (May 2011), “Performance analysis of lattice- reduction algorithms for a novel LR-compatible K-Best MIMO detector,” Conference: International Symposium on Circuits and Systems, DOI: 10.1109/ISCAS.2011.5937662, Rio de Janeiro, Brazil.

PERFORMANCE AND COMPLEXITY ANALYSIS OF A REDUCED ITERATIONS LLL ALGORITHM

  • 1.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 DOI: 10.5121/ijcnc.2016.8309 123 PERFORMANCE AND COMPLEXITY ANALYSIS OF A REDUCED ITERATIONS LLL ALGORITHM Nizar OUNI1 and Ridha BOUALLEGUE2 1 National Engineering School of Tunis, SUP’COM, InnovCom laboratory, Tunisia 2 SUP’COM, InnovCom laboratory, Tunisia ABSTRACT Multiple-input multiple-output (MIMO) systems are playing an increasing and interesting role in the recent wireless communication. The complexity and the performance of the systems are driving the different studies and researches. Lattices Reduction techniques bring more resources to investigate the complexity and performances of such systems. In this paper, we look to modify a fixed complexity verity of the LLL algorithm to reduce the computation operations by reducing the number of iterations without important performance degradation. Our proposal shows that we can achieve a good performance results while avoiding extra iteration that doesn’t bring much performance. KEYWORDS MIMO systems, LR-aided, Lattice, LLL, BER, Complexity. 1. INTRODUCTION MIMO communication systems are introduced to combat fading and provide high data rate. The MIMO system consists of transmitting multiple independent data symbols via multiple antennas. For the reception, a MIMO decoder needs to be used to detect, estimate, and reconstruct the received symbols. Multiple detection schemes can be used, such as the zero-forcing (ZF) or the minimum mean square error (MMSE) criterion. Also, the maximum likelihood decoder (ML) is considered as the optimal solution for the MIMO detection in term of Bit Error Rate (BER). But, unfortunately the ML algorithm seems to be complex for hardware implementations. Therefore, linear MIMO detection techniques like ZF and MMSE are better in term of complexity, but suffer from BER performance degradation. The lattice-reduction (LR) preprocessing technique has been proposed to be used with linear detection in order to transform the system model into an equivalent system with better channel matrix’s effect and so to reduce the complexity of the system. It was shown in previous studies that LR techniques improve the BER performances significantly. The populated LR algorithm is called Lenstra-Lenstra-Lovàsz (LLL) algorithm is the most used one. It was called according to the name of the inventors [1]. But, the LLL algorithm brings many challenges due to higher processing complexity and the undeterministic execution time [2]. LLL algorithm has a major limit which is the varying complexity that could be large and limits the decoding speed of the communication system. But, it is always presenting the best performance in term BER. The complex Lenstra-Lenstra-Lovàsz algorithm (CLLL) [3]is applying
  • 2.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 124 the basis reduction for complex field, while the LLL is targetinga real valued matrix. The different studies and simulation results show that CLLL requires less processing operations [4]. Effective LLL algorithm (ELLL) [5], come with a new idea that consists to change the Lovàsz reduction condition in order to relax the related equations. Also, the FcLLL prposed by Wen [2] reduces the number of iterations for the algorithm to fix iteration number instead of infinite iterations. This technique improves the complexity but remains worse than LLL in term of BER performance. In this paper we, will focus on the FcLLL algorithm using ZF decoding technique and we propose some modifications to the original FcLLL algorithm to keep a reduced number of loops and targeting a good BER. 2. SYSTEM MODEL DESCRIPTION During this paper we will consider that (. ) and (. ) denote respectively the hermission transpose and the transpose of a matrix. We consider the spatial multiplexing MIMO system with transmit and receive antennas with a Rayleigh channel non variant in the time. = . + (1) Where = [ , , … , ] ; (s ∊ s) is the information vector with being a constellation set of square QAM with [ ] = . and the real and imaginary parts are {−#$% + 1, … , −1, 1, … , #$% − 1} with M) being the constellation size, . We will suppose that the average transmit power of each antenna is normalized to one, so [ ] = . With I+ is the m × m identity matrix. is an × ; (N/ ≥ N1)complex channel matrix, x= [ , , … , x] is the received signal vector, and = [ , , … , 3 ]4 is the complex additive white Gaussian noise (AWGN) vector with zero mean and covariance 5. 3 . On the receiver side, = [ , , … , 3 ]4 are the symbols at receiver’s respective antennas which will be used to estimate transmitted e the symbols [4]. The receiver will analyze all received information to compute the transmitted data. So, a detection, computation, equalization and estimation of the received data will happen. At receiver side, the linear zero forcing (ZF) detector compute the inverse of the channel matrix to estimate the transmitted symbols which can be expressed by, s67 = ( . )8 .9:::;:::< =>>/?8@?A/>B? CB?DE>8 AF?/B? . x (2) The channel matrix is QR decomposed into two parts as = GH.
  • 3.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 125 Figure 1. MIMO system with Transmitter and Receiver antennas. 3. LATTICE REDUCTION TECHNIQUE We can interpret the columns ℎJof the channel matrix as the basis of a lattice and assume that the possible transmit vectors are given by ℤ+ , the m dimensional infinite integer space. Consequently, the set of all possible undisturbed received signals is given by the lattice. L( ) = L(ℎ , … , ℎM): = {∑ ℎJ M JP |ℎJ ∈ ℤ} (3) The LR algorithm generates a lattices reduced and near-orthogonal channel matrixS = . T. With matrix S = . T generates the same lattice as , if and only if the m × m matrix T is unimodular [6], i.e. T contains only integer entries and VWX(T) = ±1: L( S ) = L( ) ⇔ S = T[ VT ] ^_V]`[a (4) Also, S. T8 = (5) We can find multiple bases that can be included in the space L, and the goal of the LR algorithm is to find a set of least correlated base with the shortest basis vectors [2].Initially, an efficient (but supposed not optimal) way to determine a reduced basis was proposed by Lenstra, Lenstra and Lovàsz [1].Where they defined (LLL-Reduced): A basis S with QR decomposition S = Gb. Hb is called LLL-reduced with parameter δ with (1/4 < k ≤ 1), if mHbn,om ≤ . mHbn,nm p_a 1 ≤ < q ≤ ^ (6) And kmHbo8 ,o8 m ≤ mHbo,om + mHbo8 ,om p_a q = 2, … , ^ (7) The first condition is called, size-reduced and the second one is called Lovàsz condition. The parameter δ plays an important role to the quality of the reduced basis. We will assume δ = 3 4⁄
  • 4.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 126 as proposed in [1]. After applying the QR decomposition of H, doing successive size-reduces if the condition is fulfilled, the algorithm exchanges two vectors if Lovàsz condition is not fulfilled to generateT, compute RS andQS. And so, the LLL algorithm will output QS, RS andT. Looking to the LLL algorithm [1], one important element of its complexity is related to the fact that the LLL algorithm is applied for the real integer vectors, it is mandatory to reformulate the different matrices to their real-valued form, so we got: H/?xy = z Real(H) −Im(H) Im(H) real(H) • (8) x = z Real(x) Im(x) • (9) s = z Real(s) Im(s) • and n = z Real(n) Im(n) • (10) This kind of reformulation increases the number of operations and adds more latency for the system. The idea behind LR-aided linear detection is to consider the equivalent system model and perform the nonlinear quantisation on it [7]. In fact, if we combine equations (1) and (5), we can get: = S. T8 .9;< ‚ + (11) With ƒ = T8 . the equivalent model and in this case S will represent a better channel quality. And so, the detector can be represented with an equivalent model with better performance due to the less noise enhancement increased by S. Thus, the basic idea behind approximate lattice decoding (LD) is to use LR in conjunction with traditional low-complexity decoders. With LR, the basis B is transformed into a new basis consisting of roughly orthogonal vectors [8]. After processing the Zero Forcing lattice reduction (ZF-LR) mechanism and by combining equations (2) and (11), we can generate: z…678†‡ = T8 . s…67 = S. = ƒ + S. (12) The complex form of this algorithm was presented by Gan and Mow in [3]. But we can clearly identify that this extension keeps the excessive number of iteration and also add more computation latency by introducing the real and imaginary elements in the different conditions of the algorithm. For this reasons, Vetter proposed another variety of the complex LLL, than Ling [7] proposed a fixed complexity LLL (FCLLL). As recapitulation the modifications for the LLL algorithm was for three points: • Avoid the complex to real vector transformations (reduce the number of loops). • The reduction of the number of the LLL iteration to a fixed number. • The use of a flag to track column exchanges. When no column swap happens, the FCLLL ends with an LLL reduced basis. The different enhancements for the original algorithm where looking for limited iterations in term of stopping criteria, like in [2] and [5].
  • 5.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 127 In next section we will consider the proposal in [2] and start form the Wen’s algorithm as described in table 1 where, Wen proposed an enhanced form of Vetter’s algorithm. The proposal is based on an improved column traverse strategy and an enhanced termination criterion for practical LR-aided SIC MIMO detection. Table 1. The Fixed Complex LLL algorithm [2] Input: H: the channel complex matrix Output: Hb, Gb, T 1 Initialization: T = ; ; ‰ Š`[‹ = _ W (1, + 1); Œ W•; Ž••; 2 [G, H] ∶= •a( ); 3 k = 3 4⁄ 4 n ’ = 1 5 Œ W•n“•=1 6 ”ℎ`W ( n ’ ≤ Ž••)&& ( ]^(‰ p`[‹(2: 1: )) ≠ 0 7 Œ = Œ W•(Œ W•n“•) ; 8 ‰ p`[‹(Œ) = 0; 9 p_a Œ = 2: 10 ˜ ∶= (Hb(`, Œ) Hb(`, Œ)⁄ ) 11 p ˜ ≠ 0 12 Hb(1: `, Œ) ∶= Hb(1: `, Œ) − ˜ . Hb(1:`, `) 13 T(: , Œ) ∶= T(: , Œ) − ˜ . T(: , `) 14 W V 15 W V 16 p k. Hb(Œ − 1, Œ − 1) > Hb(Œ, Œ) + Hb(Œ − 1, Œ) 17 Œ X_ Œ − 1 š_`]^ ”[› p_a Hb [ V T 18 ‰_^›]X ‹ XℎW œ $[Xa : œ = z •ž Ÿ̅ −Ÿ • • ”Xℎ • = Hb(Œ − 1, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ Ÿ = Hb(Œ, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ 19 Hb(Œ − 1: Œ, Œ − 1: ^) ∶= œ. Hb(Œ − 1: Œ, Œ − 1: ^) 20 Gb(: , Œ − 1: Œ) ∶= Gb(: , Œ − 1: Œ). œ4 21 ‰ p`[‹(Œ − 1: 1: Œ + 1) = 1; 22 W V 23 W V 24 n ’ = n ’ + 1 25 Œ W•n“•=Œ W•n“• +1 26 W V
  • 6.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 128 4.MODIFIED ALGORITHM AND STUDY OF THE EFFECT OF MAX ITERATION ON THE FCLL ALGORITHM In this section we will start from the FCLLL algorithm proposed by Wen [2] and we will try to do modify it. In fact, for line 21 there is the table “ ‰ p`[‹” which is a condition for the loop as mentioned in line 6. The summation of the elements of this “table” seems to add − 1 more addition operations that need to be computed for each loop. So, for us, it will be better to come back to the single element condition as mentioned in [9]. A second remark, the Lovàsz condition such as described in line 16, is representing four complex multiplication, one addition operation and one subtraction operation (which can be considered as addition operation). All of them are complex and being running in loop. It will be better to use the Siegel condition which is always fulfilling the Lovàsz condition and we can go more to show that it reduce the computing operations [10] & [11]. The representation is below: mHbo8 ,o8 m ≤ ¢mHbo,om (13) Where ζ is chosen from[2, 4], To have the Siegel condition fulfilled, we have to check if: ¤ . mHbo8 ,o8 m > mHbo,om (14) Whatever the value of ζ, 2 or 4, we can get: k. mHbo8 ,o8 m > mHbo,om (15) Since δ > ¥ , So, we can modify the algorithm in table 1 to get the below one as described in table 2. Table 2: The proposed Modified Complex LLL algorithm Input: H: the channel complex matrix Outpu t: Hb, Gb, T 1 Initialization: T = ; = ƒW( , 2); ‰ Š`[‹ = 0; XWa$[ = 6; 2 [G, H] ∶= •a( ); 3 k = 3 4⁄ 4 XWa = 0 5 ”ℎ`W (‰ p`[‹ == 0) && ( XWa ≤ XWa$[ ) 6 ‰ p`[‹ =1; 7 XWa = XWa + 1; 8 p_a Œ = 2: 9 ˜ ∶= a_] V(Hb(`, Œ) Hb(`, Œ)⁄ ) 10 p ˜ ≥ 1 11 Hb(1: `, Œ) ∶= Hb(1: `, Œ) − ˜ . Hb(1: `, `)
  • 7.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 129 12 T(: , Œ) ∶= T(: , Œ) − ˜ . T(: , `) 13 W V 14 W V 15 p k. Hb(Œ − 1, Œ − 1) > Hb(Œ, Œ) 16 Œ X_ Œ − 1 š_`]^ ”[› p_a Hb [ V T 17 ‰_^›]X ‹ XℎW œ $[Xa : œ = z •ž Ÿ̅ −Ÿ • • ”Xℎ • = Hb(Œ − 1, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ Ÿ = Hb(Œ, Œ − 1) ¡Hb(Œ − 1: Œ, Œ − 1)¡ 18 Hb(Œ − 1: Œ, Œ − 1: ^) ∶= œ. Hb(Œ − 1: Œ, Œ − 1: ^) 19 Gb(: , Œ − 1: Œ) ∶= Gb(: , Œ − 1: Œ). œ4 20 ‰ p`[‹ =0; 21 W V 22 Œ ∶= Œ + 1 23 W V 24 W V In the proposed algorithm we have modified the line 5 by avoiding “CSflag” table summation presented in table 1 and proposed in [2]. This will help to reduce additional processing operations which will help to “relax” the algorithm in term of complexity and decoding timing. In fact, and as described in the previous section, the contribution of the elementof this “table” in the algorithm doesn’t exceed the termination condition. The importance of this modification can be observed in the next sections, especially of the gain in terms of complexity. We can clearly observe that the max iteration number ( Ž••) is a condition to exit from the loop and so, it can increase the number of computation operations. This means, there is an ideal max iteration value that above it the system becomes exponential complex without large BER enhancement. Also, the modifications in line 10 and line 15 will help to reduce the processing operations because the algorithm will converge quicker than the initial version. In fact, the Siegel condition helps to relax the processing operations like presented in [10]. So, it’s interesting to evaluate the effect of the Max iteration on the BER performance and also the system complexity. For this, we tried to do the simulation of the algorithm with varying the value of the Max iteration. 5.SIMULATION RESULTS AND EFFECT OF THE MAX ITERATION ON THE FCLLL ALGORITHM For our simulation, we will consider the 16QAM constellation, ZF equalization will be checked. The MIMO model will be 4 × 4 ; means a 4 antennas at both transmitter and receiver side. We used a frame size of 10§ . We will indicate inline any changes to the above configuration. In the flowing figures, we tried to increase the max iteration number from 4 to 18.
  • 8.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 130 Figure 2: 4 × 4 Modified Complex LLL with 16QAM and ZF Bit error rate results. 5, 8 and 10 IterMax results compared to ML and ordinary LLL. Observing figure 2, the ML curve is outperforming all different curves. But we should note that the ML scheme is extremely complex to implement. So, we are indicating it just for reference and comparison reasons.Another quick remark is that comparing the FCLLL curves and the ordinary LLL algorithm we can see that for IterMax ≤ 5, the LLL is better comparing FCLLL. But for MaxIter equal to 5, the two curves are overlapping till SNR equal to 24dB and after the deviation is minimal. Which means that in terms of performances we are still in an acceptable range and so it will be interesting to push the analysis and also evaluate the gain in complexity and processing operations. In the figure 3, we increase the max iteration value from 4 to 18 to observe if any threshold value for this parameter; that allow to reach a better result with lower iterations. 0 5 10 15 20 25 30 35 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 SNR per receiver antenna [average in dB] BER ML LLL-ZF Modif fCLLL-ZF (5Iter) Modif fCLLL-ZF (8Iter) Modif fCLLL-ZF (10Iter) LLL beter than 5 Iters 8 and 10 Iter are same
  • 9.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 131 Figure 3: BER results by varying the IterMax: curves for IterMax equal to 4,5,6,7, 8,9 and 18. Looking to figure 3 we can observe that, staring from XWa$[ ≥ 6, the FCLLL become similar or better than the LLL. But, starting from 8 iteration we can observe that no improvement for the BER. Means, the curves remain overlapping each other. This leads us to conclude that no need to increase the XWa$[ parameter above 8 iterations. Else, the system became costly comparing to its performance. For XWa$[ between 5 and 8, we can push the analysis. In fact, the LLL algorithm as described in [1], will do a minimum of 2. loops; taking in consideration the fact that the size of the channel real-valued matrix H used for the LLL algorithm is double of the complex matrix H. Thus, for this case and with 8 IterMax we are exactly in the same condition as the LLL algorithm. From another point of view, a XWa$[ ≤ 4 will show a BER degradation. This is related to the fact that the algorithm will do a column swap for only half of the possible columns of the matrix. If we consider 5, 6 and 7 as IterMax we can see that we more or less close to the LLL algorithm, since the difference is observed only for the high SNR and the deviation is minimal. In the case of XWa$[ = 6 the BER curve is almost overlapping the ordinary LLL curve. From our point of view, using the XWa$[ equal to 6 seems to be the recommended value, since it has a good complexity to performance balance. Figures 4 and 5 show a zoom on the different curves to illustrate our analysis. 0 5 10 15 20 25 30 35 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 SNR per receiver antenna [average in dB] BER Modif fCLLL-ZF(4Iters) Modif fCLLL-ZF(5Iters) Modif fCLLL-ZF(6Iters) Modif fCLLL-ZF(7Iters) Modif fCLLL-ZF(8Iters) Modif fCLLL-ZF(9Iters) Modif LLL-ZF(18Iters) 8/9/18Iters curves are overlapping and 7Iters one is close to them (Red curve)
  • 10.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 132 Figure 4: 4 × 4 Modified Complex LLL with 16QAM and ZF Bit error rate results. 5, 8 and 10 IterMax Zoomed curves compared to ML and ordinary LLL. Figure 5: BER results by varying the IterMax: Zoomed curves for IterMax equal to 4,5,6,7, 8,9 and 18. We remark that starting from XWa$[ = 8 the system BER performance reach the saturation but also the system computing operation are increasing according to the XWa$[ . Means, the complexity continue increasing function of XWa$[ but the BER performance will saturate. Just looking to numbers, the BER saturation is reached for 8 XWa$[ and the BER performance for 6 XWa$[ is same as the ordinary LLL. So, we got same performances as ordinary LLL with a gain of ¼ of operations. It’s a good performance vs complexity balance to be considered… 18 20 22 24 26 28 30 32 34 10 -4 10 -3 SNR per receiver antenna [average in dB] BER ML LLL-ZF Modif fCLLL-ZF (5Iter) Modif fCLLL-ZF (8Iter) Modif fCLLL-ZF (10Iter) LLL beter than 5 Iters 8 and 10 Iter are same 18 20 22 24 26 28 30 32 10 -4 10 -3 SNR per receiver antenna [average in dB] BER Modif fCLLL-ZF(4Iters) Modif fCLLL-ZF(5Iters) Modif fCLLL-ZF(6Iters) Modif fCLLL-ZF(7Iters) Modif fCLLL-ZF(8Iters) Modif fCLLL-ZF(9Iters) Modif LLL-ZF(18Iters) 8/9/18Iters curves are overlapping and 7Iters one is close to them (Red curve)
  • 11.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 133 6. COMPLEXITY ANALYSIS In this section we will discuss the complexity aspect of our proposal and show the profits and benefits of our proposal. First we will give some details about the operation done by the algorithm. In [12] it was presented that a real matrix multiplication of A ( × $) and B (ª × $) leads to matrix C ( × $) and the overall operations are (ª − 1)$ addition operation and ª$ multiplication operations. It is also known that a complex addition is equivalent to two real addition operations. In fact, for the complex case we will add the real and imaginary parts separately. For the complex multiplication it is different and the operation can be written as below: ([ + «) ∗ (š + V) = ([š − «V) + («š + [V) = ([š − «V) + q-([ + «)(š + V) − [š − «V® (16) The first options can be done in four multiplications and two additions (assuming that a subtraction can done via an addition operation). The second option can be done in three multiplications and five additions. But the first option is almost used. So, we will consider it. Also, in [13] it was shown that the different arithmetical operation requires different FLOPS. In the table below we present the number of FLOPS needs for each operation (for real values) [13]. Table 3: FLOPS vs operations Operation Add Mult Sqrt Div Nombre of FLOPS 1 1 8 8 • The size reduction require ( Ž•• − 2) × {1 × (¯°) + 2 × ($]`X + ±VV)} • The lovàsz condition require {4 × ($]`X) + 2 × (±VV)} • Colum swap require × {3 × (±VV)} • The Givens rotation matrix computation require {2 × (¯°) + 2 × ($]`X) + 1 × (±VV) + 1 × ( •]aX)} • The rotation operation for R (matrix multiplication) require 2 × {2 × ($]`X) + 1 × (±VV)} × • The rotation operation for Q (matrix multiplication) require 2 × {2 × ($]`X) + 1 × (±VV)} × 2 • The CSflag condition sum require (±VV) Also, the complex division and square root operations consists of many real operations. • A square root of complex value require {1 × (¯°) + 3 × ($]`X) + 2 × (±VV) + 3 × ( •aX)} of real values. • A complex value division require {1 × (¯°) + 8 × ($]`X) + 4 × (±VV)} All these operation will be running in loop for Ž•• iterations for MIMO 8 × 8 scheme.
  • 12.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 134 Table 4: Complexity gain FLOPS Performance Comments LLL algorithm Ž•• = ∞ 12200 < Wen’s algorithm [5] Ž•• = 18 < 9300 Outperform LLL (11dB at 108³ ) Our algorithm Ž•• = 18 < 9100 Gain 2dB at 108³ vs LLL The most important point is that we reach same performance with ~31% of FLOPS gainOur algorithm Ž•• = 8 < 6200 Gain 2dB at 108³ vs LLL Our algorithm Ž•• = 6 < 5800 Loose 2dB at 108³ vs LLL Gain ~36% of FLOPS and the performance degradation is minimal (2dB) The table above shows that with our proposal we can reach approximately the same performances as the LLL algorithm with reducing 36% of FLOPS. This is important in term of decoding delay. In fact, we can avoid some decoding delay and achieve the same performance with limited iteration number ( Ž•• = 8). 7. CONCLUSIONS In this paper we proposed some modifications to the FcLLL algorithm proposed by Wen [2]. Simulation results show that for 4 × 4 MIMO system, there is min and max values for the XWa$[ (5 to 8) where the BER performances seems to be good (more or less near to the original LLL results) and also the system complexity remains reasonable. Outside these limits the complexity vs performance balance become undesirable. And the extra iterations don't enhance the performance. Thus, to implement this algorithm we recommend an ideal value of XWa$[ = 6 which allows having a BER quite same as the original LLL and limits the iterations loop. In fact, with this recommended value we can gain ~36% of operations and the BER degradation will be ~2dB at 108³ . The challenge of our proposal was to not bring many changes to the original algorithm, but to identify the possible points that we can enhance in order to relax the processing operations and complexity while keeping good performance results (nearest to the original algorithm). Such study and the presented results aim to help the industry using a low complexity, low cost and high performance solution based on the LLL decoding technique. REFERENCES [1] A. K. Lenstra, H. W. Lenstra, and L. Lovàsz, (1982), ”Factoring polynomials with rational coefficients,” in Math. Ann, vol. 261, pp. 515 - 534. [2] Qingsong Wen, Qi Zhou, and Xiaoli Ma, (2014), “An Enhanced Fixed-Complexity LLL Algorithm for MIMO Detection”, Globecom 2014 - Signal Processing for Communications Symposium. [3] Y. H. Gan and W. H. Mow, (Dec. 2005) “Complex lattice reduction algorithms for low- complexityMIMO detection,” in Proc. IEEE Global Telecommun. Conf., St. Louis, MO, vol. 5, pp. 2953–2957. [4] C. P. Schnorr and M. Euchner, (1994) “Lattice Basis Reduction: Improved Practical Alorithms and Solving Subset Sum Problems”. Mathematical Programming, vol. 66, pp. 181.191. [5] Z. Ma, B. Honary, P. Fan, and E. Larsson, (Jun. 2009), “Stopping criterion for complexity reduction of sphere decoding,” IEEE Commun. Lett., vol. 13 , no. 6, pp. 402–404. [6] D. Wubben, D. Seethaler, J. Jalden, and G. Matz, (April 2011), ”Lattice reduction,” in IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 70 - 91.
  • 13.
    International Journal ofComputer Networks & Communications (IJCNC) Vol.8, No.3, May 2016 135 [7] C. Ling, W. H. Mow, and N. Howgrave-Graham, (Mar. 2013), “Reduced and Fixed- Complexity Variants of the LLL Algorithm for Communications,” IEEE Trans. Commun., vol. 61, no. 3, pp. 1040–1050. [8] Md Hashem Ali Khan, Jin-Gyun Chung and Moon Ho Lee, (2015), “Lattice reduction aided with block diagonalization for multiuser MIMO systems”, EURASIP Journal on Wireless Communications and Networking 2015:254, DOI 10.1186/s13638-015-0476-1 [9] H. Vetter, V. Ponnampalam, M. Sandell, and P. A. Hoeher, (Apr. 2009), “Fixed complexity LLL algorithm,” IEEE Trans. Signal Process., vol. 57, no. 4, pp. 1634–1637. [10] B. Gestner, W. Zhang, X. Ma, and D. V.Anderson, (Apr. 2011) “. Lattice Reduction for MIMO Detection: FromTheoretical Analysis to Hardware Realization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 4, pp. 1549-8328. [11] L. G. Barbero, D. L. Milliner, T. Ratnarajah, J. R. Barry, and C. F. N. Cowan, (Jun. 2009), “Rapid prototyping of Clarkson’s lattice reduction for MIMO detection,” in Proc. IEEE Int. Conf. Commun., Dresden, Germany, pp. 1–5. [12] Markus Bläser, (2013), “Fast Matrix Multiplication”, Theory of Computing, Graduate Surveys , vol. 5, p. 1-60 [13] Ameer Youssef , Mahdi Shabany , P. Glenn Gulak, (May 2011), “Performance analysis of lattice- reduction algorithms for a novel LR-compatible K-Best MIMO detector,” Conference: International Symposium on Circuits and Systems, DOI: 10.1109/ISCAS.2011.5937662, Rio de Janeiro, Brazil.