My observation $y$ is obtained from the model $y(n) = \sum_{i=0}^{p-1} r(i)\, x(n-i) + v(n)$, where $r$ holds the sparse channel coefficients, $x$ is the one-dimensional input, and $v$ is additive white Gaussian noise with zero mean. The filter() command is used to implement this equation, which gives an FIR filter, i.e., a moving average (MA) model. The order of the MA model is $p = 3$.
So $y = [y(1), y(2), \ldots, y(100)]$ is a vector of 100 elements. I generate noise with variance 0.1. I want to estimate the sparse channel coefficients using the LASSO; since there are $p$ channel coefficients, I should get $p$ estimates.
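For concreteness, a minimal sketch of how I generate the noisy observation (the names follow the model above; sqrt(0.1) scales the noise to variance 0.1):

p = 3;
r = [1 0 0];                       % sparse channel coefficients
x = -5:.1:5;                       % one-dimensional input
v = sqrt(0.1)*randn(size(x));      % zero-mean white Gaussian noise, variance 0.1
y = filter(r, 1, x) + v;           % MA(p) / FIR observation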
According to the LASSO formulation, I estimate the sparse coefficients $\mathbf{r}$ by minimizing $\|\mathbf{X}\mathbf{r} - y\|_2^2 + \lambda \|\mathbf{r}\|_1$, where $\mathbf{X}$ is the regressor matrix built from $x$. Since the true coefficient vector has $p$ elements, I should get $p$ estimated elements. I am not sure whether this is the right way to proceed. I have not found any example of the LASSO being applied to a univariate time-series model such as an ARMA model, and I do not know how to estimate the sparse coefficients with the appropriate algorithm, so I need help.
The first part of the objective, $\|\mathbf{X}\mathbf{r} - y\|_2^2$, is a least-squares term, which I can solve with the least-squares (LS) approach. To implement LS, I have to arrange the input in terms of regressors. However, if the coefficients $\mathbf{r}$ are sparse, then I should use the LASSO. I have tried Matlab's lasso function. For the LASSO, I rearranged the input data $x$ into regressors, but I do not know whether this is the correct approach.
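For reference, this is how I would arrange the lagged regressors for $p = 3$ (the names X and r_ls are only for this sketch, and I assume zero initial conditions, which is what filter() uses):

x = x(:);                                   % work with a column vector
N = length(x);
X = [x, [0; x(1:N-1)], [0; 0; x(1:N-2)]];   % columns: x(n), x(n-1), x(n-2)
% equivalently: X = toeplitz(x, [x(1) 0 0]);
r_ls = X \ y(:);                            % ordinary least squares, for comparison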
I need help: is there an approach that includes the sparsity term directly in the LS solution?
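From what I have read, one way to include the $\ell_1$ (sparsity) term in the LS iterations is iterative soft-thresholding (ISTA): take a gradient step on the LS term, then shrink the coefficients. A rough sketch of my understanding, with lambda and the iteration count as placeholder values:

lambda = 1;                  % placeholder; needs to be tuned
Lc = norm(X)^2;              % step-size constant (largest eigenvalue of X'*X)
r_hat = zeros(size(X,2),1);
for k = 1:500
    g = X'*(X*r_hat - y(:));                        % gradient of (1/2)*||X*r - y||^2
    z = r_hat - g/Lc;                               % gradient step on the LS term
    r_hat = sign(z).*max(abs(z) - lambda/Lc, 0);    % soft-thresholding (l1 prox)
end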
Please find below the code for the LASSO using Matlab's lasso function. As a toy example I assume a model order of lag 3, but I know the LASSO can be applied efficiently to a large model, so I can later test a larger-order MA model with lag > 3.
% Code for LASSO estimation technique for an MA system, L = 3 is the order
% Generate input
x = -5:.1:5;
r = [1 0.0 0.0];                  % L elements of the channel coefficients
% Data preparation into regressors
X1 = [ones(length(x),1) x' x'];   % first column treated as all ones since x_1 = 1
y = filter(r,1,x);                % generate the MA model
[r_hat_lasso, FitInfo] = lasso(X1, y, 'alpha', 1, 'Lambda', 1, 'Standardize', 1);

OUTPUT:
The estimates returned are r_hat_lasso = 0, 0.657002829714982, 0
Question: This differs very much from the actual r. Is my understanding wrong?
UPDATE: Based on the answer, I have tried to apply the LASSO to a larger MA model with 89 lags. I am trying to choose lambda by cross-validation. I have split the data into a training set and a hold-out set, denoted by the variables iTr and iHo respectively. I want to compute the mean squared prediction error between the hold-out samples of y and the predicted y obtained from the estimates. I am getting wrong values for the MSE, and I cannot tell which estimated coefficients to use with the hold-out samples in these lines:

[r_hat_lasso, FitInfo] = lasso(X1(iTr,1:end), y(iTr));
[rhatLASSO,stats] = lasso(X1(iTr,2:end),y(iTr),'CV',10);
yLasso = X1(iHo,:)*rLasso;
I am getting NaN as the error. I need help with the correct way to use cross-validation and compute the MSE. The full code is below:
clear all
clc
% Generate input
N = 200;
x = (randn(1,N)*100);
L = 90;
Num_lags = 1:89;
r = 1 + randn(L,1);
r(rand(L,1) < .7) = 0;   % roughly 70% of the coefficients are zero
% Data preparation into regressors
X1 = lagmatrix(x, [0 Num_lags]);
y = X1*r;

% Estimation
iTr = rand(N,1) < 0.5;   % training
iHo = ~iTr;              % holdout

% LASSO
[r_hat_lasso, FitInfo] = lasso(X1(iTr,1:end), y(iTr));
[rhatLASSO, stats] = lasso(X1(iTr,2:end), y(iTr), 'CV', 10);

% Picking the hyperparameter lambda
lassoPlot(rhatLASSO, stats, 'PlotType', 'CV');
rLasso = [stats.Intercept(stats.Index1SE); rhatLASSO(:,stats.Index1SE)];
stats.Index1SE
% ans =
%     87

% Evaluate predictions on holdout samples
yLasso = X1(iHo,:)*rLasso;

% Assess prediction error
fprintf('---MSE in holdout sample---\n');
fprintf('MSE LASSO: %f\n', mean((y(iHo)-yLasso).^2));

OUTPUT: MSE LASSO: NaN
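For clarity, this is how I currently think the hold-out prediction should be assembled, assuming (a) rows of X1 that contain NaNs from lagmatrix have to be dropped first and (b) the intercept multiplies an explicit column of ones; I am not sure this is right:

valid = ~any(isnan(X1), 2);                             % drop rows with missing lags
iHoV  = iHo & valid;
yLasso = [ones(sum(iHoV),1) X1(iHoV,2:end)]*rLasso;     % intercept + 89 lag columns
fprintf('MSE LASSO: %f\n', mean((y(iHoV) - yLasso).^2));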