I'm currently studying Pseudo Maximum Likelihood estimation. I'm trying to fit a GARCH model by Gaussian Pseudo Maximum Likelihood (and later by non-Gaussian PML), but before applying it to actual data I wanted to make sure it works on simulated series. The parametrization I'm using is the one in Newey and Steigerwald (1997):
$$y_t=\epsilon_t=\sigma_0 v_t z_t$$ $$v_t^2=1+\alpha \epsilon_{t-1}^2+\beta v_{t-1}^2$$ where $z_t$ is i.i.d. $D(0,1)$.
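For context, the simulation step is along these lines (a minimal sketch rather than my exact code; the function name `simulate_garch_ns`, the burn-in length, and the start-up value $v_1^2=1$ are illustrative choices):

```matlab
function y = simulate_garch_ns(T, sigma0, alpha, beta)
%SIMULATE_GARCH_NS Sketch of a simulator for the Newey-Steigerwald parametrization
%   y_t = epsilon_t = sigma0*v_t*z_t,  v_t^2 = 1 + alpha*epsilon_{t-1}^2 + beta*v_{t-1}^2
    burnin = 500;                    % discard initial draws to reduce start-up effects
    n = T + burnin;
    z = randn(n,1);                  % z_t iid N(0,1) for the Gaussian case
    vsq  = ones(n,1);                % v_t^2, started at the intercept value 1
    eps_ = zeros(n,1);               % epsilon_t = y_t
    for t = 2:n
        vsq(t)  = 1 + alpha*eps_(t-1)^2 + beta*vsq(t-1);
        eps_(t) = sigma0*sqrt(vsq(t))*z(t);
    end
    y = eps_(burnin+1:end);
end
```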
I then wrote the log-likelihood function in MATLAB:

```matlab
function [logL,gradlogL] = gllik(theta,y)
%GLOGLIKELIHOOD Given a time series, this function calculates the Gaussian log
%likelihood for a GARCH(1,1) process. The notation used is
%   y_t = epsilon_t
%   epsilon_t = sigma0*sigma_t*z_t,  z_t iid(0,1)
%   sigma_t^2 = 1 + alpha*epsilon(t-1)^2 + beta*sigma(t-1)^2
%   theta(1) = sigma0^2;  theta(2) = alpha;  theta(3) = beta
T = size(y,1);
logL = 0;
%initializing innovation
eps = nan(T,1);
eps(1) = 0;
%initializing conditional variance
sigmatsq = nan(T,1);
sigmatsq(1) = var(y);
ztsq = nan(T,1);
ztsq(1) = 0;
dgammavtsq = zeros(T,2);
dgammalt = zeros(T,2);
%gradient declaration
gradlogL = zeros(1,3);
for t = 2:T
    %calculating new means, volatilities and residuals
    sigmatsq(t) = 1 + theta(2)*eps(t-1)^2 + theta(3)*sigmatsq(t-1);
    eps(t) = y(t);
    ztsq(t) = eps(t)^2/(theta(1)*sigmatsq(t));
    %calculating likelihood
    lt = -log(2*pi)/2 - log(sigmatsq(t))/2 - ztsq(t)/2 - log(theta(1))/2;
    logL = logL + lt;
    %calculating gradient
    gradlogL(1) = gradlogL(1) + (2*theta(1))^(-1)*(ztsq(t)-1);
    dgammavtsq(t,:) = [eps(t-1)^2, sigmatsq(t-1)];
    dgammalt(t,:) = (dgammavtsq(t,:)/(2*sigmatsq(t)))*(ztsq(t)-1);
    gradlogL(2:3) = gradlogL(2:3) + dgammalt(t,:);
end
logL = -logL/T;
gradlogL = -gradlogL/T;
end
```

I return minus the (averaged) log-likelihood because fmincon minimizes, while I want to maximize it.
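One sanity check along these lines is to compare the analytic gradient with central finite differences; here is a minimal sketch (the step size `h` and the evaluation point `theta0` are arbitrary choices, and `y` is one simulated series):

```matlab
% Central finite-difference check on the analytic gradient of gllik
theta0 = [0.81; 0.1; 0.85];          % [sigma0^2; alpha; beta] at the true values (0.9^2 = 0.81)
[~, g_analytic] = gllik(theta0, y);  % analytic gradient returned by the function
h = 1e-6;
g_numeric = zeros(1,3);
for k = 1:3
    ek = zeros(3,1);
    ek(k) = h;
    g_numeric(k) = (gllik(theta0+ek,y) - gllik(theta0-ek,y))/(2*h);
end
disp([g_analytic; g_numeric])        % if the analytic gradient is right, the two rows should agree closely
```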
I'm using the following options for the minimizer:

```matlab
options = optimoptions('fmincon','Algorithm','interior-point', ...
    'SpecifyObjectiveGradient',true,'MaxIterations',1000,'MaxFunEvals',500);
```

I also impose that $\alpha+\beta <1$ and that all the parameters are positive (roughly as in the sketch below). The problem is that the minimization gives me completely off estimates. In my latest simulation I used $\alpha=0.1$, $\beta=0.85$, $\sigma_0=0.9$ as true parameters and simulated $N=50$ Gaussian GARCH(1,1) time series. Fmincon returns as averaged estimates $\hat{\sigma}_0=4.3896$, $\hat{\alpha}=0.0330$, $\hat{\beta}=0.5620$ (I tried higher $N$ too, with results that are no better).
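Concretely, the constrained call looks roughly like this (a sketch; the starting values, the bounds, and the small margin on the constraint are illustrative choices, and `y` is one simulated series):

```matlab
% Constrained fmincon call with alpha + beta < 1 and positivity
A  = [0 1 1];                  % linear constraint: alpha + beta <= b
b  = 1 - 1e-6;                 % keep alpha + beta strictly below 1
lb = [1e-6; 1e-6; 1e-6];       % positivity of sigma0^2, alpha, beta
ub = [];
theta_init = [1; 0.05; 0.80];  % starting values for [sigma0^2; alpha; beta]
obj = @(theta) gllik(theta,y); % returns the negative averaged log-likelihood and its gradient
theta_hat = fmincon(obj, theta_init, A, b, [], [], lb, ub, [], options);
```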
It even seems to work slightly better when I simulate a Student-t GARCH and then estimate it by Gaussian PMLE! Also, with the "classical" parametrization everything seems to work fine.
The question: I'm pretty sure there are a thousand ways the code could be optimized (I'm unashamedly guilty of using too many for loops...), but is there something I'm missing, like numerical problems or just a mistake in the code? Thanks in advance for the answers.