1
$\begingroup$

I am working on statistical inference with instrumental variables (IV) following Wooldridge (2016) Introductory Econometrics, Ch. 15. I am using the Card data set (like the book), with wages as outcome ($y$), education as a endogenous continuous treatment ($x$) and distance to college as a binary IV ($z$).

I want to calculate the standard errors manually, and preferably additionally in matrix form using Mata. So far, I am able to calculate coefficients but I can't seem to obtain the correct standard errors and would be happy for input on this.

I obtain the point estimate for $\beta_{IV}$ with the Wald-estimator:

$\beta_{IV}=\frac{\mathbb{E}[y | z = 1]-\mathbb{E}[y | z = 0]}{\mathbb{E}[x | z = 1]-\mathbb{E}[x | z = 0]}$,

$\beta_{IV}=\frac{6.311401-6.155494}{13.52703-12.69801}=.18806$

Cross-checked with Stata's -ivregress-:

. ivregress 2sls y (x=z), nohe ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .1880626 .0262826 7.16 0.000 .1365496 .2395756 _cons | 3.767472 .3487458 10.80 0.000 3.083943 4.451001 ------------------------------------------------------------------------------ 

I now want to proceed by calculating the standard errors. Wooldridge (2016, p. 466) writes that standard errors for $\beta_{IV}$ is obtained by using the square root of the estimated asymptotic variance, where the latter is obtained by

$Var(\beta_{IV})=\frac{\sigma^{2}}{SST_{x} \cdot R^{2}_{x,z}}$

First, $SST_{x}$ is the total sum of squares for $x_{i}$, calculated by

. use http://pped.org/card.dta, clear // Load Card data set . rename nearc4 z . rename educ x . rename lwage y . * SSTx . egen x_bar = mean(x) . gen SSTx = (x-x_bar)^2 . quiet sum SSTx . di r(sum) 21562.08 

Second, $R^{2}_{x,z}$ is obtained from the regression output,

. reg x z, nohe ------------------------------------------------------------------------------ x | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- z | .829019 .1036988 7.99 0.000 .6256912 1.032347 _cons | 12.69801 .0856416 148.27 0.000 12.53009 12.86594 ------------------------------------------------------------------------------ . di .829^2 .687241 

Finally, $\sigma^{2}$ is the error variance given by $SSE/(n-k-1)$ where the squared estimate of errors (SSE) is obtained by $SSE = \sum{(y_{i}-\hat{y_{i}})^{2}}$. Wooldridge says to use the IV residuals $\hat{u_{i}}$ in calculating the error variance,

$\sigma^{2}=\frac{1}{(n-2)} \sum{\hat{u_{i}}^2}$

Which I calculate in Stata as,

. quiet reg x z . predict x_hat (option xb assumed; fitted values) . quiet reg y x_hat, nohe . predict iv_resid (option xb assumed; fitted values) . quiet sum iv_resid . di r(sum) 18848.115 . di (18848.114)^2 3.553e+08 . gen sigma_squared = 3.553e+08 . tabstat sigma_squared, format(%20.2f) variable | mean -------------+---------- sigma_squa~d | 355300000.00 ------------------------ . di (1/(3010-2))*355300000 118118.35 

Thus, when finally I substitute the values into the equation for the variance of $\beta_{IV}$, I get

$Var(\beta_{IV})=\frac{118118.35}{21562.08 \cdot .687241}=7.9711$

I then calculate the standard error by following the formula for standard error (e.g. Wooldridge 2016, p. 50):

$\hat{\sigma} = \sqrt{\hat{\sigma}^{2}} \implies \sqrt{7.9711}=2.8233$

$se(\beta_{IV})=\frac{\sigma}{\sqrt{SST_{x}}} \implies \frac{2.8233}{\sqrt{21562.08}}=0.01922 $

I have used quite some time on this and it would really be helpful with some input on what I am doing wrong.

EDIT: Based on the formula provided by Drunk Deriving, I've tried to calculate SE in Mata

. use http://pped.org/card.dta, clear . keep nearc4 educ lwage id . rename nearc4 Z . rename educ X . rename lwage y . mata : y=st_data(.,"y") : X=st_data(.,"X") : Z=st_data(.,"Z") : X = X, J(rows(X),1,1) // Add constant : Z = Z, J(rows(Z),1,1) // Add constant : b_iv = luinv(Z'*X)*Z'*y : e=y-X*b_iv : v=luinv(Z'*X)*Z'e*e'*Z*luinv(Z'*X) : xmean = mean(X) : tss_x = sum((X :- xmean) :^ 2) : se=sqrt(v)/tss_x : t=b_iv:/se : p=2*ttail(rows(X)-cols(X),abs(t)) : b_iv,se,t,p 1 2 3 4 5 6 7 +---------------------------------------------------------------------------------------------------+ 1 | .1880626042 . 1.69178e-17 . 1.11162e+16 . 0 | 2 | 3.767472015 4.17102e-17 . 9.03251e+16 . 0 . | +---------------------------------------------------------------------------------------------------+ : end 
$\endgroup$

3 Answers 3

2
$\begingroup$

HEre it is an option

use http://pped.org/card.dta, clear keep nearc4 educ lwage id rename nearc4 z rename educ x rename lwage y bysort z: sum y x gen byte one=1 mata: y=st_data(.,"y") x=st_data(.,"x one") z=st_data(.,"z one") xh=z*invsym(z'*z)*z'*x biv=invsym(xh'*xh)*xh'*y biv2=luinv(z'*x)*z'*y //residuals re=y-x*biv vcv=sum(re:^2)/(rows(y))*invsym(xh'*xh) vcv end ivregress 2sls y (x=z), matrix list e(V) 

the main difference with your previous code is how errors are defined (re=y-x*biv) and that, ivregress Stata does not adjust for degrees of freedom. otherwise if you use the following:

mata:sum(re:^2)/(rows(y)-2)*invsym(xh'*xh) 

you need to compare it to

ivregress 2sls y (x=z), small 
$\endgroup$
1
$\begingroup$

Since this is just identified, the formula is pretty straight forward. Let $X$ be the matrix of the independent variables, $Z$ is the matrix of instruments, and $e$ be vector or errors, then $$Var(\beta_{IV})=(Z’X)^{-1}Z’ee’Z(Z’X)^{-1}.$$

$\endgroup$
1
  • $\begingroup$ Thank you Drunk Deriving! I have edited the question to include how I am trying to solve this in Mata code. Would you have any input on how I could edit my code to obtain correct SE's? If you see where it goes wrong in my original manual approach, that would be great too. I plan to use this for instructional purposes in the future, and I think the step-by-step approach may be more intuitive than matrix form for some audiences. $\endgroup$ Commented Jun 15, 2020 at 5:15
0
$\begingroup$

Thank you for your immensely helpful reply @Fcold. I was hoping someone could point out where my code was mistaken. To be on the sure side, I just want to repeat the code in matrix form so I understand it correctly:

(1) Obtain predicted $x$-values from the first stage:

$\hat{X}=Z(Z'Z)^{-1}Z'X$

(2.a.) Obtain IV-coefficients:

$\beta_{IV}=(\hat{X}'\hat{X})^{-1}\hat{X}y$

(2.b.) Alternatively, use:

$\beta_{IV2}=(Z'X)^{-1}Z'y$

(3) Calculate residuals:

$\hat{u}=y-X\beta_{IV}$

(4) Calculate the variance-covariance matrix:

$C= \frac{\sum\hat{u}^{2}}{n(\hat{X}'\hat{X})^{-1}}$

(5) Obtain standard errors for coefficients:

$se(\beta_{IV})=\sqrt{C}$

I added the last part as I see this provides the correct standard errors, but please correct me if I'm wrong.

gen byte one=1 mata: y=st_data(.,"y") x=st_data(.,"x one") z=st_data(.,"z one") xh=z*invsym(z'*z)*z'*x biv=invsym(xh'*xh)*xh'*y biv2=luinv(z'*x)*z'*y //residuals re=y-x*biv vcv=sum(re:^2)/(rows(y))*invsym(xh'*xh) se=sqrt(vcv) t=(biv:/se) end 
$\endgroup$
2
  • 1
    $\begingroup$ yes this is correct. with just 1 minor point. $C=n^{-1}\sum{\hat{u}^2 * (\hat{X}'\hat{X})^{-1} }$. Also, in retrospect, I think your code was correct for the equivalent to "robust" standard errors. $\endgroup$ Commented Jun 16, 2020 at 15:36
  • $\begingroup$ Thanks for your input which really helped in getting a better grasp of this. $\endgroup$ Commented Jun 16, 2020 at 19:20

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.