
I have the following data frame:

    > str(merged_data)
    tibble [497 × 16] (S3: tbl_df/tbl/data.frame)
     $ date                   : 'yearqtr' num [1:497] 2003 Q3 2003 Q4 2004 Q1 2004 Q2 ...
     $ country                : chr [1:497] "DK" "DK" "DK" "DK" ...
     $ loans                  : num [1:497] 2114842 2140422 2175715 2252779 2271740 ...
     $ gdp                    : num [1:497] 359251 366526 371905 373751 378360 ...
     $ m1                     : num [1:497] 467017 476257 482687 496729 505153 ...
     $ m3                     : num [1:497] 815689 854512 888655 921256 923625 ...
     $ M1_growth_rate         : num [1:497] 0.0206 0.0198 0.0135 0.0291 0.017 ...
     $ M3_growth_rate         : num [1:497] 0.03511 0.04759 0.03996 0.03669 0.00257 ...
     $ GDP_growth_rate        : num [1:497] 0.01013 0.02025 0.01468 0.00496 0.01233 ...
     $ loans_growth_rate      : num [1:497] 0.00219 0.0121 0.01649 0.03542 0.00842 ...
     $ turnover               : num [1:497] 30265 30171 45944 39051 38377 ...
     $ mkt_cap                : num [1:497] 626315 686154 730365 714867 759215 ...
     $ mkt_cap_growth_rate    : num [1:497] 0.2292 0.0955 0.0644 -0.0212 0.062 ...
     $ lag_loans_growth_rate  : num [1:497] 0.02079 0.00219 0.0121 0.01649 0.03542 ...
     $ lag_mkt_cap_growth_rate: num [1:497] 0.106 0.2292 0.0955 0.0644 -0.0212 ...
     $ region                 : num [1:497] 0 0 0 0 0 0 0 0 0 0 ...

I estimate the following IV regression:

    library(AER)  # provides ivreg()

    model1 <- ivreg(GDP_growth_rate ~ loans_growth_rate + mkt_cap_growth_rate + turnover + factor(region) |
                      turnover + factor(region) + lag_loans_growth_rate + lag_mkt_cap_growth_rate,
                    data = merged_data)

where lag_loans_growth_rate and lag_mkt_cap_growth_rate are instruments for loans_growth_rate and mkt_cap_growth_rate, respectively.

Following this example, to test whether my instruments are correlated with the error term in the regression for GDP_growth_rate, I estimate the following linear model:

    model1_residuals <- residuals(model1)
    model_residual_regression <- lm(model1_residuals ~ turnover + factor(region) +
                                      lag_loans_growth_rate + lag_mkt_cap_growth_rate,
                                    data = merged_data)

in which the exogenous regressors of the GDP_growth_rate equation and the instruments are the independent variables, and the residuals from the IV model are the dependent variable.

What I get is the following:

    > summary(model_residual_regression)

    Call:
    lm(formula = model1_residuals ~ turnover + factor(region) + lag_loans_growth_rate + 
        lag_mkt_cap_growth_rate, data = merged_data)

    Residuals:
        Min      1Q  Median      3Q     Max 
    -0.4292 -0.0224  0.0012  0.0253  0.2529 

    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
    (Intercept)             -6.10e-18   4.87e-03       0        1
    turnover                 5.66e-23   2.69e-08       0        1
    factor(region)1          4.51e-18   6.12e-03       0        1
    lag_loans_growth_rate    8.07e-17   5.86e-02       0        1
    lag_mkt_cap_growth_rate  5.05e-17   2.26e-02       0        1

    Residual standard error: 0.0569 on 492 degrees of freedom
    Multiple R-squared: 5.09e-32,  Adjusted R-squared: -0.00813 
    F-statistic: 6.26e-30 on 4 and 492 DF,  p-value: 1

The estimates and t-values are suspiciously low. Although I need low t-values and high p-values to prove low correlation with the error term, this looks too extreme. The residuals don't seem to have any trend, and the plot looks as follows:

[Image: plot of the residuals from the IV model]

Could someone please explain why I get such low t-values? Is the problem scaling, or am I missing something and doing something wrong?

Thanks in advance.

UPDATE [11/04/25]

Expressing turnover in growth-rate form (like the other variables) doesn't change the weird output of the model.

    > str(merged_data)
    gropd_df [469 × 18] (S3: grouped_df/tbl_df/tbl/data.frame)
     $ date                     : 'yearqtr' num [1:469] 2004 Q2 2004 Q3 2004 Q4 2005 Q1 ...
     $ country                  : chr [1:469] "DK" "DK" "DK" "DK" ...
     $ loans                    : num [1:469] 2252779 2271740 2311587 2387448 2456384 ...
     $ gdp                      : num [1:469] 373751 378360 385659 387169 396875 ...
     $ m1                       : num [1:469] 496729 505153 525616 546185 582046 ...
     $ m3                       : num [1:469] 921256 923625 902528 855492 880755 ...
     $ M1_growth_rate           : num [1:469] 0.0291 0.017 0.0405 0.0391 0.0657 ...
     $ M3_growth_rate           : num [1:469] 0.03669 0.00257 -0.02284 -0.05212 0.02953 ...
     $ GDP_growth_rate          : num [1:469] 0.00496 0.01233 0.01929 0.00392 0.02507 ...
     $ loans_growth_rate        : num [1:469] 0.03542 0.00842 0.01754 0.03282 0.02887 ...
     $ turnover                 : num [1:469] 39051 38377 45439 52857 64825 ...
     $ mkt_cap                  : num [1:469] 714867 759215 802361 857527 895205 ...
     $ mkt_cap_growth_rate      : num [1:469] -0.0212 0.062 0.0568 0.0688 0.0439 ...
     $ turnover_growth_rate     : num [1:469] -0.15 -0.0173 0.184 0.1633 0.2264 ...
     $ diff_turnover_growth_rate: num [1:469] -0.6728 0.1328 0.2013 -0.0208 0.0632 ...
     $ lag_loans_growth_rate    : num [1:469] 0.02079 0.00219 0.0121 0.01649 0.03542 ...
     $ lag_mkt_cap_growth_rate  : num [1:469] 0.106 0.2292 0.0955 0.0644 -0.0212 ...
     $ region                   : num [1:469] 0 0 0 0 0 0 0 0 0 0 ...
     - attr(*, "groups")= tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
      ..$ country: chr [1:7] "DK" "EE" "FI" "LT" ...
      ..$ .rows  : list<int> [1:7] 
      .. ..$ : int [1:58] 1 2 3 4 5 6 7 8 9 10 ...
      .. ..$ : int [1:78] 59 60 61 62 63 64 65 66 67 68 ...
      .. ..$ : int [1:72] 137 138 139 140 141 142 143 144 145 146 ...
      .. ..$ : int [1:70] 281 282 283 284 285 286 287 288 289 290 ...
      .. ..$ : int [1:72] 209 210 211 212 213 214 215 216 217 218 ...
      .. ..$ : int [1:62] 351 352 353 354 355 356 357 358 359 360 ...
      .. ..$ : int [1:57] 413 414 415 416 417 418 419 420 421 422 ...
      .. ..@ ptype: int(0) 
      ..- attr(*, ".drop")= logi TRUE
    > model1 <- merged_data %>%
    +   drop_na() %>%
    +   ivreg(GDP_growth_rate ~ turnover_growth_rate + loans_growth_rate + mkt_cap_growth_rate |
    +           turnover_growth_rate + lag_loans_growth_rate + lag_mkt_cap_growth_rate, data = .)
    > model1_residuals <- model1$residuals
    > model_residual_regression <- lm(model1_residuals ~ turnover_growth_rate + lag_loans_growth_rate + lag_mkt_cap_growth_rate, data = merged_data)
    > summary(model_residual_regression)

    Call:
    lm(formula = model1_residuals ~ turnover_growth_rate + lag_loans_growth_rate + 
        lag_mkt_cap_growth_rate, data = merged_data)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -1.07599 -0.08666 -0.01047  0.08279  0.94846 

    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    (Intercept)              1.087e-17  9.385e-03       0        1
    turnover_growth_rate     2.515e-17  2.069e-02       0        1
    lag_loans_growth_rate   -6.014e-16  1.899e-01       0        1
    lag_mkt_cap_growth_rate  1.843e-16  7.427e-02       0        1

    Residual standard error: 0.1845 on 465 degrees of freedom
    Multiple R-squared: 5.088e-32,  Adjusted R-squared: -0.006452 
    F-statistic: 7.887e-30 on 3 and 465 DF,  p-value: 1

Here is the summary output of the original model and the correlation matrix of the original variables and their instruments.

    > print(
    +   cor(merged_data[, c("loans_growth_rate",
    +                       "lag_loans_growth_rate",
    +                       "mkt_cap_growth_rate",
    +                       "lag_mkt_cap_growth_rate")],
    +       use = "complete.obs")
    + )
                            loans_growth_rate lag_loans_growth_rate mkt_cap_growth_rate lag_mkt_cap_growth_rate
    loans_growth_rate              1.00000000            0.28068506         -0.05253419              0.07145365
    lag_loans_growth_rate          0.28068506            1.00000000         -0.08958613             -0.04203985
    mkt_cap_growth_rate           -0.05253419           -0.08958613          1.00000000              0.02117171
    lag_mkt_cap_growth_rate        0.07145365           -0.04203985          0.02117171              1.00000000
    > model1 <- merged_data %>%
    +   drop_na() %>%
    +   ivreg(GDP_growth_rate ~ turnover_growth_rate + loans_growth_rate + mkt_cap_growth_rate |
    +           turnover_growth_rate + lag_loans_growth_rate + lag_mkt_cap_growth_rate, data = .)
    > summary(model1)

    Call:
    ivreg(formula = GDP_growth_rate ~ turnover_growth_rate + loans_growth_rate + 
        mkt_cap_growth_rate | turnover_growth_rate + lag_loans_growth_rate + 
        lag_mkt_cap_growth_rate, data = .)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -1.07599 -0.08666 -0.01047  0.08279  0.94846 

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)          -0.01792    0.06012  -0.298    0.766
    turnover_growth_rate -0.18256    0.36985  -0.494    0.622
    loans_growth_rate     1.17900    2.41297   0.489    0.625
    mkt_cap_growth_rate   1.70677    3.36082   0.508    0.612

    Residual standard error: 0.1845 on 465 degrees of freedom
    Multiple R-Squared: -61.78,  Adjusted R-squared: -62.19 
    Wald test: 0.1053 on 3 and 465 DF,  p-value: 0.957
  • Welcome to Cross Validated! I am concerned about your "Although I need low t-values and high p-values to prove low correlation with the error term" comment, as high p-values can mean a small effect size or an inadequate sample size to detect an effect that is sufficiently large to be interesting. Commented Apr 10 at 19:47
  • @Dave I also thought about a potential small-sample-size problem, but the author of the example I refer to in the question uses a sample of 50 observations and manages to get adequate results from the same type of model. Commented Apr 10 at 19:56
  • Low estimates: your response is << 1, but your predictor is in the thousands. You may (I am not sure) be running into some numerical/fitting/scale issues, and might need to rescale some of your variables so they are of more similar orders of magnitude. Commented Apr 11 at 6:02
  • I would like to see some summary output for the original model ... I'm also wondering if it might be an issue that the lag variables are probably very correlated with the original variables. Commented Apr 11 at 6:54
  • @AlexJ I hope you're right! I expressed turnover in growth-rate form (like the other variables) and removed an outlier, but it didn't change anything :( I updated the question with the new output. Commented Apr 11 at 11:57

1 Answer


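The problem is that you have a just-identified model, not an over-identified one. A model is over-identified if it has more excluded instruments than endogenous regressors. The example you are trying to follow uses 2 instruments (salestaxdiff, cigtaxdiff) for 1 endogenous regressor (pricediff), whereas your model uses 2 instruments (lag_loans_growth_rate, lag_mkt_cap_growth_rate) for 2 endogenous regressors (loans_growth_rate, mkt_cap_growth_rate).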

    library(AER)  # loads ivreg(), the CigarettesSW data and (via car) linearHypothesis()
    data("CigarettesSW")
    c1985 <- subset(CigarettesSW, year == "1985")
    c1995 <- subset(CigarettesSW, year == "1995")

    packsdiff <- log(c1995$packs) - log(c1985$packs)
    pricediff <- log(c1995$price/c1995$cpi) - log(c1985$price/c1985$cpi)
    incomediff <- log(c1995$income/c1995$population/c1995$cpi) -
      log(c1985$income/c1985$population/c1985$cpi)
    salestaxdiff <- (c1995$taxs - c1995$tax)/c1995$cpi - (c1985$taxs - c1985$tax)/c1985$cpi
    cigtaxdiff <- c1995$tax/c1995$cpi - c1985$tax/c1985$cpi

    cig_ivreg_diff3 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + salestaxdiff + cigtaxdiff)

    # Regress the IV residuals on all exogenous variables and test the 2 excluded instruments jointly
    cig_iv_OR <- lm(residuals(cig_ivreg_diff3) ~ incomediff + salestaxdiff + cigtaxdiff)
    cig_OR_test <- linearHypothesis(cig_iv_OR, c("salestaxdiff = 0", "cigtaxdiff = 0"), test = "Chisq")
    # df = 2 instruments - 1 endogenous regressor = 1 overidentifying restriction
    pchisq(cig_OR_test[2, 5], df = 1, lower.tail = FALSE)
    [1] 0.02783843
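The counting rule stated above can be checked mechanically. A minimal base-R sketch; `overid_df()` is a hypothetical helper (not part of AER), and the regressor and instrument lists are copied by hand from the two `ivreg()` formulas (the updated model in the question and `cig_ivreg_diff3`):

```r
# Hypothetical helper: number of overidentifying restrictions
# = (# excluded instruments) - (# endogenous regressors)
overid_df <- function(regressors, instruments) {
  endogenous <- setdiff(regressors, instruments)  # regressors without their own instrument
  excluded   <- setdiff(instruments, regressors)  # instruments that are not regressors
  length(excluded) - length(endogenous)
}

# Updated model from the question: 2 endogenous regressors, 2 excluded instruments
asker_df <- overid_df(
  regressors  = c("turnover_growth_rate", "loans_growth_rate", "mkt_cap_growth_rate"),
  instruments = c("turnover_growth_rate", "lag_loans_growth_rate", "lag_mkt_cap_growth_rate")
)

# Cigarette example: 1 endogenous regressor, 2 excluded instruments
cig_df <- overid_df(
  regressors  = c("pricediff", "incomediff"),
  instruments = c("incomediff", "salestaxdiff", "cigtaxdiff")
)

# asker: 0 (just-identified), cigarettes: 1 (over-identified)
print(c(asker = asker_df, cigarettes = cig_df))
```

With zero overidentifying restrictions there is nothing left for the auxiliary regression to test.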

To illustrate that the problem is not specific to your data set, let's treat incomediff as an endogenous regressor as well.

cig_ivreg_diff4 <- ivreg(packsdiff ~ pricediff + incomediff | salestaxdiff + cigtaxdiff) 

Now we have 2 instruments for 2 endogenous regressors, so the model is just-identified, and we get the same strange results that you do:

    cig_iv_OR4 <- lm(residuals(cig_ivreg_diff4) ~ salestaxdiff + cigtaxdiff)
    round(summary(cig_iv_OR4)$coef, 4)
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)         0     0.0465       0        1
    salestaxdiff        0     0.0155       0        1
    cigtaxdiff          0     0.0053       0        1
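The same effect can be reproduced without the cigarette data. A minimal sketch in base R with simulated data (the variable names, coefficients and seed are hypothetical); 2SLS is computed by hand rather than with `ivreg()` so that every step is visible:

```r
set.seed(1)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
u  <- rnorm(n)                                   # structural error
x  <- 0.8 * z1 + 0.5 * z2 + 0.5 * u + rnorm(n)   # correlated with u, hence endogenous
y  <- 1 + 2 * x + u
X  <- cbind(1, x)                                # regressors: constant and x

# 2SLS residuals: project X on the instruments Z, then solve X_hat'(y - X b) = 0
tsls_resid <- function(y, X, Z) {
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta  <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
  drop(y - X %*% beta)
}

Z_just <- cbind(1, z1)       # 2 instruments for 2 coefficients: just-identified
Z_over <- cbind(1, z1, z2)   # 3 instruments for 2 coefficients: over-identified

res_just <- tsls_resid(y, X, Z_just)
res_over <- tsls_resid(y, X, Z_over)

# Z'u_hat is zero up to rounding error only in the just-identified case
print(crossprod(Z_just, res_just))
print(crossprod(Z_over, res_over))

# Auxiliary regression as in the question: coefficients are numerically zero
print(coef(lm(res_just ~ z1)))
```

In the over-identified case 2SLS only forces 2 linear combinations of the 3 entries of Z'û to zero, so one direction is left over for a test.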

It is only possible to do an over-identification test when the model is over-identified. If you use a package for it (instead of constructing the test yourself), you will see that the test is omitted from the diagnostics (here it is called Sargan):

    round(summary(cig_ivreg_diff3, diagnostics = T)$diagnostics, 4)
                     df1 df2 statistic p-value
    Weak instruments   2  44   75.6526  0.0000
    Wu-Hausman         1  44    3.5015  0.0680
    Sargan             1  NA    4.8380  0.0278

    round(summary(cig_ivreg_diff4, diagnostics = T)$diagnostics, 4)
                                  df1 df2 statistic p-value
    Weak instruments (pricediff)    2  45   78.7058  0.0000
    Weak instruments (incomediff)   2  45    0.7848  0.4624
    Wu-Hausman                      2  43    4.6134  0.0153
    Sargan                          0  NA        NA      NA
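For readers who want to construct the test themselves, here is a minimal sketch of the classic n·R² form of the Sargan statistic in base R. The helper `sargan_by_hand()` and the simulated data are hypothetical, and package implementations may differ in finite-sample details, so compare the result with the Sargan line of `summary(..., diagnostics = TRUE)` rather than expecting an exact match:

```r
# Hypothetical helper: n * R^2 form of the Sargan test.
# iv_resid:     residuals from the IV/2SLS fit
# Z:            all instruments except the constant (included exogenous + excluded)
# n_regressors: number of non-constant regressors in the structural equation
sargan_by_hand <- function(iv_resid, Z, n_regressors) {
  aux  <- lm(iv_resid ~ Z)                      # auxiliary regression, as in the question
  stat <- length(iv_resid) * summary(aux)$r.squared
  df   <- ncol(Z) - n_regressors                # number of overidentifying restrictions
  p    <- if (df >= 1) pchisq(stat, df = df, lower.tail = FALSE) else NA_real_
  c(statistic = stat, df = df, p.value = p)
}

# Simulated data (hypothetical): one endogenous regressor x, instruments z1 and z2
set.seed(42)
n  <- 300
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x  <- z1 + z2 + u + rnorm(n)
y  <- 1 + 2 * x + u
X  <- cbind(1, x)

# 2SLS residuals for a given instrument matrix Zc (including the constant)
tsls_resid <- function(Zc) {
  X_hat <- Zc %*% solve(crossprod(Zc), crossprod(Zc, X))
  drop(y - X %*% solve(crossprod(X_hat, X), crossprod(X_hat, y)))
}

over <- sargan_by_hand(tsls_resid(cbind(1, z1, z2)), cbind(z1, z2), n_regressors = 1)
just <- sargan_by_hand(tsls_resid(cbind(1, z1)),     cbind(z1),     n_regressors = 1)
print(over)  # df = 1: a genuine test
print(just)  # df = 0: statistic numerically zero, no p-value
```

For the model in the question, Z has three columns (turnover_growth_rate and the two lags) and there are three regressors, so df = 0 as well.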

If you want to understand the problem at a deeper level, I recommend beginning with why residuals are uncorrelated with the regressors that produced them.
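Written out for the just-identified case, assuming $Z$ ($n \times k$) collects every instrument (the constant, the included exogenous regressors and the excluded instruments), $X$ ($n \times k$) collects every regressor, and $Z^\top X$ is invertible:

$$\hat\beta_{IV} = (Z^\top X)^{-1} Z^\top y$$

$$Z^\top \hat u = Z^\top y - Z^\top X (Z^\top X)^{-1} Z^\top y = 0$$

$$\hat\gamma = (Z^\top Z)^{-1} Z^\top \hat u = 0$$

So every coefficient of the auxiliary regression, and with it the $R^2$ and the F-statistic, is zero up to floating-point error, which is what estimates of order 1e-17 look like. With $\ell > k$ instruments, 2SLS only imposes $\hat\Pi^\top Z^\top \hat u = 0$ with $\hat\Pi = (Z^\top Z)^{-1} Z^\top X$ of size $\ell \times k$, so $\ell - k$ directions of $Z^\top \hat u$ remain free; under the null these give the Sargan statistic $nR^2$, asymptotically $\chi^2_{\ell - k}$.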

