
I ran a regression with HC (‘robust’) standard errors. The intercept is significant, which is reflected in the confidence intervals around the unstandardized coefficients. However, the CI around the standardized intercept is quite wide and includes 0; please see the table below. Is that fine, or should I be concerned that I am doing something wrong? The dataset is available here; all the steps of the code are included below. (The table is patched together, as I haven't yet figured out how to generate the complete one seamlessly.) Many thanks!

> load(file = "Amman.rda")
> library(sandwich)
> model <- lm(progress ~ opi + competence + integration + indegree + voterank, data = Amman)
> summary(model)

Call:
lm(formula = progress ~ opi + competence + integration + indegree +
    voterank, data = Amman)

Residuals:
      Min        1Q    Median        3Q       Max
-0.150864 -0.028803  0.003795  0.026640  0.124062

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.496290   0.078140   6.351 2.69e-06 ***
opi         -0.424556   0.153329  -2.769   0.0115 *
competence   0.010045   0.004462   2.252   0.0352 *
integration -0.238163   0.099404  -2.396   0.0260 *
indegree     0.023413   0.013545   1.729   0.0986 .
voterank    -0.002266   0.003058  -0.741   0.4669
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06614 on 21 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.5663,  Adjusted R-squared:  0.4631
F-statistic: 5.485 on 5 and 21 DF,  p-value: 0.002205

> library(lmtest)
> coeffHC4 <- coeftest(model, vcov = vcovHC(model, type = "HC4"))
> coeffHC4

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)
(Intercept)  0.4962902  0.0808139  6.1411 4.299e-06 ***
opi         -0.4245559  0.1508714 -2.8140  0.010397 *
competence   0.0100454  0.0044635  2.2506  0.035257 *
integration -0.2381628  0.0805296 -2.9575  0.007517 **
indegree     0.0234134  0.0098380  2.3799  0.026873 *
voterank    -0.0022659  0.0023894 -0.9483  0.353756
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> conf_ints <- confint(coeffHC4, level = 0.95) |>
+   as.data.frame() |>
+   tibble::rownames_to_column("Variables") |>
+   `colnames<-`(c("Variables", "95%CI_low", "95%CI_hi"))
> conf_ints
    Variables     95%CI_low     95%CI_hi
1 (Intercept)  0.3282284541  0.664351977
2         opi -0.7383101090 -0.110801602
3  competence  0.0007629805  0.019327796
4 integration -0.4056331442 -0.070692390
5    indegree  0.0029541503  0.043872647
6    voterank -0.0072348011  0.002703083

> library(dplyr)
> library(car)  # for vif()
> VIF_tol <- vif(model) |>
+   as.data.frame() |>
+   tibble::rownames_to_column("Variables") |>
+   mutate(Tolerance = 1 / `vif(model)`) |>
+   `colnames<-`(c("Variables", "VIF", "Tolerance"))
> VIF_tol
           Variables      VIF Tolerance
1         scale(opi) 1.493807 0.6694305
2  scale(competence) 1.175722 0.8505409
3 scale(integration) 1.392198 0.7182888
4    scale(indegree) 1.050531 0.9518994
5    scale(voterank) 1.414064 0.7071815

> model_summary <- summary(model)
> output <- model_summary$coefficients |>
+   as.data.frame() |>
+   tibble::rownames_to_column("Variables") |>
+   left_join(conf_ints, by = "Variables") |>
+   left_join(VIF_tol, by = "Variables") |>
+   mutate(across(c(2:4, 6:9), .fns = function(x) {format(round(x, 5), nsmall = 5)})) |>
+   relocate(`95%CI_low`, .after = Estimate) |>
+   relocate(`95%CI_hi`, .after = `95%CI_low`)
> output[, 7] <- format.pval(output[, 7], eps = .001, digits = 4)
> output
    Variables Estimate 95%CI_low 95%CI_hi Std. Error  t value Pr(>|t|)     VIF Tolerance
1 (Intercept)  0.49629   0.32823  0.66435    0.07814  6.35133  < 0.001      NA        NA
2         opi -0.42456  -0.73831 -0.11080    0.15333 -2.76892  0.01150 1.49381   0.66943
3  competence  0.01005   0.00076  0.01933    0.00446  2.25151  0.03519 1.17572   0.85054
4 integration -0.23816  -0.40563 -0.07069    0.09940 -2.39592  0.02597 1.39220   0.71829
5    indegree  0.02341   0.00295  0.04387    0.01354  1.72862  0.09855 1.05053   0.95190
6    voterank -0.00227  -0.00723  0.00270    0.00306 -0.74103  0.46688 1.41406   0.70718

> library(lsr)
> etaSquared(model)
                eta.sq eta.sq.part
opi         0.15832698  0.26744774
competence  0.10468472  0.19445477
integration 0.11854396  0.21467219
indegree    0.06170723  0.12456734
voterank    0.01133971  0.02548222

> library(rempsyc)  # for nice_table()
> table <- nice_table(output)
> flextable::save_as_docx(table, path = "table.docx")

# For standardized betas:
> model_std <- lm(scale(progress) ~ scale(opi) + scale(competence) + scale(integration) + scale(indegree) + scale(voterank), data = Amman)
> coeffHC4_std <- coeftest(model_std, vcov = vcovHC(model_std, type = "HC4"))
> conf_ints <- confint(coeffHC4_std, level = 0.95) |>
+   as.data.frame() |>
+   tibble::rownames_to_column("Variables") |>
+   `colnames<-`(c("Variables", "95%CI_low", "95%CI_hi"))
> model_std_summary <- summary(model_std)
> output_std <- model_std_summary$coefficients |>
+   as.data.frame() |>
+   tibble::rownames_to_column("Variables") |>
+   left_join(conf_ints, by = "Variables") |>
+   mutate(across(c(2:4, 6:7), .fns = function(x) {format(round(x, 5), nsmall = 5)})) |>
+   relocate(`95%CI_low`, .after = Estimate) |>
+   relocate(`95%CI_hi`, .after = `95%CI_low`)
> output_std[, 7] <- format.pval(output_std[, 7], eps = .001, digits = 4)
> output_std
           Variables Estimate 95%CI_low 95%CI_hi Std. Error  t value Pr(>|t|)
1        (Intercept)  0.17556  -0.14697  0.49809    0.15728  1.11623  0.27693
2         scale(opi) -0.47667  -0.82894 -0.12440    0.17215 -2.76892  0.01150
3  scale(competence)  0.38072   0.02892  0.73252    0.16909  2.25151  0.03519
4 scale(integration) -0.49318  -0.83997 -0.14639    0.20584 -2.39592  0.02597
5    scale(indegree)  0.26905   0.03395  0.50415    0.15564  1.72862  0.09855
6    scale(voterank) -0.13614  -0.43469  0.16241    0.18372 -0.74103  0.46688
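(Aside on the code itself: lmtest::coefci() can produce the robust confidence intervals in one call, instead of running confint() on the coeftest object. A minimal sketch with made-up data, since the Amman data set is not attached here:)

```r
library(sandwich)  # vcovHC()
library(lmtest)    # coefci()

# Illustrative data standing in for the Amman data set
set.seed(42)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 0.5 - 0.4 * d$x1 + 0.2 * d$x2 + rnorm(30, sd = 0.1)

m <- lm(y ~ x1 + x2, data = d)

# HC4 robust 95% confidence intervals in one call
ci <- coefci(m, level = 0.95, vcov. = vcovHC(m, type = "HC4"))
ci
```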
  • Recall what the intercept does: it gives you an estimate of $y$ when all regressors are zero. After scaling, this is tantamount to asking what happens when all regressors are at their respective mean values. Hence, the intercept now has a very different interpretation, and it is not clear why there should be a clear relationship between its value before and after standardization, nor, hence, between its significance. As a consequence, I also do not think this has anything to do with HC standard errors. – Commented Jun 19, 2024 at 13:57
  • Thank you, @Christoph-Hanck! – Commented Jul 17, 2024 at 8:37

1 Answer


This is unsurprising:

  • The intercept's $p$-value tests whether the expected outcome differs significantly from $0$ when all explanatory variables equal $0$.
  • Using scale() on both the explanatory variables and the outcome forces the fitted intercept to be zero (up to numerical precision).
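The zero intercept follows from the fact that the OLS fit passes through the point of means: with standardized variables, $\bar{y}^{*} = \bar{x}_j^{*} = 0$, so

$$\hat\beta_0 = \bar{y}^{*} - \sum_j \hat\beta_j \bar{x}_j^{*} = 0 - \sum_j \hat\beta_j \cdot 0 = 0.$$

This holds exactly in the algebra; in floating point the estimate is merely tiny.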

You can demonstrate this with a simulation in R:

require("sfsmisc")
set.seed(1234)
n <- 100
x <- rnorm(n, 2)
y <- 1.5 + 0.5 * x + rnorm(n)
LM  <- lm(y ~ x)
LMs <- lm(scale(y) ~ scale(x))
summary(LM)$coefficients
#              Estimate Std. Error  t value     Pr(>|t|)
# (Intercept) 1.5893240  0.2175899 7.304218 7.493838e-11
# x           0.4739151  0.1037759 4.566715 1.441777e-05
summary(LMs)$coefficients
#                 Estimate Std. Error      t value     Pr(>|t|)
# (Intercept) 1.904569e-17 0.09126601 2.086833e-16 1.000000e+00
# scale(x)    4.188856e-01 0.09172579 4.566715e+00 1.441777e-05
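As a side note, the standardized intercept in the question (0.17556) is not exactly zero. A plausible explanation is the 10 incomplete observations: scale() standardizes using the mean and SD of all rows, but lm() then drops the incomplete cases, so within the fitted subset the scaled variables no longer have mean exactly zero. A base-R sketch of that effect, with made-up data:

```r
# 37 observations, 10 of them incomplete, as in the question
set.seed(1)
n <- 37
x <- rnorm(n); z <- rnorm(n)
y <- 1 + x - z + rnorm(n)
z[1:10] <- NA

# scale() centers over all 37 rows; lm() then drops the 10 NA rows,
# so the intercept is generally no longer zero
fit <- lm(scale(y) ~ scale(x) + scale(z))
coef(fit)[["(Intercept)"]]

# Standardizing within the complete cases restores the zero intercept
d    <- na.omit(data.frame(x, y, z))
fit2 <- lm(scale(y) ~ scale(x) + scale(z), data = d)
coef(fit2)[["(Intercept)"]]  # zero up to machine precision
```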

And here is what the scatterplot looks like:

[scatterplot of the simulated data]

  • It is surprising that the intercept is not zero. That could be due to the 10 observations with NA values. – Commented Jun 21, 2024 at 13:11
  • Thank you so much, @Frans-Rodenburg! – Commented Jul 17, 2024 at 8:35
