Separating interaction and main effects in generalized additive models

Question

I have previously asked about factor-smooth interactions for generalized additive models and received helpful answers: Smooth factor-interactions in generalized additive models

More generally I am interested the following type of model structure involving both smooth-factor and smooth-smooth interactions:

\begin{align} \text{log}(\mu) &= \underbrace{A_{0}}_{\text{intercept}} + \underbrace{f(x,y)}_{\text{main effect spatial smooth}} + \underbrace{\sum_{i}g_{i}(z_{i})}_{\text{main effects other smooths}} + \underbrace{\sum_{j} A_{j}}_{\text{main effects factors}} + \\ &\underbrace{\sum_{i}j_{i}(x,y,z_{i})}_{\text{smooth-smooth interaction}} + \underbrace{\sum_{j} h_{j}(x,y,A_{j})}_{\text{smooth-factor interaction}}, \end{align} i.e. spatial model at its core, which however also include the smooth and parametric effects of other covariates, and where these are allowed to interact with the spatial part.

My main question is: Is it possible to set up gam-models in mgcv, which with the correct set of identifiability constraints can meaningfuly isolate the different main and interaction effects in models of the type specified above?

Toy dataset To explore my question a bit further consider a specific example of (1), where there is only one other continous and categorial covariate: \begin{align*} &\text{Intercept}: A_{0} = 10 \\ \\ &\text{Main effect spatial smooth}: f(x,y) = \text{sin}(x)\text{cos}(y) \\ \\ &\text{Main effect other smooth}: g(z) = \text{exp}(-z/100)\cdot z/100 \\ \\ &\text{Main effect other factor}: A = \begin{cases} 0 & \text{ if } A== a0 \\ 1 & \text{ if } A == a1 \\ 2 & \text{ if } A == a2 \end{cases} \\ \\ &\text{Smooth-factor interaction}: h(x,y,A) = \begin{cases} 0 & \text{ if } A== a0 \\ \text{sin}(x)^2y & \text{ if } A == a1 \\ (xy)^{1/4} & \text{ if } A == a2 \end{cases} \\ &\text{Smooth-smooth interaction}: j(x,y,z) = (xy\cdot \text{exp}(-z/100))^{1/2} \end{align*}

To ensure identifiability I have chosen the first factor level $a0$ as reference. My question is then whether this along with apropriate boundary conditions for the continuous smooths in mgcv (effects should be zero at $x=0,y=0,z=0$) is enough to reproduce the following dataset generated with the specifications above:

N <- 11 data <- expand.grid(x = seq(0,1, length.out = N), y = seq(0,1, length.out = N), z = seq(0,1000, length.out = N), A = c("a0","a1","a2")) intercept <- 10 value1 <- 1 value2 <- 2 eps <- 0.0 data <- data %>% mutate(intercept = intercept, main_effect1 = sin(x)*cos(y), main_effect2 = case_when(A == "a0" ~ 0, A == "a1" ~ value1, A == "a2" ~ value2), main_effect3 = exp(-z/100)*z/100, interaction1 = case_when(A == "a0" ~ 0, A == "a1" ~ sin(x)**2*y, A == "a2" ~ (x*y)**(1/4)), interaction2 = (x*y*exp(-z/100))**(1/2), epsilon = rnorm(nrow(data), sd = eps), response_simulated = exp(intercept + main_effect1 + main_effect2 + main_effect3 + interaction1 + interaction2 + eps), type = "simulated") data$A <- as.factor(data$A)

Example without smooth-smooth interaction Let us first consider the case, where $j(x,y,z)=0$. In this case I use the following model-specification:

model <- mgcv::gam(response_simulated ~ s(x,y, pc = c(0,0), k = N*N-1) + s(z, pc = 0, k = N-1) + A + s(x, y, by = ordered(A), pc = c(0,0), k = N-1), data = data, family = gaussian(link = log), method = "REML")

, which term by term produces the following plots:

predictions <- as.data.frame(predict.gam(model, data, type = "terms")) %>% rename(main_effect1 = `s(x,y)`, main_effect2 = A, main_effect3 = `s(z)`, interaction_a1 = `s(x,y):ordered(A)a1`, interaction_a2 = `s(x,y):ordered(A)a2`) %>% mutate(intercept = model$coefficients[[1]], interaction1 = interaction_a1 + interaction_a2, x = data$x, y = data$y, z = data$z, A = data$A, type = "fitted") data_with_predictions <- bind_rows(data, predictions) #plot intercept plot1 <- ggplot(data_with_predictions, aes(x = x, y = intercept, color = type)) + ylim(0, 15) + geom_line() #plot first main effect plot2 <- ggplot(data, aes(x, y, fill = main_effect1)) + geom_tile() + ggtitle("Simulation main effect 1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot3 <- ggplot(predictions, aes(x, y, fill = main_effect1)) + geom_tile() + ggtitle("Fitted main effect 1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() #plot second main effect plot4 <- ggplot(data_with_predictions, aes(x = A, y = main_effect2, color = type)) + geom_point() #plot third main effect plot5 <- ggplot(data_with_predictions, aes(x = z, y = main_effect3, color = type)) + geom_line() #plot factor-smooth interaction for A="a1" data_a1 <- data %>% filter(A == "a1") predictions_a1 <- predictions %>% filter(A == "a1") plot6 <- ggplot(data_a1, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Simulation interaction 1 a1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot7 <- ggplot(predictions_a1, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Fitted interaction 1 a1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() #plot factor-smooth interaction for A="a2" data_a2 <- data %>% filter(A == "a2") predictions_a2 <- predictions %>% filter(A == "a2") plot8 <- ggplot(data_a2, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Simulation interaction 1 a2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot9 <- ggplot(predictions_a2, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Fitted interaction 1 a2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() ggpubr::ggarrange(plot1, plot4, plot5, plot2, plot6, plot8, plot3, plot7, plot9)

Which almost perfectly reproduces the different terms in (1). I suspect the small deviations are due my model bordering on being unidentifiable. To ensure convergence I also had to use slightly less basis functions than the number of data points (i.e. k=N-1 in the model specification).

Example without smooth-smooth interaction Consider now the case, where the smooth-smooth interaction is turned on. As I want to separate interaction and main effects I use the ti()-specification for the smooth-smooth interaction. I furthermore use d=c(2,1) to exploit isotropy in $(x,y)$ but not with z, which is in another unit. This had led me to the following model specification:

model <- mgcv::gam(response_simulated ~ s(x,y, pc = c(0,0), k = N*N-1) + s(z, pc = 0, k = N-1) + A + s(x, y, by = ordered(A), pc = c(0,0), k = N*N-1) + ti(x, y, z, d = c(2,1), k = N-1), data = data, family = gaussian(link = log), method = "REML")

I suspect a more correct approach would be to set pc = c(0,0,0) in the ti(), but this is not possible due to a bug, which I have described in a previous post: Point constraints for mgcv smooths

Using the above model I get the following:

predictions <- as.data.frame(predict.gam(model, data, type = "terms")) %>% rename(main_effect1 = `s(x,y)`, main_effect2 = A, main_effect3 = `s(z)`, interaction_a1 = `s(x,y):ordered(A)a1`, interaction_a2 = `s(x,y):ordered(A)a2`, interaction2 = `ti(x,y,z)`) %>% mutate(intercept = model$coefficients[[1]], interaction1 = interaction_a1 + interaction_a2, x = data$x, y = data$y, z = data$z, A = data$A, type = "fitted") data_with_predictions <- bind_rows(data, predictions) #plot intercept plot1 <- ggplot(data_with_predictions, aes(x = x, y = intercept, color = type)) + ylim(0, 15) + geom_line() #plot first main effect plot2 <- ggplot(data, aes(x, y, fill = main_effect1)) + geom_tile() + ggtitle("Simulation main effect 1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot3 <- ggplot(predictions, aes(x, y, fill = main_effect1)) + geom_tile() + ggtitle("Fitted main effect 1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() #plot second main effect plot4 <- ggplot(data_with_predictions, aes(x = A, y = main_effect2, color = type)) + geom_point() #plot third main effect plot5 <- ggplot(data_with_predictions, aes(x = z, y = main_effect3, color = type)) + geom_line() #plot factor-smooth interaction for A="a1" data_a1 <- data %>% filter(A == "a1") predictions_a1 <- predictions %>% filter(A == "a1") plot6 <- ggplot(data_a1, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Simulation interaction 1 a1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot7 <- ggplot(predictions_a1, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Fitted interaction 1 a1") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() #plot factor-smooth interaction for A="a2" data_a2 <- data %>% filter(A == "a2") predictions_a2 <- predictions %>% filter(A == "a2") plot8 <- ggplot(data_a2, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Simulation interaction 1 a2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot9 <- ggplot(predictions_a2, aes(x, y, fill = interaction1)) + geom_tile() + ggtitle("Fitted interaction 1 a2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() #plot smooth-smooth interaction for different values of z data_z1 <- data %>% filter(z == min(z)) predictions_z1 <- predictions %>% filter(z == min(z)) z2 <- unique(data$z)[round(N/2)] #some number roughly in the middle of the interval data_z2 <- predictions %>% filter(z == z2) predictions_z2 <- predictions %>% filter(z == z2) data_z3 <- data %>% filter(z == max(z)) predictions_z3 <- predictions %>% filter(z == max(z)) plot10 <- ggplot(data_z1, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Simulation interaction 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot11 <- ggplot(predictions_z1, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Fitted interaction 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot12 <- ggplot(data_z2, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Simulation interaction 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot13 <- ggplot(predictions_z2, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Fitted interaction 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot14 <- ggplot(data_z3, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Simulation interaction 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() plot15 <- ggplot(predictions_z3, aes(x, y, fill = interaction2)) + geom_tile() + ggtitle("Fitted main effect 2") + scico::scale_fill_scico(palette = "vik", na.value = "white") + theme_void() ggpubr::ggarrange(plot1, plot4, plot5, plot2, plot6, plot8, plot3, plot7, plot9) ggpubr::ggarrange(plot10, plot12, plot14, plot11, plot13, plot15)

Which also looks okay, but not as good as in the previous example. It seems like different terms now start to "mix".

Summary and questions So in conclusion and returning to the main question, the above analysis went some way in succesfully disentangling the different main and interaction effects for a very simple toy example. The separation was however not perfect, which leads me to the following question:

For a more realistic dataset with a large number of other smooth covariates and factor variables can one expect to be able meaningfully separate main and interaction terms? Based on my toy example I could imagine that adding succesively more and more covariates would end up giving problems as I already saw in the example of adding the smooth-smooth interaction.
The above conclusions of course assume that my model specifications are correct. Are they? And if so, are there better ways to achieve the separation I want? I am in particular interested in whether the use of the ti()-specification to separate out the interaction part is correct. I am also interested in whether the use of other basis for the smooths would achieve better results.

FWIW, a related question was answered here: stats.stackexchange.com/a/660074/370574 The help page in that link explains how to use ti to separate the smooth marginal and interaction effects between 2 continuous features. I'm not sure how to best separate the marginal and interaction effects between a categorical and smooth continuous effect, but there are a few options here — jarbet
– jarbet, Commented Jan 23 at 17:07
August, can you explain why you want the effects on the log scale to be 0 when x, y, and z are all equal to 0. — Gavin Simpson
– Gavin Simpson, Commented Jan 31 at 10:09
I'm not sure that it's possible to "meaningfully isolate the different main and interaction effects" in any type of model with interactions. An interaction means that the association of one predictor with outcome depends on the value of an interacting predictor. Thus the "main effect" of a predictor ends up depending on how its interacting predictors are coded. See this page. It would help if you could clarify what you mean by "isolating" main and interacting effects in this case. — EdM
– EdM, Commented Feb 1 at 17:13

Stack Exchange Network

Separating interaction and main effects in generalized additive models

0

Linked

Hot Network Questions

Separating interaction and main effects in generalized additive models

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Linked

Related

Hot Network Questions