5
$\begingroup$

I constructed a generalized additive mixed effect model (GAMM) that converged, but has very large intervals for some terms and I'm not sure if it's an issue. Practically all diagnostic tools that I've used say the model is fine (Pearson correlation between predictor variables is < 0.47, no zero-inflation, no over-dispersion, qq-plot lines up well, no glaring issues in summary output, and seemingly no strong spatial or temporal auto-correlation), with the exception of mgcv::gam.vcomp().

Might this be an indication of overfitting, or is gam.vcomp() simply having trouble estimating the interval of something with a small effect; i.e. nothing to worry about(?) Does anything need to be done? For example, removing the main year term could make sense if I thought there wasn't a global annual trend that all sites/seasons share, but it makes the intervals worse. As I understand it, I also can't remove the repeated measure term fSite (location), because I need to represent the mean fSite effect (especially when I have it's interaction term: s(fSite, CYR, bs = "re")). Random slopes without random intercepts don't make much sense in this case (i.e. "all sites start with the same abundance, but develop their own annual trend?" Nope).

I can post the results of the other diagnostics if needed, but I thought it would make my question too long.

library(mgcv) # fit model mod <- gam(num ~ # Abundance (count data with many zeros) # Annual trend + seasonal deviations s(CYR) + s(CYR, fSeason, bs = "sz") + # Time of day (8am - 6pm, so can't do 'cc') s(ToD) + # Habitat covariates s(ave_tt) + s(sed_depth) + # Water properties s(DO) + s(sal) + # Site effects s(fSite, bs = "re") + # Some sites have higher abundance than others s(fSite, CYR, bs = "re") + # Each site has an annual trend # (linear trend b/c fs term is too complex) # High seasonal variation s(fSeason, fCYR, bs = "re") + # Large year-to-year variation each season offset(log(area_sampled)), # Area surveyed at each site data = toad2, method = 'REML', select = FALSE, gamma = 1.4, # 1.4 can be a sensible choice for suppressing over-fitting - S. Wood family = nb(link = "log"), control = list(trace = TRUE), drop.unused.levels=FALSE) 1 penalized deviance = 844.9452 2 penalized deviance = 822.1983 3 penalized deviance = 821.985 4 penalized deviance = 821.9849 5 penalized deviance = 821.9849 1 newton max(|grad|) = 22.95153 1 penalized deviance = 795.123 2 penalized deviance = 795.1088 3 penalized deviance = 795.1088 4 penalized deviance = 795.1088 1 penalized deviance = 607.0413 2 penalized deviance = 607.0378 ... 1 penalized deviance = 615.4617 2 penalized deviance = 615.4617 16 newton max(|grad|) = 3.235313e-05 1 penalized deviance = 615.4617 2 penalized deviance = 615.4617 # -- converged, no warnings/errors -- > with(toad2, nlevels(interaction(CYR, fSite, drop = FALSE))) [1] 799 > with(toad2, nlevels(interaction(CYR, fSeason, drop = FALSE))) [1] 34 > gratia::model_edf(mod) # A tibble: 1 × 2 .model .edf <chr> <dbl> 1 mod 53.6 > summary(mod) Family: Negative Binomial(0.407) Link function: log Formula: num ~ s(CYR) + s(CYR, fSeason, bs = "sz") + s(ToD) + s(ave_tt) + s(sed_depth) + s(DO) + s(sal) + s(fSite, bs = "re") + s(fSite, CYR, bs = "re") + s(fSeason, fCYR, bs = "re") + offset(log(area_sampled)) Parametric coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.083 0.159 -19.39 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Approximate significance of smooth terms: edf Ref.df Chi.sq p-value s(CYR) 1.00013 1.000 0.716 0.397419 s(CYR,fSeason) 2.62538 2.793 16.304 0.000857 *** s(ToD) 2.14975 2.733 8.428 0.043768 * s(ave_tt) 3.09916 3.854 43.784 < 2e-16 *** s(sed_depth) 1.43453 1.732 3.265 0.101876 s(DO) 2.08725 2.675 6.367 0.068145 . s(sal) 1.00006 1.000 0.832 0.361770 s(fSite) 0.02572 46.000 0.032 0.115478 s(fSite,CYR) 23.52341 46.000 67.241 < 2e-16 *** s(fSeason,fCYR) 15.67032 28.000 51.782 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 R-sq.(adj) = 0.228 Deviance explained = 46.2% -REML = 615.78 Scale est. = 1 n = 1436 > exp(family(mod)$getTheta()) [1] 0.4065145 > # If your response is treated as Poisson then scale parameter estimates > # <<1 are also diagnostic - S. Wood > gam.check(mod) Method: REML Optimizer: outer newton full convergence after 16 iterations. Gradient range [-3.235314e-05,1.234604e-11] (score 615.782 & scale 1). eigenvalue range [-7.666553e-14,30.92861]. Model rank = 193 / 193 Basis dimension (k) checking results. Low p-value (k-index<1) may indicate that k is too low, especially if edf is close to k'. k' edf k-index p-value s(CYR) 9.0000 1.0001 0.77 0.16 s(CYR,fSeason) 10.0000 2.6254 NA NA s(ToD) 9.0000 2.1498 0.78 0.26 s(ave_tt) 9.0000 3.0992 0.77 0.26 s(sed_depth) 9.0000 1.4345 0.78 0.22 s(DO) 9.0000 2.0872 0.79 0.49 s(sal) 9.0000 1.0001 0.81 0.81 s(fSite) 47.0000 0.0257 NA NA s(fSite,CYR) 47.0000 23.5234 NA NA s(fSeason,fCYR) 34.0000 15.6703 NA NA > gam.vcomp(mod) Standard deviations and 0.95 confidence intervals: std.dev lower upper s(CYR) 4.728877e-04 7.060334e-79 3.167312e+71 s(CYR,fSeason)1 5.794462e-02 2.814867e-03 1.192802e+00 s(CYR,fSeason)2 5.794463e-02 2.814869e-03 1.192801e+00 s(ToD) 1.330395e-01 2.516623e-02 7.033038e-01 s(ave_tt) 1.276690e-02 4.396294e-03 3.707525e-02 s(sed_depth) 1.061798e-03 1.397058e-05 8.069917e-02 s(DO) 8.712614e-02 1.575589e-02 4.817858e-01 s(sal) 7.671887e-05 3.421230e-91 1.720371e+82 s(fSite) 1.900063e-02 9.885438e-81 3.652077e+76 s(fSite,CYR) 2.851785e-04 1.753760e-04 4.637279e-04 s(fSeason,fCYR) 5.473322e-01 3.275692e-01 9.145320e-01 Rank: 10/11 > coef(mod)[66:112] s(fSite).1 s(fSite).2 s(fSite).3 s(fSite).4 s(fSite).5 s(fSite).6 s(fSite).7 -2.705664e-04 -6.381515e-04 -2.904600e-04 1.333252e-04 -6.659041e-05 -3.919400e-04 6.266535e-05 s(fSite).8 s(fSite).9 s(fSite).10 s(fSite).11 s(fSite).12 s(fSite).13 s(fSite).14 -4.511340e-04 -5.949274e-04 4.508613e-04 -2.700582e-04 -2.855889e-04 3.455710e-04 -4.429773e-04 s(fSite).15 s(fSite).16 s(fSite).17 s(fSite).18 s(fSite).19 s(fSite).20 s(fSite).21 4.344080e-04 -7.985010e-04 1.620827e-04 9.439015e-04 3.326189e-04 3.592902e-04 -1.659740e-04 s(fSite).22 s(fSite).23 s(fSite).24 s(fSite).25 s(fSite).26 s(fSite).27 s(fSite).28 -3.227267e-04 -3.572010e-04 -3.335956e-04 -3.436682e-04 -5.486051e-04 4.032182e-04 1.504305e-04 s(fSite).29 s(fSite).30 s(fSite).31 s(fSite).32 s(fSite).33 s(fSite).34 s(fSite).35 8.595404e-04 7.776540e-04 -1.133317e-04 -3.456665e-04 -1.513601e-04 -9.634490e-05 -6.275575e-04 s(fSite).36 s(fSite).37 s(fSite).38 s(fSite).39 s(fSite).40 s(fSite).41 s(fSite).42 2.723575e-04 3.347001e-04 -4.285059e-04 -5.483368e-04 5.604324e-04 6.512360e-04 -7.336334e-05 s(fSite).43 s(fSite).44 s(fSite).45 s(fSite).46 s(fSite).47 7.569164e-04 1.389385e-03 2.058185e-04 -8.959650e-04 2.666844e-04 # When main calendar year term s(CYR) is removed: > gam.vcomp(mod2) Standard deviations and 0.95 confidence intervals: std.dev lower upper s(CYR,fSeason)1 7.099331e-02 1.043980e-260 4.827725e+257 s(CYR,fSeason)2 7.099331e-02 1.043978e-260 4.827734e+257 s(ToD) 1.251142e-01 2.215805e-02 7.064507e-01 s(ave_tt) 1.264668e-02 4.356001e-03 3.671681e-02 s(sed_depth) 1.064279e-03 1.481443e-05 7.645849e-02 s(DO) 8.263807e-02 1.395950e-02 4.892045e-01 s(sal) 4.892677e-05 6.244970e-141 3.833211e+131 s(fSite) 1.743680e-02 1.197805e-88 2.538327e+84 s(fSite,CYR) 2.810781e-04 1.726533e-04 4.575928e-04 s(fSeason,fCYR) 5.379412e-01 3.229912e-01 8.959398e-01 Rank: 10/10 # Salinity intervals jumped up by a lot when I removed the main year term. # Perhaps salinity and the annual trends have a non-linear association, but I can't check: > concurvity(mod) Error in forwardsolve(t(Rt), t(R[1:r, , drop = FALSE])) : singular matrix in 'backsolve'. First zero in diagonal [27] # Result of changing select=TRUE > gam.vcomp(mod3) Standard deviations and 0.95 confidence intervals: std.dev lower upper s(CYR)1 0.0002622572 7.888282e-67 8.719118e+58 s(CYR)2 0.0032867115 2.853467e-79 3.785736e+73 s(CYR,fSeason)1 0.0474365937 0.000000e+00 Inf s(CYR,fSeason)2 0.0474365936 0.000000e+00 Inf s(CYR,fSeason)3 0.2081356080 1.626235e-02 2.663848e+00 s(ToD)1 0.0645597518 3.937455e-03 1.058542e+00 s(ToD)2 0.0244684592 7.824664e-33 7.651517e+28 s(ave_tt)1 0.0125788207 4.730636e-03 3.344724e-02 s(ave_tt)2 0.0058716918 1.055836e-76 3.265351e+71 s(sed_depth)1 0.0013703911 3.029538e-04 6.198872e-03 s(sed_depth)2 0.0027306865 1.075626e-67 6.932382e+61 s(DO)1 0.0499174378 6.592166e-03 3.779866e-01 s(DO)2 0.0025812158 1.178202e-65 5.654951e+59 s(sal)1 0.0000592970 1.330736e-85 2.642247e+76 s(sal)2 0.0014764865 2.306175e-60 9.452935e+53 s(fSite) 0.0191994590 1.283382e-85 2.872249e+81 s(fSite,CYR) 0.0002809726 1.724210e-04 4.578654e-04 s(fSeason,fCYR) 0.5402643898 3.260337e-01 8.952620e-01 Rank: 18/18 

UPDATE: A subset of the data yielding similar results

toad3 <- subset(toad2, fSite %in% c(1,10,20,30,40), select = c(CYR, fCYR, fSeason, ToD, fSite, area_sampled, num, DO, sal, sed_depth, ave_tt)) > dput(toad3) structure(list(CYR = c(2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2021L, 2021L, 2021L, 2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2022L, 2022L, 2022L, 2022L, 2022L, 2023L, 2023L, 2023L, 2023L, 2023L, 2023L, 2023L, 2023L, 2023L, 2023L, 2024L, 2024L, 2024L, 2024L, 2024L), fCYR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L), levels = c("2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023", "2024"), class = "factor"), fSeason = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), levels = c("DRY", "WET"), class = "factor"), ToD = c(12.0666666666667, 13.5166666666667, 13.1, 13.75, 15.8833333333333, 8.45, 9.06666666666667, 10.8666666666667, 13.1166666666667, 10.1333333333333, 9.51666666666667, 11.55, 10.5666666666667, 11.9, 10.9833333333333, 10.0166666666667, 7.81666666666667, 8.55, 9.35, 11.8166666666667, 10.55, 12.6166666666667, 13.55, 9.55, 10.7, 11.8166666666667, 8.55, 11.3333333333333, 11.6166666666667, 11.2666666666667, 10.5, 11.9833333333333, 11.0333333333333, 10.3833333333333, 9.45, 13.15, 9.25, 11.4166666666667, 11.2166666666667, 10.3, 9.33333333333333, 9.03333333333333, 12.4833333333333, 13.95, 8.88333333333333, 12.5333333333333, 15.4833333333333, 12.4833333333333, 11.5833333333333, 12.6666666666667, 12.1833333333333, 13.1666666666667, 9.98333333333333, 13.5166666666667, 12.1333333333333, 12.25, 12.55, 14.1666666666667, 12.8333333333333, 13.45, 14.05, 11.45, 12.8833333333333, 10.3333333333333, 8.78333333333333, 9.76666666666667, 8.8, 10.1333333333333, 10.5, 12.0833333333333, 12.75, 11.6166666666667, 13.0833333333333, 9.05, 14.0666666666667, 13.8, 10.8833333333333, 14.45, 9.81666666666667, 11.6666666666667, 12.1, 14.45, 10.2666666666667, 8.36666666666667, 10.7333333333333, 11.7, 16.4333333333333, 11.1833333333333, 10.65, 14.35, 14.6666666666667, 7.03333333333333, 13.4, 13.5333333333333, 14, 9.78333333333333, 12.1833333333333, 11.5, 12.4166666666667, 13.3666666666667, 13.5166666666667, 14.6166666666667, 10.6, 11.1833333333333, 10.3833333333333, 12.8333333333333, 13.4, 11.6666666666667, 11.0666666666667, 11.0333333333333, 12.6, 8.51666666666667, 9.43333333333333, 11.7, 13.25, 13.1166666666667, 13.8666666666667, 15.7833333333333, 15.4166666666667, 8.68333333333333, 8.4, 7.71666666666667, 8.53333333333333, 14.0666666666667, 11.8833333333333, 9.26666666666667, 10.0333333333333, 11.1666666666667, 12.3, 9.3, 7.66666666666667, 8.2, 9.73333333333333, 9, 8.8, 8.15, 14.65, 7.76666666666667, 8.53333333333333, 8.9, 7.6, 9.18333333333333, 13.7666666666667, 7.76666666666667, 9.68333333333333, 9.2, 10.55, 8.7, 14.25, 10.7833333333333, 8.73333333333333, 9.68333333333333, 7.36666666666667, 9.01666666666667), fSite = structure(c(10L, 1L, 20L, 40L, 20L, 30L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 10L, 1L, 20L, 30L, 40L, 40L, 30L, 20L, 1L, 1L, 10L, 40L, 30L, 20L, 30L, 20L, 10L, 1L, 40L, 40L, 30L, 20L, 1L, 10L, 40L, 30L, 20L, 1L, 10L, 40L, 20L, 30L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 30L, 20L, 10L, 1L, 40L, 30L, 20L, 1L, 10L, 40L, 30L, 20L, 10L, 1L, 40L, 20L, 30L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 1L, 10L, 40L, 30L, 20L, 40L, 20L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 10L, 1L, 40L, 30L, 20L, 10L, 1L, 40L, 30L, 20L, 40L, 30L, 1L, 20L, 10L, 40L, 30L, 20L, 10L, 1L, 20L, 1L, 10L, 40L, 30L, 20L, 30L, 40L, 10L, 1L, 40L, 1L, 10L, 20L, 30L, 1L, 10L, 40L, 30L, 10L, 40L, 20L, 30L, 1L, 40L, 20L, 30L, 1L, 10L, 30L, 1L, 10L, 40L, 20L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47"), class = "factor"), area_sampled = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), num = c(0L, 0L, 0L, 0L, 1L, 0L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 6L, 1L, 0L, 0L, 0L, 0L, 5L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 6L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 4L, 0L, 2L, 0L, 1L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 2L, 0L, 3L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 4L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 5L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 4L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 0L), DO = c(6.18, 5.9, 10.35, 9.45, 5.55, 5.99, 3.81, 5.57, 7.99727226048632, 9.87583535415623, 12.2326512336252, 10.4481721323013, 10.6675825464615, 6.74, 5.6, 4.19, 11.02, 8.46, 15.41, 8.13, 7.9, 6.4, 5.71, 4.84, 3.01, 3.32, 5.89, 7.02, 5.49, 3.87, 3.62, 6.63, 5.59, 3.46, 3.37, 5.4, 2.39, 8.96, 12.01, 7.07, 5.27, 4.57, 6.11, 12.64, 5.69, 7.78, 9.17, 8.31, 7.09, 7.52, 6.58, 6.74, 4.81, 4.96, 5.97, 4.89, 5.81, 7.77, 9.3, 6.01, 6.38, 7.37, 3.6, 3.36, 2.35, 2.93, 2.4, 4.09, 5.64, 3.92, 7.59, 8.33, 7.32, 3.52, 3.58, 5.63, 8.14, 11.72, 7.16, 5.14, 7.85, 8.2, 3.43, 4.43, 3.31, 4.14, 6.3, 7.2, 5.9, 5.5, 7.1, 2.56, 8.05, 4.43, 7.14, 5.3, 7.2, 8.1, 5.3, 7.7, 2.95, 2.28, 8.63, 5.76, 4.7, 8.5, 7.5, 8.2, 7.4, 5.3, 7.9, 4.1, 3.7, 3.6, 5.5, 7.4, 8, 8.6, 7.7, 6.5, 3.92, 3.63, 2.45, 6.24, 5.76, 2.64, 4.11, 4.83, 3.95, 3.42, 5.04, 6.51, 3.02, 6.03, 4.45, 1.63, 2.71, 3.19, 1.12, 3.47, 2.21, 4.75, 4.93, 4.15, 3.37, 3.73, 3.22, 2.96, 4.23, 6.4, 6.36, 6.1, 4.55, 6.5), sal = c(30.7, 30.46, 24.15, 25.96, 23.85, 27.34, 20.69, 29.42, 27.66, 19.87, 16.84, 33.3, 34.06, 21.67, 25.64, 11.65, 21.75, 31.68, 23.51, 24.19, 21.7, 29.85, 26.02, 18.83, 32.54, 26.79, 12.9, 27.23, 24.68, 25.55, 29.85, 29.86, 36.48, 31.05, 29.66, 38.57, 25.7, 22.52, 20.87, 22.08, 29.45, 28.95, 19.76, 8.09, 14.01, 13.9, 22.08, 31.3, 27.34, 26.86, 31.83, 32.34, 22.83, 14.93, 2.68, 22.1, 25.25, 23.37, 12.57, 29.62, 30.59, 37.1, 28.77, 24.76, 38.87, 28.13, 31.54, 27.41, 24.39, 30.12, 32.25, 29.9, 13.17, 19.06, 20.96, 28.35, 17.71, 17.7, 12.71, 22.2, 25.18, 14.17, 12.25, 5.69, 13.26, 17.37, 15.18, 11.29, 9.67, 8.38, 2.62, 19.82, 22.74, 32.44, 33.2, 32.13, 15.33, 16.56, 29.13, 30.88, 14.69, 22.7, 3.96, 3.76, 5.26, 30.98, 32.29, 31.07, 25.94, 26.33, 30.54, 19.22, 27.35, 24.52, 27.32, 22.83, 21.24, 25.31, 31.74, 32.69, 11.63, 24.37, 19.09, 35.95, 25.05, 22.66, 21.21, 24.28, 23.01, 27.76, 22.98, 28.37, 25.47, 22.52, 21.53, 32.5, 30.26, 36.52, 32.35, 29.81, 33.79, 25.78, 28.81, 31.35, 23.19, 19.68, 19.82, 24.05, 20.38, 18.54, 23.74, 19.25, 20.49, 14.96), sed_depth = c(23, 90, 32, 41, 20, 42, 54, 53, 55, 28, 2, 27, 65, 15, 55, 15, 25, 67, 80, 32, 23, 75, 71, 51, 38, 36, 28, 37, 51, 68, 53, 48, 76, 42, 28, 104, 34, 55, 34, 37, 78, 8, 65, 37, 46, 52, 127, 54, 46, 62, 33, 96, 48, 61, 36, 15, 89, 63, 78, 58, 73, 71, 53, 34, 72, 25, 60, 36, 15, 35, 60, 75, 20, 15, 7, 55, 73, 35, 27, 40, 94, 65, 40, 20, 25, 65, 59, 47, 88, 33, 31, 55, 20, 17, 60, 46, 26, 18, 29, 74, 21, 65, 71, 57, 30, 20, 56, 55, 33, 30, 60, 20, 60, 15, 47, 46, 28, 15, 5, 55, 20, 45, 6, 48, 23, 26, 35, 37, 2, 41, 45, 45, 22, 14, 29, 52, 33, 58, 17, 8, 48, 30, 28, 43, 50, 36, 9, 49, 11, 20, 32, 9, 43, 17), ave_tt = c(2.32, 0, 0, 19.61, 0, 7.1, 11.3, 0, 28, 3.805, 0, 3.72, 0.1, 17.5, 0.72, 0.02, 10, 18, 13, 11.6666666666667, 0, 3.2, 30.5, 22.1, 42, 0, 0, 0.5, 1.5, 6.2, 28.5, 10.6, 15.1, 1.1, 0, 1.1, 11.5, 22.5, 6.3, 0.5, 30.5, 32.5, 23.5, 0.1, 1.1, 19, 1.1, 3.2, 0.5, 0, 4.1, 0, 55, 0.6, 0, 12, 0, 16.5, 0, 12.1, 6.6, 26.5, 24, 0, 9.5, 12.5, 17.7777777777778, 34, 0, 24, 15.5, 40, 0, 14.7, 33, 42.5, 25, 19, 0.1, 15.1, 57.5, 36, 20.5, 0.2, 19.5, 35.5, 9.4, 45, 16.1, 2, 0, 27.5, 0, 15, 4.1, 24.5, 25.5, 0, 34, 12.5, 16.6, 4.6, 41, 5, 0, 16.5, 2.6, 43, 34.5, 0, 34.5, 33.5, 0.7, 0, 42.5, 30, 18.5, 0, 32.1, 15.6, 0, 3.8, 33.5, 18, 49, 0, 24.1, 16, 50, 8.6, 31.5, 1.2, 19.5, 0, 11.6, 3, 29.1, 3, 19, 29.5, 19, 0.5, 21.5, 17.5, 16.6, 0, 27.1, 17.6, 60.5, 53.1, 5.5, 23.7, 3.7, 0)), row.names = c(1L, 10L, 20L, 22L, 31L, 41L, 42L, 52L, 70L, 78L, 86L, 104L, 110L, 115L, 126L, 134L, 144L, 154L, 166L, 172L, 185L, 190L, 196L, 197L, 213L, 227L, 239L, 248L, 258L, 263L, 271L, 281L, 296L, 309L, 319L, 323L, 324L, 342L, 352L, 362L, 369L, 377L, 384L, 400L, 410L, 418L, 424L, 432L, 444L, 456L, 462L, 470L, 477L, 486L, 500L, 509L, 517L, 526L, 539L, 546L, 554L, 563L, 574L, 585L, 592L, 593L, 618L, 628L, 632L, 634L, 642L, 653L, 670L, 680L, 681L, 691L, 709L, 713L, 728L, 732L, 739L, 745L, 759L, 769L, 773L, 780L, 798L, 799L, 816L, 819L, 829L, 855L, 857L, 865L, 874L, 887L, 894L, 904L, 912L, 921L, 929L, 938L, 949L, 961L, 970L, 976L, 982L, 989L, 999L, 1011L, 1024L, 1046L, 1047L, 1057L, 1061L, 1072L, 1081L, 1090L, 1099L, 1109L, 1125L, 1128L, 1136L, 1139L, 1160L, 1166L, 1176L, 1186L, 1194L, 1204L, 1212L, 1220L, 1227L, 1239L, 1249L, 1258L, 1259L, 1269L, 1278L, 1296L, 1303L, 1313L, 1323L, 1335L, 1345L, 1355L, 1366L, 1374L, 1375L, 1393L, 1403L, 1412L, 1420L, 1430L), class = "data.frame") 
$\endgroup$
3
  • $\begingroup$ Reducing the model to just 2/3 components: the random effect s(fSite, bs="re"), main effect s(CYR), and interaction s(CYR, by=fSeason) + fSeason, gam.vcomp yields stable results for either the s(CYR) + s(fSite, bs="re"), or s(CYR, by=fSeason) + fSeason + s(fSite, bs="re"), but not all 3. With all 3, s(CYR) has C.I. of 1.562761e-20 and 1.994083e+16. Adding m=1 to the interaction didn't help. $\endgroup$ Commented Jan 27 at 20:43
  • $\begingroup$ This is a complete guess, but I think the reason why s(CYR) and s(CYR, by=fSeason) + fSeason have relatively large C.I. is because the year effect is represented twice when it doesn't need to be (according to the data). So, there's nothing left for s(CYR) to estimate (C.I. of basically 0 effect also=0) - not sure why removing it results in crazy C.I.'s for interaction. The hugely wide C.I. for s(fSite, bs="re") are likely due to high concurvity between it and s(fSite,CYR) - not sure what to do about that. Removing s(fSeason,fCYR) help me see the results for concurvity - not sure why. $\endgroup$ Commented Jan 28 at 16:11
  • $\begingroup$ Correction: I said s(CYR, by=fSeason) + fSeason, but meant s(CYR, fSeason, bs = "sz") $\endgroup$ Commented Jan 29 at 13:54

2 Answers 2

4
+100
$\begingroup$

If we look at your model:

Formula: num ~ s(CYR) + s(CYR, fSeason, bs = "sz") + s(ToD) + s(ave_tt) + s(sed_depth) + s(DO) + s(sal) + s(fSite, bs = "re") + s(fSite, CYR, bs = "re") + s(fSeason, fCYR, bs = "re") + offset(log(area_sampled)) 

Some things stand out as being unneccessary or problematic.

s(fSeason, fCYR, bs = "re")

This is fitting a random intercept for each year within season. This is going to explain the same variation as these two terms s(CYR) + s(CYR, fSeason, bs = "sz"), except that the former says this variation can be stochastic and the latter that the variation is smooth. It is no wonder then, that the variance component has an excessively-wide credible interval - that term is not identifable. In effect, you are including the two sets of terms in the model that describe the same thing. The only way that I can see this kind of construction ever working, is if you put some limit on the complexity of these terms: s(CYR) + s(CYR, fSeason, bs = "sz"). But as in your data the trends are relatively simple anyway, I doubt there's much you can do.

Just include one of these two. Personally, I think the s(CYR) + s(CYR, fSeason, bs = "sz") terms are more useful.

s(fSite, bs = "re")

Given the other terms in the model, there is little between site variation. I suspect that given this, this term is also bordering on being unidentifiable given the other terms in the model for sediment depth, DO, etc. Also, as s(fSeason, fCYR, bs = "re") is very complex, it could well be accounting for a lot of between site variation.

s(fSite, CYR, bs = "re")

This doesn't make much sense to me; why should individual site's trends be linear but at the higher levels those trends are smooth, nonlinear?

You are including many terms that include CYR and I think your data simply don't provide information to identify all these effects separately.

This is being reflected in the very wide credible intervals on the variance components.

s(sal)

This ends up being linear and largely unimportant, and there are other linear terms in the model. So again, this looks like a lack of identifiability when you ahve all these other terms in the model. Simplifying the model by removing some of the terms above in my answer might help if the effect of sal is something that is important to include in the model.

General

Your data have a lot of zeroes or low counts. Have you checked to see if you are able to recreate these counts in your model diagnostics? An option here is to use a rootogram.

It also may well be that with such heterogeneous data, a mean-only model simply might not be capable of modelling that variation. The model has a single value for theta, the "overdispersion" parameter in the NB distribution. One might need to model theta as well, but this is not possible in mgcv. You might be able to handle this by moving to twlss() and put something simple like a site random effect into the p and scale linear predictors. Or perhaps you could fit your model using GJRM::gamlss(), which does have the ability to fit theta with a linear predictor as far as I can tell. gamlss::gamlss() can also do this.

In general, such large credible intervals are a good indicator that you are fitting a model that is too complex for the data. It would be a good idea in those circumstances to prune back some of the complexity. I have given specific ideas on how to do that above.

$\endgroup$
5
  • 1
    $\begingroup$ Thanks Gavin, really appreciate it! Hope all is well. $\endgroup$ Commented Jan 31 at 12:32
  • $\begingroup$ Side note: I don’t think each site has a linear trend in time, but thought any trend would fit better than none. I tried ‘s(fSite, CYR, bs = "fs")’ first, but the model didn’t converge. Might work now that I know I can get rid of other redundant terms! $\endgroup$ Commented Jan 31 at 12:52
  • $\begingroup$ @Nate It might work, but you also might not have enough data to do all those time models. Also, it might be worth just modelling the seasons directly rather than through a decomposition into average and sz smooths. It might end up being simpler that way. But do check out the model diagnostics including a rootogram and also check with DHARMa and it's simulated residuals checks if you ahven;t already $\endgroup$ Commented Jan 31 at 14:09
  • $\begingroup$ Will do! Btw, by "modelling the seasons directly" do you mean s(CYR, by=fSeason) + fSeason? Would I also include m=1 to lower the concurvity between it and a main year term, if I thought that made sense (and had enough data)? As in: s(CYR) + s(CYR, by=fSeason, m=1) + fSeason $\endgroup$ Commented Jan 31 at 14:16
  • 1
    $\begingroup$ Yes to y ~ s(CYR, by=fSeason) + fSeason + .... but no need for m = 1 here. I prefer the sz basis over s(CYR) + s(CYR, by=fSeason, m=1) + fSeason because the m=1 bit isn't really "smooth" $\endgroup$ Commented Jan 31 at 14:20
1
$\begingroup$

I got rid of the large intervals by drastically reducing the complexity of the model by: 1. forcing the annual trend and seasonal interaction to be linear CYR * fSeason, 2. removing salinity (s(sal)), and 3. only keeping the random intercept for site s(fSite, bs = "re") (despite the other random effects explaining large amounts of variation). All diagnostics on this reduced model were good.

This reduced model is counter-intuitive to me though, because I seem to have plenty of data to support a more complex version. I also know it's best to not force linearity or drop variables to get better results. I therefore welcome any other possible solutions..

> table(toad2$fSeason, toad2$fCYR) # Same 47 sites per season (some rows removed if covariate = NA). 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 DRY 21 47 34 46 47 46 39 47 47 43 47 47 47 0 47 47 47 WET 46 47 47 47 47 47 42 47 47 47 47 47 47 47 38 47 0 

Note: I fit what I thought to be the better model (above) before viewing this data below.

library(ggplot2) # Only using "loess" smoother to detect deviations from linear trend (not good for count data) ggplot(toad2, aes(CYR, num)) + geom_point(alpha = 0.2) + geom_smooth(method = "loess", se = FALSE, color = "black", show.legend = TRUE, aes(linetype = "Non-linear model")) + geom_smooth(method = "loess", se = FALSE, aes(color = fSeason)) + theme(axis.text.x = element_text(angle = 45, hjust = 1), axis.title.x = element_blank()) + ylab("Abundance") + labs(color = "By season trends", linetype = "Global annual trend") ggplot(toad2, aes(CYR, num)) + geom_point() + geom_smooth(method = "loess", se = FALSE, show.legend = TRUE, aes(linetype = "Non-linear model")) + theme(axis.text.x = element_text(angle = 45, hjust = 1), axis.title.x = element_blank()) + ylab("Abundance") + labs(linetype = "Trend per site") + facet_wrap(~fSite) 

Annual non-linear trend, by season by site trends

mod <- gam(num ~ # Abundance # Long-term annual trends CYR * fSeason + # Time of day s(ToD) + # Habitat covariates interactions s(ave_tt) + s(sed_depth) + # Water properties s(DO) + # Repeated sampling at each site s(fSite, bs = "re") + # Area surveyed offset(log(area_sampled)), data = toad2, method = 'REML', select = FALSE, family = nb(link = "log"), control = list(trace = TRUE), drop.unused.levels=FALSE) > gam.vcomp(mod) Standard deviations and 0.95 confidence intervals: std.dev lower upper s(ToD) 0.1720118821 4.741159e-02 0.62406867 s(ave_tt) 0.0115057712 3.850385e-03 0.03438169 s(sed_depth) 0.0007746879 2.635771e-07 2.27691010 s(DO) 0.1155487329 3.275116e-02 0.40766519 s(fSite) 0.5634195105 3.533165e-01 0.89846234 Rank: 5/5 > concurvity(mod, full = FALSE)$estimate para s(ToD) s(ave_tt) s(sed_depth) s(DO) s(fSite) para 1.000000e+00 1.062259e-22 2.110034e-26 6.669434e-27 8.559904e-23 0.021291928 s(ToD) 2.931125e-20 1.000000e+00 1.379512e-02 7.214123e-03 1.044829e-01 0.007769115 s(ave_tt) 6.307671e-24 1.058713e-02 1.000000e+00 2.230617e-02 1.022590e-02 0.020822778 s(sed_depth) 1.308948e-24 3.864585e-03 1.747942e-02 1.000000e+00 9.083229e-03 0.030104025 s(DO) 1.878226e-20 1.572410e-01 1.182054e-02 9.082554e-03 1.000000e+00 0.006605583 s(fSite) 1.000000e+00 4.645990e-02 2.733317e-01 3.912287e-01 3.269751e-02 1.000000000 
$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.