I wish you good health and the best in life (whatever that means for you in particular).
Context
I want to model aggregated (monthly) tornado counts in the United States for a question on Metaculus. Specifically, I want to predict the number of tornadoes that will occur in the US in June 2024.
The National Oceanic and Atmospheric Administration's (NOAA) Storm Prediction Center provides historical tornado count data by state, month, and year; I wish to constrain myself to only using this data.
Advice Desired
For this problem, I took a multi-level modelling approach. I am not familiar with multi-level modelling and am fairly immature, mathematically speaking. I would very much appreciate advice on the following questions:
- How and where my attempted problem-formulation description is misaligned with the mathematics I espouse (i.e. my discrepancies / how I wrong; criticism is appreciated!)
- Whether my model is best interpreted as something other than a multi-level model.
- What other multi-level modelling approaches might be better suited to the nature of this specific problem.
- General advice, including on the structure of this question or the associated repository
Attempt
The following can be found in this document (which is in the aforementioned repository):
Tornado counts vary across state, month, and year. One might imagine that climatic factors change over time (across years), that the formation of tornadoes alters within a year, over different months, and that the rate of tornado formation is also affected by local geographical features (state location).
To adequately incorporate information across these categories, we can create a multilevel model. For each relevant category (State, Month, Year) available to us, we want to estimate its intercept and internal variation (e.g., across each State, how does tornado count vary?).
Since we are estimating tornado counts, a Poisson distribution seems appropriate.
Proceeding, let $\alpha_i$ denote the random effect of the location of state $i$ on expected tornado count. Further, let $\gamma_j$ and $\delta_k$ denote the random of effects of month $j$ on and year $k$, respectively, on expected tornado count. Let the expected tornado count for state $i$, month $j$, and year $k$ be called $Y_{ijk}$.
The tornado count $T_{ijk}$ for state $i$, month $j$, and year $k$ can be modelled via $T_{ijk} \sim \text{Poisson}(Y_{ijk})$, where $\log (Y_{ijk}) = \alpha_{\text{state}[i]} + \gamma_{\text{month}[j]} + \delta_{\text{year}[k]}$, i.e. $Y_{ijk}$ is log-linear.
Note that the author is "uncertain" regarding what constitute "appropriate" priors distributions and parameter values to use for a model on tornadoes.
Adaptive priors (NOTE: 10 years of data are being used):
$$ \begin{aligned} \alpha_{i} &\sim \mathcal{N}(\overline{\alpha}, \sigma_{\alpha}) \quad \text{for} \quad i = 1, 2, \dotsc, 50 \\\\ \gamma_{j} &\sim \mathcal{N}(\overline{\gamma}, \sigma_{\gamma}) \quad \text{for} \quad j = 1, 2, \dotsc, 12 \\\\ \delta_{k} &\sim \mathcal{N}(\overline{\delta}, \sigma_{\delta}) \quad \text{for} \quad k = 1, 2, \dotsc, 10 \end{aligned} $$
where
$$ \begin{aligned} \overline{\alpha} &\sim \mathcal{N}(0, \sigma_{\overline{\alpha}}) \\\\ \overline{\gamma} &\sim \mathcal{N}(0, \sigma_{\overline{\gamma}}) \\\\ \overline{\delta} &\sim \mathcal{N}(0, \sigma_{\overline{\delta}}) \end{aligned} $$
and
$$ \begin{aligned} \sigma_{\alpha}, \sigma_{\gamma}, \sigma_{\delta} &\sim \text{Exponential}(1.0) \\\\ \sigma_{\overline{\alpha}}, \sigma_{\overline{\gamma}}, \sigma_{\overline{\delta}} &\sim \mathcal{U}(0.0, 6.0) \end{aligned} $$
Code
NOTE: this is only the part of the code most relevant to the above description.
import jax.numpy as jnp import numpy as np import numpyro as npro import numpyro.distributions as dist def model_01(states=None, months=None, years=None, tornadoes=None): # states, months, years, and tornadoes represented num_states = len(np.unique(states)) num_months = len(np.unique(months)) num_years = len(np.unique(years)) # state effect hyperparameters alpha_mu_sigma = npro.sample("alpha_mu_sigma", dist.Uniform(0.0, 6.0)) alpha_mu = npro.sample("alpha_mu", dist.Normal(0, alpha_mu_sigma)) alpha_sigma = npro.sample("alpha_sigma", dist.Exponential(1.0)) # state effect with npro.plate("states", num_states): alpha = npro.sample("alpha", dist.Normal(alpha_mu, alpha_sigma)) # month effect hyperparameters gamma_mu_sigma = npro.sample("gamma_mu_sigma", dist.Uniform(0.0, 6.0)) gamma_mu = npro.sample("gamma_mu", dist.Normal(0, gamma_mu_sigma)) gamma_sigma = npro.sample("gamma_sigma", dist.Exponential(1.0)) # month effect with npro.plate("months", num_months): gamma = npro.sample("gamma", dist.Normal(gamma_mu, gamma_sigma)) # year effect hyperparameters delta_mu_sigma = npro.sample("delta_mu_sigma", dist.Uniform(0.0, 6.0)) delta_mu = npro.sample("delta_mu", dist.Normal(0, delta_mu_sigma)) delta_sigma = npro.sample("delta_sigma", dist.Exponential(1.0)) # year effect with npro.plate("years", num_years): delta = npro.sample("delta", dist.Normal(delta_mu, delta_sigma)) # expected tornadoes Y = jnp.exp(alpha[states] + gamma[months] + delta[years]) # likelihood npro.sample("obs", dist.Poisson(Y), obs=tornadoes) Resources
Multilevel model (Wikipedia)
Thank you very much and I hope you have a nice day.
—FilteredFrames


