I am trying to infer the distribution of plausible group counts ($K$) in a population stratified by $E$ exposures and 1 outcome (all binary); so, we have $N_e = 2^E$ total exposure groups and $N_k = 2 N_e$ total groups (adding the outcome stratification).
The available data is:
- total population size ($N$)
- prevalence of the outcome ($P_y$)
- prevalence of each exposures ($P_e$)
- odds ratios for each exposure with the outcome ($OR_e$)
I set up a model in Stan to sample group counts $K$ from a multinomial distribution with wide prior on the parameter vector ($\theta$), and then compute the prevalence & odds ratios from $K$ hoping to propagate this information up to $\theta$ during sampling.
However, it seems that sampling statements are only permitted for directly observed data. At least, K ~ multinomail(p) fails to determine the total number of trials at runtime, since K is not directly observed (and the number of trials cannot be provided as an argument). Alternatively, K = multinomial_rng(p,N) is illegal in the model block like all _rng functions.
Perhaps I need a different approach altogether?
Stan code
data { int N; // num individuals int Ns; // num strata (exposures + 1) int Nk; // num groups = 2^Ns vector[Ns ] Pe; // prevalence of outcome, exposures vector[Ns-1] OR; // odds ratios: outcome & exposures array[Ns, Nk/2] int iexp; // indices: e (Pe numerator) array[Ns-1,Nk/2] int inum; // indices: e == y (OR numerator) array[Ns-1,Nk/2] int iden; // indices: e != y (OR dominator) } parameters { simplex[Nk] p; // group probability (theta) } model { array[Nk] int K; // latent counts vector[Ns ] uPe; // latent prevalence vector[Ns-1] uOR; // latent OR // sample counts K ~ multinomial(p); // (!) FAILS AT RUNTIME K = multinomial_rng(p,N); // (!) FAILS AT COMPILETIME for (s in 1:Ns){ uPe[s] = sum(K[iexp[s]]) / Ni; Pe[s] ~ normal(uPe[s],.001); // dummy sampling } for (s in 1:Ns-1){ uOR[s] = sum(K[inum[s]]) / sum(K[iden[s]]); OR[s] ~ normal(uOR[s],.001); // dummy sampling } }