pglpm

Distribution of *conditional* frequencies when frequencies follow a Dirichlet distribution

Context: we have a large number of individuals characterized by two binary traits: $T$ with values $\{0,1\}$, and $T'$ with values $\{0',1'\}$. There are therefore four types of individuals, $00'$, $01'$, $10'$, $11'$, which appear in the population with unknown relative frequencies $f_{00'}$, $f_{01'}$, $f_{10'}$, $f_{11'}$ that sum to one.

Suppose that our degree of belief about these frequencies (assumed continuous) is expressed by a Dirichlet distribution with parameters $(Aa_{00'}, Aa_{01'}, Aa_{10'}, Aa_{11'})$, the $a$s summing up to one: $$\mathrm{p}[f_{00'}, f_{01'}, f_{10'}, f_{11'} \mid A, (a_{00'}, a_{01'}, a_{10'}, a_{11'})] \propto \prod_{i=0}^1\prod_{j'=0'}^{1'} f_{ij'}^{A a_{ij'}-1}\;\delta\bigl({\textstyle\sum_{ij'}}f_{ij'}-1\bigr).$$

We can also consider the marginal frequencies of trait $T'$ alone: $f_{0'} \equiv f_{00'} + f_{10'}$ and $f_{1'} \equiv f_{01'} + f_{11'}$. Owing to the "aggregation" property of the Dirichlet distribution (Kotz & al 2000, also Basu & Pereira 1982), these marginal frequencies also have a Dirichlet distribution, with parameters $\bigl(A(a_{00'} + a_{10'}), A(a_{01'} + a_{11'})\bigr)$; since there are only two of them, this is a Beta distribution for $f_{0'}$.
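The aggregation property can be checked numerically. The following is a minimal sketch, not part of the cited references: the values of $A$ and the $a$s are illustrative assumptions, and the check only compares the sample mean and variance of $f_{0'}$ with those of the Beta distribution stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only (assumptions, not from the question).
# Order of components: 00', 01', 10', 11'. The a's sum to one.
A = 10.0
a = np.array([0.1, 0.2, 0.3, 0.4])

# Each row is one draw of (f_00', f_01', f_10', f_11').
f = rng.dirichlet(A * a, size=200_000)

# Marginal frequency f_0' = f_00' + f_10'.
f0p = f[:, 0] + f[:, 2]

# Moments of Beta(alpha, beta) with the aggregated parameters.
alpha = A * (a[0] + a[2])
beta = A * (a[1] + a[3])
beta_mean = alpha / (alpha + beta)
beta_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print("sample mean, Beta mean:", f0p.mean(), beta_mean)
print("sample var,  Beta var: ", f0p.var(), beta_var)
```

Matching the first two moments is consistent with the aggregation property but does not prove it; a comparison of histograms or quantiles would be a stronger check.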

Question: Consider now the conditional frequencies of trait $T$ given $T'$, for example $$f_{1\mid 0'} \equiv \frac{f_{10'}}{f_{00'}+f_{10'}}.$$ What distribution expresses our degree of belief about such a conditional frequency, given the context above?

While I sit down and calculate (or sample), I'd be grateful for any literature or calculation hints on this. Thank you!
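A minimal sampling sketch of the conditional frequency $f_{1\mid 0'}$ follows. The values of $A$ and the $a$s are illustrative assumptions. The comparison distribution, $\mathrm{Beta}(A a_{10'}, A a_{00'})$, is a candidate suggested by the representation of Dirichlet variables as normalized independent Gamma variables; it is used here as a hypothesis to test, not as an established answer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values only (assumptions, not from the question).
# Order of components: 00', 01', 10', 11'. The a's sum to one.
A = 10.0
a = np.array([0.1, 0.2, 0.3, 0.4])

# Each row is one draw of the joint frequencies.
f = rng.dirichlet(A * a, size=200_000)

# Marginal f_0' and conditional f_{1|0'} = f_10' / (f_00' + f_10').
f0p = f[:, 0] + f[:, 2]
f_cond = f[:, 2] / f0p

# Candidate distribution (a hypothesis): Beta(A a_10', A a_00').
p, q = A * a[2], A * a[0]
cand_mean = p / (p + q)
cand_var = p * q / ((p + q) ** 2 * (p + q + 1))

# Sample correlation between the conditional and the marginal frequency.
corr = np.corrcoef(f_cond, f0p)[0, 1]

print("sample mean, candidate mean:", f_cond.mean(), cand_mean)
print("sample var,  candidate var: ", f_cond.var(), cand_var)
print("correlation with f_0':", corr)
```

If the sample moments agree with the candidate and the correlation with $f_{0'}$ is near zero, that is evidence (not proof) that the conditional frequency is Beta-distributed with a total parameter $A(a_{00'} + a_{10'})$ rather than $A$.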

Additional motivation: For inference about sequential data, such as text, speech, or genes, some of the literature expresses the degree of belief about conditional frequencies $f_{i\mid j}$ (of, say, one word given the previous one) with a Dirichlet distribution (e.g. MacKay & Peto 1995): $$\mathrm{p}[f_{i \mid j} \mid A, (a_{i\mid j})] \propto \prod_{i} f_{i\mid j}^{A a_{i\mid j}-1}\;\delta\bigl({\textstyle\sum_{i}}f_{i\mid j}-1\bigr), \qquad\text{for every }j.$$ This approach differs from using a Dirichlet distribution for the joint frequencies $f_{ij}$. I wonder how much it differs from the distribution for the conditional frequencies that we obtain by assuming a Dirichlet distribution for the joint frequencies instead, as in my question above.

References:

– Basu, de Bragança Pereira: On the Bayesian analysis of categorical data: the problem of nonresponse (1982) https://doi.org/10.1016/0378-3758(82)90004-0, §§ 3–4.

– Kotz, Balakrishnan, Johnson: Continuous Multivariate Distributions. Vol. 1 (2nd ed. Wiley 2000), §49.1.

– MacKay, Peto: A hierarchical Dirichlet language model (1995) https://doi.org/10.1017/S1351324900000218, https://pdfs.semanticscholar.org/01fa/57bd91f731522c861404d29e4604ba6ac6d3.pdf.
