Probability of 2 discrete samples coming from same distribution

Question

I would like to know how to calculate the probability that 2 discrete samples come from the same distribution and, if they do, which distribution that is.

Let's say we have 3 buckets, and 2 years.

Bucket\year	2019	2020
A	76%	73%
B	20%	22%
C	4%	5%

I would like to claim that both years come from the same distribution (which I'd assume is close to [74.5%, 21%, 4.5%]) and that the variation between years is due to random chance alone (with what probability?). I think that to make this claim I have to calculate the probability that both samples come from the same distribution (I have heard of 'power' for continuous random variables, but I don't know whether there is an equivalent for the discrete case). Any hints on how to proceed?

Thanks a lot!

As a side note: the two periods have different numbers of data points, i.e. 2019 has 350 and 2020 has 400. Is that a problem?

Knowing the number of data points is essential (whether there are hundreds or millions makes a big difference to a significance test), but having different numbers in each year is not a problem. A chi-square test on the counts (preferably unrounded) rather than on the proportions may meet your needs.
– Henry, Commented Sep 19, 2023 at 10:12
Thanks Henry! So the chi-square test needs a null hypothesis. What would it be in this scenario? I could assume that the 'real' distribution is that of 2019, that of 2020, or the global one (computed using data from both years).
– Oscar Flores, Commented Sep 19, 2023 at 10:16
The third option is the appropriate one. Under the null hypothesis (no difference between 2019 and 2020) you can combine the data from the two years.
– Doctor Milt, Commented Sep 19, 2023 at 10:25
Unless you assume a prior probability, the answer is definitely no, no matter what. The kind of question you could answer with something like a chi-squared test concerns how consistent the data are with a hypothetical common distribution. Different numbers of data points are no problem, but the degree to which different data might be independent is a key consideration, especially when data are collected over time.
– whuber ♦, Commented Sep 19, 2023 at 19:32
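
The chi-square test of homogeneity suggested in the comments might be sketched as follows. This is only an illustration under stated assumptions: the counts are reconstructed from the rounded percentages in the question (350 observations in 2019, 400 in 2020), so the true unrounded counts could differ slightly, and the names `counts` and `chi_square_homogeneity` are made up for this example. The expected counts come from the pooled data of both years, as described for the null hypothesis above.

```python
import math

# Counts reconstructed from the rounded percentages in the question
# (an assumption): 2019 has 350 observations, 2020 has 400.
counts = {
    "A": (266, 292),  # 76% of 350, 73% of 400
    "B": (70, 88),    # 20% of 350, 22% of 400
    "C": (14, 20),    #  4% of 350,  5% of 400
}


def chi_square_homogeneity(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a table given as {bucket: (count_year_1, count_year_2, ...)}."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count under the null hypothesis: both years share
            # the pooled bucket proportions.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_homogeneity(counts)

# For exactly 2 degrees of freedom the chi-square upper tail has the
# closed form exp(-x / 2). Other table sizes need a general routine,
# for example scipy.stats.chi2_contingency.
p_value = math.exp(-statistic / 2) if dof == 2 else None

# Pooled estimate of the common distribution, weighted by sample size.
grand_total = sum(sum(row) for row in counts.values())
pooled = {bucket: sum(row) / grand_total for bucket, row in counts.items()}

print(f"chi-square = {statistic:.3f}, dof = {dof}, p-value = {p_value:.3f}")
print({bucket: round(share, 4) for bucket, share in pooled.items()})
```

With these reconstructed counts the statistic is close to 1 on 2 degrees of freedom, which gives a p-value of roughly 0.6, so the data are consistent with a common distribution. As noted in the comments, this p-value measures how surprising the observed differences would be if the two years did share a distribution; it is not the probability that they do. The pooled estimate is weighted by sample size, so it is about [74.4%, 21.1%, 4.5%] rather than the simple average of the two years.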
