$\begingroup$

I am trying to use the Kolmogorov-Smirnov test to check the goodness of fit of a distribution to my dataset. The dataset consists of 100,000 samples, and I apply the expectation-maximization algorithm to fit a mixture of gamma distributions, which gives me the weights, shape and scale parameters. The first plot shows the histogram of the data together with the pdf of the fitted mixture; the gamma components of the mixture are drawn as dashed lines. In the second plot, the empirical cdf of the dataset and the theoretical cdf of the fitted gamma mixture overlap almost perfectly. Yet the p-value of the Kolmogorov-Smirnov test is 0, and I don't understand why.
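A minimal sketch of this kind of check (a stand-in only: the weights, shapes, scales and data below are placeholders, not the actual EM estimates or the real dataset):

```python
import numpy as np
from scipy import stats

# Placeholder EM output: weights, shapes and scales of the mixture components
weights = np.array([0.6, 0.4])
shapes = np.array([2.0, 5.0])
scales = np.array([1.0, 0.5])

def mixture_cdf(x):
    # CDF of the mixture: weighted sum of the component gamma CDFs
    x = np.asarray(x, dtype=float)[..., None]
    return np.sum(weights * stats.gamma.cdf(x, shapes, scale=scales), axis=-1)

# Stand-in for the real 100,000-sample dataset
data = stats.gamma.rvs(2.1, scale=1.0, size=100_000, random_state=0)

res = stats.kstest(data, mixture_cdf)
# With n = 100,000 the p-value is reported as 0 unless the fit is essentially exact
print(res.statistic, res.pvalue)
```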

$\endgroup$
  • $\begingroup$ The P-value may indeed be reported as zero. What this means is that the value is extremely, extremely small -- but not zero; not that this distinction makes any practical difference. $\endgroup$ Commented Jul 3, 2023 at 17:02
  • $\begingroup$ As you do not know the process generating the data, any model must be wrong, and for such a large dataset the null hypothesis that the data were generated exactly by the model will be rejected. An internet search for "null hypothesis controversy" will show explanations of the problem, e.g. advstats.psychstat.org/book/hypothesis/pvalue.php. Your CDF comparison shows that the fit actually is quite good, which you presumably can further confirm with a QQ plot close to a straight line. If you get such a decent fit with only a few parameters, I would not worry too much about GOF tests. $\endgroup$ Commented Jul 3, 2023 at 18:55
  • $\begingroup$ 1. When the sample size is huge, the standard error of the cdf estimate is tiny, so you should expect a test of exact equality to spot the tiny difference that is there. (Indeed, I suspect such a test is not really what you want.) 2. Since you are estimating parameters, the usual KS test (which is for fully specified distributions, not estimated ones) is not appropriate as is. If you are also choosing the number or form of the components based on the data, you would need to account for that as well. $\endgroup$ Commented Jul 4, 2023 at 8:44
  • $\begingroup$ If your data truly are 100K independent samples from a single fixed distribution, and you are capable of taking accurate measurements with zero error, then the p-value indicates that these data are not distributed according to the fitted gamma distribution; some more complex (probably compound) process is at work. However, since real-world data rarely adhere to these strict conditions, the KS test result has no practical implications. $\endgroup$ Commented Jul 4, 2023 at 20:37

2 Answers

$\begingroup$

The claim in another answer that the KS test is "over-powered" takes it too far. The KS test, like many hypothesis tests, does exactly what it says it will do: it evaluates whether the data are inconsistent with the null hypothesis being exactly true. With a huge sample size, the test is able to detect even small deviations from that null. This is how hypothesis testing is designed to work. After all, the null hypothesis being ever so slightly incorrect is one way for the null hypothesis to be incorrect, and the test is doing its job by flagging the null as false.

Thus, if you see it as a flaw of hypothesis testing that it can reject small deviations when the sample size is huge, then hypothesis testing might not be right for your goals (which is fine; hypothesis testing is probably overused, anyway).
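A small simulation sketch of this point (my own toy example, not from the question: the data come from a gamma distribution whose shape parameter is deliberately off by a tiny amount from the null):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_shape, null_shape = 2.05, 2.00  # a deliberately tiny misspecification

for n in (500, 5_000, 100_000):
    x = stats.gamma.rvs(true_shape, scale=1.0, size=n, random_state=rng)
    res = stats.kstest(x, stats.gamma(null_shape, scale=1.0).cdf)
    print(f"n={n:>7}  D={res.statistic:.4f}  p={res.pvalue:.3g}")
```

Typically the tiny deviation is invisible at n = 500 but decisively rejected at n = 100,000, even though the KS distance D itself stays on the order of 0.01: the data barely changed, only the test's ability to notice them did.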

I have likened this to the Princess and the Pea fairy tale. The princess is not wrong to identify a pea under the mattress, and the other women are not wrong to have the stance of, “Yeah, whatever, I’ll sleep fine,” but they would be wrong to say there is no pea under the mattress.

$\endgroup$
$\begingroup$

In brief, the KS goodness-of-fit test (like many GoF tests) becomes more and more strongly powered as the sample size grows, a phenomenon often referred to as being "over-powered" in domains where null hypothesis tests are, correctly or not, used to make decisions about equality (such as the use of chi-square tests to assess data-model fit in latent variable models).

That is to say, as the sample size gets larger, the test will flag even the smallest deviation from the null hypothesis as statistically significant (whether or not that deviation matters in practice). With a sample size of 100k, this is most definitely why you have such a small p-value.
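To put a rough number on this (using the standard large-sample approximation, not anything specific to your data): at level $\alpha = 0.05$ the KS test rejects when $D_n > c(\alpha)/\sqrt{n}$ with $c(0.05) \approx 1.358$, so for $n = 100{,}000$ the critical distance is about $1.358/\sqrt{100000} \approx 0.0043$. Any discrepancy between the empirical and fitted CDFs of more than roughly half a percentage point, anywhere, is therefore enough to drive the p-value toward zero.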

In this context, if your goal is to show that the distributions ARE the same, then the null hypothesis significance testing (NHST) framework may not be the best statistical tool for the job. You may want to use other means to evaluate the quality of the fit (and comparing the empirical cdf to the hypothesized cdf, as you have done, is a reasonable choice).
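One way to make that comparison numerical (a sketch, assuming Python/SciPy; the data and fitted parameters below are placeholders, not yours): report the KS distance $D$ itself as a descriptive measure of misfit, rather than its p-value.

```python
import numpy as np
from scipy import stats

# Placeholder data and placeholder fitted distribution
data = stats.gamma.rvs(2.05, scale=1.0, size=100_000, random_state=0)
fitted_cdf = stats.gamma(2.0, scale=1.0).cdf

# D is the largest vertical gap between the empirical CDF and the fitted CDF;
# read it as an effect size ("the CDFs never differ by more than D anywhere")
# instead of looking at the p-value.
D = stats.kstest(data, fitted_cdf).statistic
print(f"max CDF discrepancy D = {D:.4f}")
```

A D of, say, 0.01 tells you the fitted and empirical CDFs never disagree by more than about one percentage point, which is often all you need to know, regardless of how small the p-value is.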

$\endgroup$
  • $\begingroup$ It's difficult to conceive of what "over-powered" might possibly mean and how that distinguishes this test from any null hypothesis test. $\endgroup$ Commented Jul 3, 2023 at 14:11
  • $\begingroup$ Is that to say the KS test is not reliable for larger datasets? Other than graphical interpretation, is there a metric that can be used for evaluation? $\endgroup$ Commented Jul 3, 2023 at 14:24
  • $\begingroup$ @amitha It is reliable for detecting distributions that are not in agreement with the test distribution... but to use the test to confirm that your distribution IS in agreement (which I believe you wish to do), you must have near-perfect agreement to achieve this. (When I get a chance, I will prepare a simulation to demonstrate this and add it as an edit to my answer.) $\endgroup$ Commented Jul 3, 2023 at 14:38
  • $\begingroup$ @amitha Reliable in what sense? I say the KS test is extremely reliable when the sample size gets large, as its ability to detect deviations from the null hypothesis becomes spectacular, yet it does not reject particularly often when the null is true. $\endgroup$ Commented Jul 3, 2023 at 16:31
  • $\begingroup$ @amitha The $\alpha$-level means what it is supposed to mean (not a given that every test has this property, but KS behaves pretty well). Perhaps read what I posted here. $\endgroup$ Commented Jul 4, 2023 at 14:27
