43
$\begingroup$

I have a sample and want to check whether it follows some discrete distribution. However, I'm not entirely sure that the Kolmogorov-Smirnov test applies; Wikipedia seems to imply it does not. If it does not, how can I test the sample's distribution?

$\endgroup$
3
  • 1
    $\begingroup$ +1 A beautiful example of mistakenly applying the K-S Test to data with (many) ties is given on the help page for an Excel statistics add-on at real-statistics.com/non-parametric-tests/goodness-of-fit-tests/…. The result is wrong for many reasons. Caveat lector! $\endgroup$ Commented Aug 28, 2018 at 14:23
  • $\begingroup$ KS-tests for discrete null distributions are available: en.wikipedia.org/wiki/… $\endgroup$ Commented Dec 31, 2018 at 0:18
  • $\begingroup$ A more thorough answer can be found in a closely related question: stats.stackexchange.com/questions/88764/… $\endgroup$ Commented Sep 8, 2020 at 18:43

3 Answers

19
$\begingroup$

It does not apply to discrete distributions. See http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm for example.

Is there any reason you can't use a chi-square goodness-of-fit test? See http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm for more info.
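As a rough illustration of the chi-square idea, here is a minimal standard-library sketch. It computes Pearson's statistic against a fully specified discrete null and, instead of looking the statistic up in a chi-square table, estimates the p-value by simulating from the null. The function names, the fair-die null, and the two example samples are all made up for illustration; they are not from the NIST handbook.

```python
import random
from collections import Counter


def pearson_statistic(sample, probs):
    """Sum over categories of (observed - expected)^2 / expected."""
    n = len(sample)
    counts = Counter(sample)  # missing categories count as 0
    return sum((counts[v] - n * p) ** 2 / (n * p) for v, p in probs.items())


def chi_square_gof_mc(sample, probs, n_sim=2000, seed=0):
    """Return (statistic, Monte Carlo p-value) for H0: sample ~ probs.

    Assumption: probs is a dict {value: probability} that fully
    specifies the null (no parameters estimated from the data).
    """
    rng = random.Random(seed)
    values = list(probs)
    weights = [probs[v] for v in values]
    observed = pearson_statistic(sample, probs)
    exceed = 0
    for _ in range(n_sim):
        simulated = rng.choices(values, weights=weights, k=len(sample))
        if pearson_statistic(simulated, probs) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical example: is a die fair?
fair_die = {face: 1 / 6 for face in range(1, 7)}
balanced_rolls = [face for face in range(1, 7) for _ in range(10)]
loaded_rolls = [6] * 60

stat_balanced, p_balanced = chi_square_gof_mc(balanced_rolls, fair_die)
stat_loaded, p_loaded = chi_square_gof_mc(loaded_rolls, fair_die)
print(stat_balanced, p_balanced)
print(stat_loaded, p_loaded)
```

With reasonably large expected counts in every category, the usual chi-square reference distribution with (number of categories - 1) degrees of freedom should give much the same answer as the simulation.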

$\endgroup$
6
  • 2
    $\begingroup$ Sorry for the intrusion, but I don't really understand why K-S (and other validation tests) are applicable only to continuous distributions. Can someone explain this to me? $\endgroup$ Commented Sep 15, 2011 at 12:17
  • 8
    $\begingroup$ @Maurizio -- the K-S test statistic has the same distribution under all continuous distributions, but if the actual distribution is not continuous, and one tries to construct a level $\alpha$ test assuming that the distribution is continuous, then the actual level of the test will be less than $\alpha$ (cf. Lehmann & Romano, Testing Statistical Hypotheses, Third Edition, p. 584). You can still make a level $\alpha$ test based on the K-S statistic, but you'll have to find some other method to get the critical value, e.g. by simulation. $\endgroup$ Commented Oct 11, 2011 at 4:58
  • 3
    $\begingroup$ There is a discrete KS-test: stat.yale.edu/~jay/EmersonMaterials/DiscreteGOF.pdf $\endgroup$ Commented Dec 29, 2018 at 14:07
  • $\begingroup$ @DavidR It is pretty straightforward to construct a K-S test which allows for tied values (i.e. for discrete data). See for example, Schröer, G., & Trenkler, D. (1995). Exact and randomization distributions of Kolmogorov-Smirnov tests two or three samples. Computational Statistics & Data Analysis, 20, 185–202. $\endgroup$ Commented Apr 11, 2023 at 16:25
  • $\begingroup$ So, if I understand correctly, the test is actually valid in the sense that a false positive result occurs with at most the prescribed probability, but the test is not as powerful as it could be. Is this correct? This seems to me to be a very different claim than that "it does not apply"! $\endgroup$ Commented Jun 27, 2023 at 19:12
13
$\begingroup$

As is often the case in statistics, it depends on what you mean.

  1. If you mean "I calculate my test statistic on a sample drawn from a discrete distribution and then look up the standard tables" then you'll get a true type I error rate lower than the one you chose (possibly a lot lower).

    How much depends on "how discrete" the distribution is. If the probability of any one outcome is fairly low (so the proportion of tied-values in the data would be expected to be low) then it won't matter very much -- many people wouldn't have a problem with running a 5% test at 4.5% say. So for example, if you're testing a discrete uniform on [1,1000], you probably needn't worry.

    But if there's a high probability of a value being tied, then the effect on the type I error rate can be marked. If you get a significance level of 0.005 when you wanted 0.05, that may be an issue, since it will correspondingly impact the power.

  2. If instead you mean "I calculate my test statistic on a sample drawn from a discrete distribution and then use a suitable critical value/calculate a suitable p-value for my situation" (say via a permutation test, for example), then the test is certainly valid in the sense that you'll get the right type I error rate -- up to the discreteness of the test statistic itself, of course. (Though there may well be better tests for your particular purpose, just as there usually are in the continuous case.)

    Note that the test statistic is no longer distribution-free (its null distribution depends on the hypothesized distribution), but a permutation test avoids that issue.

So sometimes it's okay to use the standard tables even with discrete distributions, and even when it's not okay, it's not so much the test statistic as the critical values/p-values you use with it that's the issue.
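To make option 2 concrete for the one-sample case, here is a minimal sketch that keeps the K-S statistic but gets the p-value by simulating from the hypothesized discrete distribution rather than from the standard tables. Note this uses simulation under a fully specified null, not a permutation test. The function names and the three-point example distribution are hypothetical, and the sketch assumes every sample value lies in the support of the null.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def ks_stat_discrete(sample, values, cdf):
    """max |ECDF - null CDF|, evaluated at the (sorted) support points.

    Assumption: all sample values are in `values`, so both step
    functions jump only at support points and the sup is attained there.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(abs(bisect_right(xs, v) / n - c) for v, c in zip(values, cdf))


def ks_test_discrete_mc(sample, probs, n_sim=2000, seed=0):
    """Return (D, Monte Carlo p-value) for H0: sample ~ probs."""
    rng = random.Random(seed)
    values = sorted(probs)
    weights = [probs[v] for v in values]
    cdf = list(accumulate(weights))
    n = len(sample)
    observed = ks_stat_discrete(sample, values, cdf)
    exceed = 0
    for _ in range(n_sim):
        simulated = rng.choices(values, weights=weights, k=n)
        if ks_stat_discrete(simulated, values, cdf) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical null with heavy ties: P(0)=0.5, P(1)=0.25, P(2)=0.25.
null_probs = {0: 0.5, 1: 0.25, 2: 0.25}
balanced = [0] * 10 + [1] * 5 + [2] * 5
skewed = [2] * 20

d_balanced, p_balanced = ks_test_discrete_mc(balanced, null_probs)
d_skewed, p_skewed = ks_test_discrete_mc(skewed, null_probs)
print(d_balanced, p_balanced)
print(d_skewed, p_skewed)
```

Because the reference distribution is generated from the same discrete null, the resulting test has the intended type I error rate up to the discreteness of the statistic and Monte Carlo error.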

$\endgroup$
0
4
$\begingroup$

I believe the K-S test uses the fact that if $X$ is a continuous random variable with CDF $F$, then $F(X)$ is a uniform random variable on $[0,1]$. This is not the case if $X$ is not continuous. For example, if $X$ is Bernoulli($p$), then $F(X)$ takes only the two values $1-p$ and $1$, so it is not uniform.
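A quick way to see this numerically is to check how often $F(X) \le 0.5$. For a uniform variable that should happen about half the time. The sketch below compares an exponential variable (continuous) with a Bernoulli variable; the helper name, the choice $p = 0.3$, and the threshold 0.5 are arbitrary choices for illustration.

```python
import math
import random


def fraction_at_or_below(draw, cdf, u, n=10_000, seed=0):
    """Estimate P(F(X) <= u) by simulation."""
    rng = random.Random(seed)
    return sum(cdf(draw(rng)) <= u for _ in range(n)) / n


def draw_exponential(rng):
    return rng.expovariate(1.0)


def exponential_cdf(x):
    return 1.0 - math.exp(-x)


P_SUCCESS = 0.3  # arbitrary Bernoulli parameter for the illustration


def draw_bernoulli(rng):
    return 1 if rng.random() < P_SUCCESS else 0


def bernoulli_cdf(x):
    # Only evaluated at x in {0, 1} here: F(0) = 1 - p, F(1) = 1.
    return 1.0 - P_SUCCESS if x < 1 else 1.0


exp_frac = fraction_at_or_below(draw_exponential, exponential_cdf, 0.5)
bern_frac = fraction_at_or_below(draw_bernoulli, bernoulli_cdf, 0.5)
print(exp_frac)   # close to 0.5, as a uniform would give
print(bern_frac)  # F(X) is only ever 0.7 or 1.0, so never <= 0.5
```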

$\endgroup$
