8 events
when what by license comment
May 2, 2023 at 19:07 comment added z8080 One more sanity check: if I create a distribution of random numbers (y_obs_null = randn(1,length(x)) in the code above), the K-S test is just as significant as for the actual Zipf distribution (p-value very close to 0), so both are rejected as Zipfian on the same grounds, even though one is clearly an excellent fit and the other is not. This tells me either that K-S is not an appropriate goodness-of-fit test for this distribution, or that higher p-values do indicate a better fit, but that a significance threshold (alpha) much higher than .05 should be used.
Apr 30, 2023 at 13:43 comment added z8080 Sorry, but I'm still not getting it. Running your first code, I get p-value = 0.0195 every time. This lets me reject the null hypothesis, meaning that my test distribution (constructed to be as Zipfian as possible except for the added noise) is not the same as the reference (Zipf) distribution. That is, Zipf isn't a good fit. Am I missing something?
Apr 28, 2023 at 11:03 history edited Coen Hacking CC BY-SA 4.0
Added ANOVA possibility
Apr 28, 2023 at 10:38 comment added Coen Hacking This is also how MATLAB does it: the first output variable H is 0 when the null hypothesis is not rejected and 1 when it is rejected. But when I ran this code, the null hypothesis was not rejected, as my p = 0.052 > alpha = 0.05.
Apr 28, 2023 at 10:36 comment added Coen Hacking This is from the 'scipy' documentation for a two-sided test: "The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical. The statistic is the maximum absolute difference between the empirical distribution functions of the samples. [...] Suppose we wish to test the null hypothesis that two samples were drawn from the same distribution. We choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05." I.e. the test is the right way around.
Apr 28, 2023 at 8:36 comment added z8080 Thanks a lot! The plot makes sense, and suggests the Zipf distribution fits the example data set very well. However, the Kolmogorov-Smirnov statistic is significant at p = 0.02, with the unexpected conclusion that "The data is not from a Zipf distribution." I thought maybe it's the other way around, i.e. maybe significance in this test actually indicates a good rather than a bad fit; but it's quite clear that the null hypothesis (which we're rejecting here) is that the sample is drawn from the reference distribution.
S Apr 27, 2023 at 23:00 review First answers
Apr 28, 2023 at 0:00
S Apr 27, 2023 at 23:00 history answered Coen Hacking CC BY-SA 4.0
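For readers following the comments above, the decision rule quoted from the scipy documentation (reject the null hypothesis when the p-value is below alpha, with a MATLAB-style H of 1 for rejection and 0 otherwise) can be sketched roughly as below. This is only an illustration, not the code from the answer: the helper name `ks_decision`, the seed, and the normal samples are all made up for the example.

```python
import numpy as np
from scipy import stats


def ks_decision(x, y, alpha=0.05):
    """Two-sample K-S test with a MATLAB-style reject flag.

    Hypothetical helper for illustration: returns (statistic, p-value, h),
    where h is 1 if the null hypothesis "x and y come from the same
    distribution" is rejected at level alpha, and 0 otherwise.
    """
    result = stats.ks_2samp(x, y)
    h = int(result.pvalue < alpha)
    return result.statistic, result.pvalue, h


# Made-up data: one sample, and a second sample whose mean is shifted by 2.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
shifted = rng.normal(loc=2.0, size=500)

# Identical samples: the empirical CDFs coincide, so the null is not rejected.
stat_same, p_same, h_same = ks_decision(sample, sample)

# Clearly different samples: the null is rejected.
stat_diff, p_diff, h_diff = ks_decision(sample, shifted)

print(f"same:    D={stat_same:.3f}, h={h_same}")
print(f"shifted: D={stat_diff:.3f}, h={h_diff}")
```

Note that h = 0 only means the test found no evidence against the null hypothesis at the chosen alpha; it does not by itself show that the fit is good.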