Timeline for Testing goodness of fit for a Zipf distribution (in Matlab)

Current License: CC BY-SA 4.0

8 events

when	what		by	license	comment
May 2, 2023 at 19:07	comment	added	z8080		One more sanity check: if I create a distribution of random numbers (`y_obs_null = randn(1,length(x))` in the code above), the K-S test is just as significant as for the actual Zipf distribution (p-value very close to 0). So both are rejected as Zipfian, for the same reason so to speak, even though one is clearly an excellent fit and the other is not. This tells me that either K-S is not an appropriate goodness-of-fit test for this distribution, or that higher p-values do indicate a better fit but a statistical threshold (alpha) much higher than .05 should be used for them.
Apr 30, 2023 at 13:43	comment	added	z8080		Sorry, but I'm still not getting it. Running your first code, I get `p-value = 0.0195` every time. This lets me reject the null hypothesis, meaning that my test distribution (constructed to be as Zipfian as possible except for the added noise) is qualitatively not the same as the reference (Zipf) distribution. That is, Zipf isn't a good fit. Am I missing something?
Apr 28, 2023 at 11:03	history	edited	Coen Hacking	CC BY-SA 4.0	Added ANOVA possibility
Apr 28, 2023 at 10:38	comment	added	Coen Hacking		This is also how MATLAB does it: the first output variable H is 0 for accepting the null hypothesis and 1 for rejecting it. But when I ran this code, it accepted the null hypothesis, as my p = 0.052 > alpha = 0.05.
Apr 28, 2023 at 10:36	comment	added	Coen Hacking		This is from the 'scipy' documentation for a two-sided test: "The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical. The statistic is the maximum absolute difference between the empirical distribution functions of the samples. [...] Suppose we wish to test the null hypothesis that two samples were drawn from the same distribution. We choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05." I.e. the test is the right way around.
Apr 28, 2023 at 8:36	comment	added	z8080		Thanks a lot! The plot makes sense and suggests the Zipf distribution fits the example data set very well. However, the Kolmogorov-Smirnov statistic is significant at p = 0.02, leading to the unexpected conclusion that "The data is not from a Zipf distribution." I thought maybe it's the other way around, i.e. maybe significance in this test actually indicates a good rather than a bad fit; but it's quite clear that the null hypothesis (which here we're rejecting) is that the sample is drawn from the reference distribution.
Apr 27, 2023 at 23:00	review	First answers (completed Apr 28, 2023 at 0:00)
Apr 27, 2023 at 23:00	history	answered	Coen Hacking	CC BY-SA 4.0