I have a dataset with hashtags and their frequencies (~370k frequencies), as for example (after a sort):
373827 hashtag_1 373826 hashtag_2 373826 hashtag_3 373826 hashtag_4 373825 hashtag_5 373823 hashtag_6 373823 hashtag_7 373822 hashtag_8 and I want to check if this frequencies follows Zipf-Mandelbrot's law (link) through fitting in R. So I must estimate the two parameters of the law (the exponent s and q following Wikipedia page). For this purpose I have decided to use function zm.ll of tolerance package that uses ML estimation, obtaining s~1,65 and q~0.085.
Now I'd understand if this estimation is correct through Chi-square test but I didn't understand how to do this test in R and the interpretation of the p-value returned. I've tried with this that return p.val=0:
# ZipfMan law where y is the frequencies x <- (((1:length(y))+0.085)^1.65) p <- x/sum(x) tab <- tabulate(y, length(y)) discrepancy <- (tab-length(y)*p)^2/(length(y)*p) chi.stat <- sum(discrepancy) p.val <- pchisq(chi.stat, df= length(y)-2-1, lower.tail = FALSE) c(p.val, chi.stat) I have some doubts about this piece of code (also about the choice of degree of freedom) and the result returned by it and I have difficulties to interpret p-value.