Questions tagged [zipf]
The zipf tag has no summary.
38 questions
0 votes
0 answers
82 views
Which test is better for evaluating goodness-of-fit for a dataset of millions of discrete values following a Zipf distribution? KS test or chi-square?
I have a dataset of 40 million discrete values, whose histogram follows a Zipf distribution with the following statistical parameters: Minimum = 1, Maximum = 1738, Mean = 2.16, STD = 16.50, P95 = 4, ...
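A minimal Python sketch of the chi-square side of this comparison, assuming the values are integers in 1..N and the reference model is a truncated Zipf pmf $p(k) = k^{-s} / \sum_{j=1}^{N} j^{-s}$. The function names `zipf_pmf` and `chi_square_zipf` and the pool-until-expected-count-reaches-5 rule are illustrative choices, not taken from the question. With tens of millions of observations, either test is likely to flag even tiny misfits as significant, so the size of the statistic may be more informative than its p-value.

```python
import random
from collections import Counter

def zipf_pmf(s, n_max):
    """Truncated Zipf pmf on {1, ..., n_max}: p(k) = k**-s / sum_j j**-s."""
    weights = [k ** -s for k in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def chi_square_zipf(values, s, n_max, min_expected=5.0):
    """Pearson chi-square statistic of integer data in 1..n_max against Zipf(s, n_max).

    Consecutive values are pooled until each bin's expected count reaches
    min_expected; a leftover tail is merged into the last bin.
    Returns (statistic, number_of_bins).
    """
    n = len(values)
    counts = Counter(values)
    bins, obs_acc, exp_acc = [], 0, 0.0
    for k, pk in enumerate(zipf_pmf(s, n_max), start=1):
        obs_acc += counts.get(k, 0)
        exp_acc += n * pk
        if exp_acc >= min_expected:
            bins.append((obs_acc, exp_acc))
            obs_acc, exp_acc = 0, 0.0
    if exp_acc > 0:  # pool any leftover tail into the previous bin
        if bins:
            o, e = bins.pop()
            bins.append((o + obs_acc, e + exp_acc))
        else:
            bins.append((obs_acc, exp_acc))
    stat = sum((o - e) ** 2 / e for o, e in bins)
    return stat, len(bins)

# Hypothetical usage: 20,000 draws from Zipf(s=1.5) on 1..100, tested against s=1.5.
rng = random.Random(1)
sample = rng.choices(range(1, 101), weights=zipf_pmf(1.5, 100), k=20_000)
stat, n_bins = chi_square_zipf(sample, s=1.5, n_max=100)
```

The statistic would be compared with a chi-square distribution with `n_bins - 1 - m` degrees of freedom, where `m` is the number of parameters estimated from the data (for example, `m = 1` if `s` is fitted).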
1 vote
1 answer
75 views
What is the distribution of city sizes for cities whose rank size obeys Zipf's law?
Been wracking my brain over this one but can't quite get it. I want a distribution for the population size of randomly-chosen cities, where I can assume the city's rank obeys Zipf's law with a known ...
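Under a simplifying assumption introduced here (not stated in the excerpt), namely $n$ cities whose sizes follow the rank-size rule $S(r) = C\,r^{-a}$ for $r = 1, \dots, n$ ($a = 1$ being classic Zipf), a city chosen uniformly at random has size at least $x$ exactly when its rank satisfies $r \le (C/x)^{1/a}$. A rough derivation sketch:

$$P(S \ge x) = \frac{\#\{r : C\,r^{-a} \ge x\}}{n} \approx \frac{1}{n}\left(\frac{C}{x}\right)^{1/a}, \qquad C\,n^{-a} \le x \le C.$$

In this sketch, city sizes are approximately Pareto distributed with tail index $1/a$; for $a = 1$ the survival function is $P(S \ge x) \approx C/(n x)$ and the density is $f(x) \approx C/(n x^{2})$ on $[C/n, C]$, ignoring the rounding of ranks to integers.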
1 vote
0 answers
25 views
Given the rank and frequency, find the constant in Zipf's law [duplicate]
From a total of $N$ words, I have the following dataset, where the first column represents the ranks and the second the frequencies. For example $$\begin{array}{cc} 1 & 4300 \\ 2 & 3100 \\ 3 & ...
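For the strict form $f \cdot r = C$ (exponent fixed at 1), here is a minimal Python sketch of two ways to estimate $C$ from (rank, frequency) pairs: least squares on $f = C/r$, which gives $\hat C = \sum_r (f_r / r) / \sum_r r^{-2}$, and least squares on the log scale, which gives the geometric mean of the products $r f_r$. The function name is hypothetical, and only the first two rows of the table are used because the rest of it is cut off in the excerpt.

```python
import math

def estimate_zipf_constant(ranks, freqs):
    """Estimate C in f * r = C from paired ranks and frequencies.

    Returns (least_squares_C, log_scale_C):
    - least squares on f = C / r:           C = sum(f / r) / sum(1 / r**2)
    - least squares on log f = log C - log r: C = geometric mean of r * f
    """
    num = sum(f / r for r, f in zip(ranks, freqs))
    den = sum(1.0 / r ** 2 for r in ranks)
    log_c = sum(math.log(r * f) for r, f in zip(ranks, freqs)) / len(ranks)
    return num / den, math.exp(log_c)

# The two fully visible rows from the question: rank 1 -> 4300, rank 2 -> 3100.
ls_c, log_c = estimate_zipf_constant([1, 2], [4300, 3100])
```

The two estimates differ (4680 vs. about 5163 here) because these rows do not satisfy $f \cdot r = C$ exactly; the log-scale version weights every rank equally, while the untransformed fit is dominated by the top ranks.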
0 votes
1 answer
288 views
Estimating exponent of Zipf distribution using MLE vs fitting linear regression on log-transformed rank and frequency data
I'm having trouble understanding why I get radically different results if I try to find the parameter of a Zipf distribution when I use the methods proposed by Clauset et al. (2009) as opposed to ...
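One frequent source of such discrepancies (not necessarily the asker's) is that the slope of a log-log rank plot and the exponent estimated by Clauset et al. describe different quantities: if values follow $p(x) \propto x^{-\alpha}$ for $x \ge x_{\min}$, the sorted values satisfy roughly $x_{(r)} \propto r^{-1/(\alpha - 1)}$, so the rank-plot slope estimates $-1/(\alpha - 1)$ rather than $-\alpha$. Below is a minimal Python sketch on simulated continuous data, assuming $x_{\min}$ is known; the helper names are hypothetical, and the estimator is the continuous MLE $\hat\alpha = 1 + n / \sum_i \ln(x_i / x_{\min})$ from Clauset, Shalizi and Newman (2009).

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-CDF draws from p(x) proportional to x**-alpha for x >= x_min (continuous)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(xs, x_min):
    """Continuous MLE of Clauset, Shalizi & Newman (2009): 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def rank_plot_slope(xs):
    """OLS slope of ln(value) on ln(rank), with values sorted from largest (rank 1) down."""
    ys = sorted(xs, reverse=True)
    lr = [math.log(r) for r in range(1, len(ys) + 1)]
    ly = [math.log(y) for y in ys]
    mr, my = sum(lr) / len(lr), sum(ly) / len(ly)
    sxy = sum((a - mr) * (b - my) for a, b in zip(lr, ly))
    sxx = sum((a - mr) ** 2 for a in lr)
    return sxy / sxx

rng = random.Random(0)
xs = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = mle_alpha(xs, x_min=1.0)  # should be close to 2.5
slope = rank_plot_slope(xs)           # should be close to -1 / (2.5 - 1), about -0.67
```

Clauset et al. also explain why least-squares fits on log-log plots are a poor way to estimate the exponent even when the correct quantity is targeted.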
1 vote
1 answer
284 views
Testing goodness of fit for a Zipf distribution (in Matlab)
I have several ranking distributions and would, for each one, like to fit a Zipf distribution, and estimate the goodness of fit relative to some standard benchmark. With the Matlab code below, I ...
6 votes
1 answer
1k views
If my data doesn't completely follow Zipf's law, how do I justify it mathematically?
Zipf's law states that in a text collection a few words occur very often and many words hardly ever occur. For text collections, Zipf's law corresponds to $s = 1$ in the Zipf distribution defined by: $$f(k; s, N) = \frac{k^{...
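A minimal Python sketch of the finite Zipf pmf, assuming the standard form $f(k; s, N) = k^{-s} / \sum_{n=1}^{N} n^{-s}$ (the excerpt's formula is cut off), together with the expected count of each rank in a corpus of a given size. Comparing observed and expected counts rank by rank, for instance through their log ratio, shows where and by how much the data depart from $s = 1$. The function names and the corpus sizes are hypothetical.

```python
import math

def zipf_pmf(s, n_types):
    """[f(1; s, N), ..., f(N; s, N)] with f(k; s, N) = k**-s / sum_{n=1}^{N} n**-s."""
    weights = [k ** -s for k in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_counts(s, n_types, n_tokens, top=10):
    """Expected token counts of the `top` most frequent ranks in a corpus of n_tokens tokens."""
    return [n_tokens * p for p in zipf_pmf(s, n_types)[:top]]

def log_ratios(observed, expected):
    """ln(observed / expected) per rank: > 0 means more frequent than the Zipf model predicts."""
    return [math.log(o / e) for o, e in zip(observed, expected)]

# Hypothetical corpus: N = 50,000 distinct words and 1,000,000 tokens.
expected = expected_counts(s=1.0, n_types=50_000, n_tokens=1_000_000)
```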
0 votes
1 answer
274 views
How do I calculate a proper sample size to get a suitable power of Kolmogorov-Smirnov test with an underlying Zipf distribution?
I have been trying to calculate the sample size needed to reach a power of roughly 0.8 for a KS test on an underlying Zipf distribution. I have tried estimating the power by performing simulations: ...
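A minimal Python sketch of the simulation approach, assuming a truncated Zipf on ranks 1..N, a null exponent `s_null`, and a specific alternative `s_alt` that the test should detect (all hypothetical inputs). The critical value is simulated under the null because the classical KS tables are conservative for discrete distributions; the required sample size is then the smallest `n` whose estimated power reaches about 0.8.

```python
import bisect
import math
import random
from itertools import accumulate

def zipf_cdf(s, n_max):
    """CDF of a truncated Zipf(s) on {1, ..., n_max}, as a list indexed by k - 1."""
    weights = [k ** -s for k in range(1, n_max + 1)]
    total = sum(weights)
    return [c / total for c in accumulate(weights)]

def draw(cdf, size, rng):
    """Inverse-CDF sampling; min() guards against the last CDF value rounding below 1."""
    last = len(cdf) - 1
    return [min(bisect.bisect_left(cdf, rng.random()), last) + 1 for _ in range(size)]

def ks_stat(sample, cdf):
    """sup |F_n - F|; for integer data both step functions only jump at 1..n_max."""
    counts = [0] * len(cdf)
    for x in sample:
        counts[x - 1] += 1
    n, running, d = len(sample), 0, 0.0
    for k, c in enumerate(counts):
        running += c
        d = max(d, abs(running / n - cdf[k]))
    return d

def ks_power(n, s_null, s_alt, n_max, reps=200, level=0.05, seed=0):
    """Monte Carlo power of the KS test of H0: Zipf(s_null) when data come from Zipf(s_alt)."""
    rng = random.Random(seed)
    null_cdf, alt_cdf = zipf_cdf(s_null, n_max), zipf_cdf(s_alt, n_max)
    null_stats = sorted(ks_stat(draw(null_cdf, n, rng), null_cdf) for _ in range(reps))
    crit = null_stats[math.ceil((1 - level) * reps) - 1]  # simulated (1 - level) quantile
    hits = sum(ks_stat(draw(alt_cdf, n, rng), null_cdf) > crit for _ in range(reps))
    return hits / reps

# Hypothetical scenario: H0 s = 1.0 vs. true s = 1.2 on ranks 1..100.
for n in (50, 100, 200, 400):
    print(n, ks_power(n, s_null=1.0, s_alt=1.2, n_max=100))
```

More repetitions (`reps`) give smoother power estimates at the cost of run time.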
2 votes
0 answers
118 views
What is a dragon king?
I am studying a system of cities where the largest city appears to be, in many respects, an outlier. The distribution of city sizes in any country is often claimed to follow Zipf's law. According to ...
1 vote
0 answers
46 views
Zipf's Law: Estimating # words over N [closed]
Say I have a large corpus of $p$ words, and the constant $C$ in $f \cdot r = C$ equals $p/10$. How do you go ...
0 votes
1 answer
268 views
Interpreting KL Divergence
I'm trying to compare different approaches to rank predictions. I have the ground truth distribution $P$ (discrete, zeta distribution) and two or more distributions ($Q, Q', Q'', Q'''$ in this case) I'...
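A minimal Python sketch of $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_k p_k \ln(p_k / q_k)$ for zeta-like distributions, assuming the infinite support is truncated at a large `k_max` and renormalized (an approximation introduced here). The value is in nats (divide by $\ln 2$ for bits) and can be read as the expected extra nats per observation paid for encoding samples from $P$ with a code optimized for $Q$. It is 0 only when $P = Q$ and it is not symmetric, so keeping the direction fixed at $D_{\mathrm{KL}}(P \,\|\, Q)$ makes the candidates comparable. The exponents below are hypothetical.

```python
import math

def zeta_pmf(s, k_max):
    """Zeta(s) pmf approximated on {1, ..., k_max} by renormalizing the truncated weights."""
    weights = [k ** -s for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; q_k must be > 0 wherever p_k > 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Hypothetical ground truth P and two candidate models.
p = zeta_pmf(2.0, 100_000)
candidates = {"Q": zeta_pmf(1.8, 100_000), "Q'": zeta_pmf(2.5, 100_000)}
for name, q in candidates.items():
    nats = kl_divergence(p, q)
    print(name, nats, nats / math.log(2))  # nats, then bits
```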
3 votes
1 answer
2k views
Discrete Pareto Distribution vs Zipf Distribution and Power Law vs Zipf Law
I need a simple but clear idea of the discrete Pareto distribution vs. the Zipf distribution, and of power laws vs. Zipf's law (are they similar, and how do they relate to each other?). Wikipedia definitions do not ...
0 votes
1 answer
452 views
Need an explanation of the exponent parameter s in the Zipf distribution
I need to model the popularity of files requested from a library with a Zipf distribution, and I want to simulate it in MATLAB. I don't know what effect the parameter s has on my result. For ...
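A minimal Python sketch of how $s$ shapes a Zipf popularity model over a finite library, assuming file $k$ (by popularity rank) is requested with probability proportional to $k^{-s}$: $s = 0$ gives uniform popularity, and larger $s$ concentrates requests on the top-ranked files. The helper names, the library size and the 10% cutoff are hypothetical; the same weight vector carries over directly to a MATLAB simulation.

```python
import random

def zipf_weights(s, n_files):
    """Request probabilities p(k) = k**-s / sum_j j**-s for popularity ranks k = 1..n_files."""
    weights = [k ** -s for k in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def top_share(s, n_files, top_fraction=0.1):
    """Fraction of all requests that go to the most popular top_fraction of files."""
    p = zipf_weights(s, n_files)
    return sum(p[: max(1, int(top_fraction * n_files))])

def simulate_requests(s, n_files, n_requests, seed=0):
    """Draw n_requests file ids (1 = most popular) from the Zipf popularity model."""
    rng = random.Random(seed)
    return rng.choices(range(1, n_files + 1), weights=zipf_weights(s, n_files), k=n_requests)

for s in (0.0, 0.5, 0.8, 1.0, 1.2):
    print(f"s = {s}: top 10% of 1000 files get {top_share(s, 1000):.1%} of requests")
```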
1 vote
0 answers
734 views
Practically speaking, is the TF-IDF threshold universal across different corpora?
I would like to know the practical threshold for TF-IDF (just like the practical p-value cutoff of 0.1 or 0.05 in hypothesis tests). I tried to look for it in previous posts, and some people ...
10 votes
2 answers
2k views
Is the KS test really appropriate when validating a power law/estimating power-law parameters?
I'm attempting to find out whether some highly skewed data are drawn from a power law distribution, following the popular paper by Clauset, Shalizi and Newman, 2009. Clauset et al. use the Kolmogorov-...
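For context, Clauset, Shalizi and Newman use the KS distance mainly to choose $x_{\min}$: for each candidate $x_{\min}$ they fit $\alpha$ by maximum likelihood on the tail $x \ge x_{\min}$ and keep the candidate whose fitted CDF is closest to the empirical one. Below is a minimal Python sketch of that selection step for continuous data, with hypothetical helper names, candidate grid, minimum tail size and synthetic data; their full procedure also includes a bootstrap goodness-of-fit test, which is omitted here.

```python
import math
import random

def fit_alpha(tail, x_min):
    """Continuous MLE: alpha = 1 + n / sum(ln(x / x_min)) over the tail x >= x_min."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(tail, x_min, alpha):
    """Max |empirical CDF - fitted CDF|, with fitted CDF F(x) = 1 - (x / x_min)**(1 - alpha)."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - (x / x_min) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))  # check both sides of each ECDF step
    return d

def choose_x_min(data, candidates, min_tail=50):
    """Return (x_min, alpha, D) for the candidate x_min with the smallest KS distance D."""
    best = None
    for x_min in candidates:
        tail = [x for x in data if x >= x_min]
        if len(tail) < min_tail:
            continue
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[2]:
            best = (x_min, alpha, d)
    return best

# Hypothetical data: a uniform "body" below 1 plus a power-law tail (alpha = 2.5) above 1.
rng = random.Random(42)
body = [rng.uniform(0.1, 1.0) for _ in range(1000)]
power_tail = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(3000)]
data = body + power_tail
candidates = sorted(data)[: int(0.9 * len(data))][::25]
x_min, alpha, d = choose_x_min(data, candidates)
```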
2 votes
1 answer
1k views
Calculate Zipf-Mandelbrot parameters from distribution
I am fetching trending topics from social media where the frequency of likes is said to follow a Zipf-Mandelbrot distribution; i.e., some of the posts will have a high number of likes and some other ...
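A minimal Python sketch of one way to estimate the Zipf-Mandelbrot parameters, assuming the data are like counts per rank (rank 1 = most-liked post, i.e. the like counts sorted in decreasing order) and the pmf $p(k) \propto (k + q)^{-s}$ on ranks $1, \dots, K$. It maximizes the multinomial log-likelihood $\sum_k c_k \ln p(k)$ by brute-force grid search; the grid ranges, helper names and the synthetic parameters $s = 1.2$, $q = 2.7$ are hypothetical, and a numerical optimizer or a finer grid around the best point would normally refine the coarse result.

```python
import math

def zm_pmf(s, q, n_ranks):
    """Zipf-Mandelbrot pmf on ranks 1..n_ranks: p(k) proportional to (k + q)**-s, q > -1."""
    weights = [(k + q) ** -s for k in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def fit_zipf_mandelbrot(counts, s_grid, q_grid):
    """Grid-search MLE of (s, q); counts[k - 1] is the number of likes at rank k."""
    best = None
    for s in s_grid:
        for q in q_grid:
            p = zm_pmf(s, q, len(counts))
            loglik = sum(c * math.log(pk) for c, pk in zip(counts, p) if c > 0)
            if best is None or loglik > best[0]:
                best = (loglik, s, q)
    return best[1], best[2]

# Synthetic check: counts generated exactly from s = 1.2, q = 2.7 over 100 ranks.
true_p = zm_pmf(1.2, 2.7, 100)
counts = [round(1_000_000 * pk) for pk in true_p]
s_grid = [0.5 + 0.05 * i for i in range(31)]  # 0.50 .. 2.00
q_grid = [0.1 * j for j in range(51)]         # 0.0 .. 5.0
s_hat, q_hat = fit_zipf_mandelbrot(counts, s_grid, q_grid)
```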