
I have a custom distribution created to model some experimental observations. While too complicated to include in this question, I can provide an example and some illustrations to convey a sense of it.

Take the following as representing the PDF of the distribution:

pdf = (0.334336 (E^(2.56822 (-4.1816 - Log[x])) Erfc[ 3.54409*10^7 (-4.1816 - Log[x])] + E^(0.904055 (4.1816 + Log[x])) Erfc[ 3.54409*10^7 (4.1816 + Log[x])]))/x; 

One can readily plot it:

Plot[pdf, {x, 0, 0.05}, PlotRange -> {All, {0, 100}}] 

(plot of the PDF)

Defining the distribution dist as:

dist = ProbabilityDistribution[pdf, {x, 0, ∞}] 
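As a sanity check before going further (a minimal sketch; not part of the original post), one can verify that pdf is properly normalized over the support given to ProbabilityDistribution:

```mathematica
(* the density should integrate to 1 over its support *)
NIntegrate[pdf, {x, 0, ∞}]
```

A result very close to 1 indicates a proper PDF; a noticeably different value would mean dist needs renormalization.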

I can generate random variates from dist and plot them in a histogram:

Histogram[RandomVariate[dist, 10^5], {0, 0.05, 0.0005}, "PDF", ImageSize -> 300, PlotRange -> {All, {0, 100}}] 

(histogram of the random variates)

You can see that the histogram and plot of the PDF look pretty similar. So far so good.

I have conjectured that, with sufficient observations, the data will eventually converge to an exponential distribution. So I thought to estimate an exponential distribution from (in this case) a bunch of random data generated using the original distribution.

Plot[{ PDF[EstimatedDistribution[RandomVariate[dist, 10^4], ExponentialDistribution[λ]], x], pdf}, {x, 0, 0.05}, ImageSize -> 300, PlotRange -> {All, {0, 100}}] 

(plot of the estimated exponential PDF against the original pdf)

I like this, BUT this leads to my first question:

Where do all those squiggles in the PDF of the exponential distribution come from?

Take a closer look:

(zoomed-in view of the squiggles)

Does this seem normal? Shouldn't I get a smooth curve? Hopefully someone will have an insight, but this got me wondering and led me to a second and maybe more interesting question:

Does Mathematica have a way to directly fit one distribution to another without (in the above case) the intermediate step of generating a set of random variates first?

6 Comments:

  • Since dist is defined easily, it would've been better to do that instead of noting everywhere that the code won't run... it looks much better now :) Commented Sep 13, 2012 at 21:01
  • @R.M -- Of course! I should have thought of that. Just late in the day. Thx. Commented Sep 13, 2012 at 21:23
  • Note that this approach does not solve your stated problem. If you conjecture that the true underlying distribution is exponential, then you ought to consider something like a maximum-likelihood estimate of the exponential parameter (based on the data you do have) followed by goodness-of-fit testing. Commented Sep 13, 2012 at 22:21
  • @whuber -- Thank you for the recommendation. Commented Sep 14, 2012 at 1:52
  • Doesn't one typically do this by minimizing the KL divergence? I guess those integrals would be too hard to do and the random-variates approach is best. Commented May 25, 2015 at 15:57

2 Answers


The problem is that you're evaluating an EstimatedDistribution at each plot point! A different set of random data is generated for every point, and the fit is recalculated from it. That is why you get the wiggles: each time, you get a slightly different value for the parameter $\lambda$.

Try the following instead:

With[{fit = EstimatedDistribution[RandomVariate[dist, 10^4],
    ExponentialDistribution[λ]]},
 Plot[{PDF[fit, x], PDF[dist, x]}, {x, 0, 0.05}]
]

(the fitted and original PDFs, now smooth)
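As the comments note, wrapping the plotted expressions in Evaluate is an equivalent fix: with Plot's HoldAll attribute bypassed, EstimatedDistribution is evaluated once, before Plot substitutes numerical values for x. A sketch of that variant:

```mathematica
(* Evaluate forces the fit to be computed once, not per plot point *)
Plot[Evaluate[{
   PDF[EstimatedDistribution[RandomVariate[dist, 10^4],
     ExponentialDistribution[λ]], x],
   pdf}],
 {x, 0, 0.05}, PlotRange -> {All, {0, 100}}]
```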

2 Comments:

  • Yeah, I was about to post that an Evaluate should be thrown in. Commented Sep 13, 2012 at 21:09
  • That helps a lot. Thanks. Commented Sep 14, 2012 at 1:53

As to your main question, fitting one distribution to another: AFAIK there is no built-in function to do that. What you should do, I think, is try to minimize the difference between the respective PDFs:

Solve[
 D[Integrate[(E^(-x λ) λ - pdf)^2, {x, 0, ∞}], λ] == 0,
 λ
]

I believe your pdf may be too complex for that, and I don't really see any problem with the data-fitting approach. Only one question: why would you use randomly drawn data? That would only be a proxy for the pdf itself, right? The pdf predicts the bucket values; simply use that.
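If the symbolic Solve above turns out to be intractable for this pdf, the same idea can be pursued numerically. A sketch (the helper name obj is my own; the ?NumericQ pattern keeps NIntegrate from being called with a symbolic λ):

```mathematica
(* squared L2 distance between the exponential PDF and pdf, as a function of λ *)
obj[λ_?NumericQ] :=
 NIntegrate[(PDF[ExponentialDistribution[λ], x] - pdf)^2, {x, 0, ∞}]

(* minimize over positive λ *)
NMinimize[{obj[λ], λ > 0}, λ]
```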

8 Comments:

  • The $L^2$ norm for the PDFs is not usually an appropriate or useful measure of differences between distributions. Look instead at the $L^\infty$ norm for the CDFs (the Kolmogorov distance) or the Kullback-Leibler divergence, etc. These issues are extensively discussed in questions at stats.stackexchange.com. Commented Sep 13, 2012 at 22:17
  • @whuber I don't claim any expertise in this area, so I must assume you're right here. Not that I think it would help, as the Kolmogorov distance is probably as difficult to calculate in this case as the $L^2$ norm. So the numerical route may be needed anyway. Commented Sep 14, 2012 at 5:51
  • @whuber In fact, the Kolmogorov distance may be troublesome in this case, because of the strongly non-linear $\sup$ function involved. Since finding the minimum difference between the distributions involves differentiating it with respect to $\lambda$, that could be a problem. Do you know how to solve that? Commented Sep 14, 2012 at 8:33
  • Working with the Kolmogorov distance is not a problem, because we're looking at the differences between the CDFs, which--as integrals of the PDFs--are automatically differentiable (at least for absolutely continuous variables). Not only that, you automatically have the derivative available: it's the PDF! So just find zeros of the differences of the two PDFs. Commented Sep 14, 2012 at 15:45
  • @whuber Not sure that's correct. First, shouldn't you differentiate with respect to $\lambda$? Well, one of the CDFs isn't even dependent on $\lambda$. And second, differentiating the sup of absolute differences doesn't equal the sup of the absolute differences of the derivatives, does it? Commented Sep 14, 2012 at 22:32
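For completeness, the Kolmogorov-distance idea discussed in these comments can also be attacked numerically rather than symbolically. A rough sketch (the helper name kolm is my own; CDF[dist, x] may evaluate slowly for a complicated ProbabilityDistribution, and the search for the sup is truncated to the region where dist carries essentially all its mass):

```mathematica
(* Kolmogorov distance between dist and ExponentialDistribution[λ],
   approximated over the region holding nearly all the probability mass *)
kolm[λ_?NumericQ] :=
 NMaxValue[{Abs[CDF[dist, x] - CDF[ExponentialDistribution[λ], x]],
   0 < x < 0.05}, x]

(* fit λ by minimizing that distance *)
NMinimize[{kolm[λ], λ > 0}, λ]
```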
