
I have a custom distribution created to model some experimental observations. While too complicated to include in this question, I can provide an example and some illustrations to convey a sense of it.

Take the following as representing the PDF of the distribution:

pdf = (0.334336 (E^(2.56822 (-4.1816 - Log[x])) Erfc[ 3.54409*10^7 (-4.1816 - Log[x])] + E^(0.904055 (4.1816 + Log[x])) Erfc[ 3.54409*10^7 (4.1816 + Log[x])]))/x; 

One can readily plot it:

Plot[pdf, {x, 0, 0.05}, PlotRange -> {All, {0, 100}}] 

(plot of the PDF)

Defining the distribution dist as:

dist = ProbabilityDistribution[pdf, {x, 0, ∞}] 
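As a sanity check before going further (a minimal sketch; not part of the original post), one can verify that pdf is properly normalized over the support given to ProbabilityDistribution:

```mathematica
(* the density should integrate to 1 over its support *)
NIntegrate[pdf, {x, 0, ∞}]
```

A result very close to 1 indicates a proper PDF; a noticeably different value would mean dist needs renormalization.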

I can generate random variates from dist and plot them in a histogram:

Histogram[RandomVariate[dist, 10^5], {0, 0.05, 0.0005}, "PDF", ImageSize -> 300, PlotRange -> {All, {0, 100}}] 

(histogram of the random variates)

You can see that the histogram and plot of the PDF look pretty similar. So far so good.

I have conjectured that, with sufficient observations, the data will eventually converge to an exponential distribution. So I thought to estimate an exponential distribution from (in this case) a bunch of random data generated using the original distribution.

Plot[{ PDF[EstimatedDistribution[RandomVariate[dist, 10^4], ExponentialDistribution[λ]], x], pdf}, {x, 0, 0.05}, ImageSize -> 300, PlotRange -> {All, {0, 100}}] 

(plot of the estimated exponential PDF against the original pdf)

I like this, BUT this leads to my first question:

Where do all those squiggles in the PDF of the exponential distribution come from?

Take a closer look:

(zoomed-in view of the squiggles)

Does this seem normal? Shouldn't I get a smooth curve? Hopefully someone will have an insight, but this got me wondering and led me to a second and maybe more interesting question:

Does Mathematica have a way to directly fit one distribution to another without (in the above case) the intermediate step of generating a set of random variates first?

6 Comments:

  • Since dist is defined easily, it would've been better to do that instead of noting everywhere that the code won't run... it looks much better now :) Commented Sep 13, 2012 at 21:01
  • @R.M -- Of course! I should have thought of that. Just late in the day. Thx. Commented Sep 13, 2012 at 21:23
  • Note that this approach does not solve your stated problem. If you conjecture that the true underlying distribution is exponential, then you ought to consider something like a maximum-likelihood estimate of the exponential parameter (based on the data you do have) followed by goodness-of-fit testing. Commented Sep 13, 2012 at 22:21
  • @whuber -- Thank you for the recommendation. Commented Sep 14, 2012 at 1:52
  • Doesn't one typically do this by minimizing the KL divergence? I guess those integrals would be too hard to do and the random-variates approach is best. Commented May 25, 2015 at 15:57

2 Answers


The problem is that you're evaluating an EstimatedDistribution at each plot point! A different set of random data is generated for every point, and the fit is recalculated from it. That is why you get the wiggles: each time, you get a slightly different value for the parameter $\lambda$.

Try the following instead:

With[{fit = EstimatedDistribution[RandomVariate[dist, 10^4],
    ExponentialDistribution[λ]]},
 Plot[{PDF[fit, x], PDF[dist, x]}, {x, 0, 0.05}]
]

(the fitted and original PDFs, now smooth)
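As the comments note, wrapping the plotted expressions in Evaluate is an equivalent fix: with Plot's HoldAll attribute bypassed, EstimatedDistribution is evaluated once, before Plot substitutes numerical values for x. A sketch of that variant:

```mathematica
(* Evaluate forces the fit to be computed once, not per plot point *)
Plot[Evaluate[{
   PDF[EstimatedDistribution[RandomVariate[dist, 10^4],
     ExponentialDistribution[λ]], x],
   pdf}],
 {x, 0, 0.05}, PlotRange -> {All, {0, 100}}]
```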

2 Comments:

  • Yeah, I was about to post that an Evaluate should be thrown in. Commented Sep 13, 2012 at 21:09
  • That helps a lot. Thanks. Commented Sep 14, 2012 at 1:53

As to your main question, fitting one distribution to another: AFAIK there is no built-in function to do that. What you should do, I think, is try to minimize the difference between the respective PDFs:

Solve[
 D[Integrate[(E^(-x λ) λ - pdf)^2, {x, 0, ∞}], λ] == 0,
 λ
]

I believe your pdf may be too complex for that, and I don't really see any problem with the data-fitting approach. Only one question: why would you use randomly drawn data? That would only be a proxy for the pdf itself, right? The pdf predicts the bucket values; simply use that.
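If the symbolic Solve above turns out to be intractable for this pdf, the same idea can be pursued numerically. A sketch (the helper name obj is my own; the ?NumericQ pattern keeps NIntegrate from being called with a symbolic λ):

```mathematica
(* squared L2 distance between the exponential PDF and pdf, as a function of λ *)
obj[λ_?NumericQ] :=
 NIntegrate[(PDF[ExponentialDistribution[λ], x] - pdf)^2, {x, 0, ∞}]

(* minimize over positive λ *)
NMinimize[{obj[λ], λ > 0}, λ]
```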

8 Comments:

  • The $L^2$ norm for the PDFs is not usually an appropriate or useful measure of differences between distributions. Look instead at the $L^\infty$ norm for the CDFs (the Kolmogorov distance) or the Kullback-Leibler divergence, etc. These issues are extensively discussed in questions at stats.stackexchange.com. Commented Sep 13, 2012 at 22:17
  • @whuber I don't claim any expertise in this area, so I must assume you're right here. Not that I think it would help, as the Kolmogorov distance is probably as difficult to calculate in this case as the $L^2$ norm. So the numerical route may be needed anyway. Commented Sep 14, 2012 at 5:51
  • @whuber In fact, the Kolmogorov distance may be troublesome in this case, because of the strongly non-linear $\sup$ function involved. Since finding the minimum difference between the distributions involves differentiating it with respect to $\lambda$, that could be a problem. Do you know how to solve that? Commented Sep 14, 2012 at 8:33
  • Working with the Kolmogorov distance is not a problem, because we're looking at the differences between the CDFs, which--as integrals of the PDFs--are automatically differentiable (at least for absolutely continuous variables). Not only that, you automatically have the derivative available: it's the PDF! So just find zeros of the differences of the two PDFs. Commented Sep 14, 2012 at 15:45
  • @whuber Not sure that's correct. First, shouldn't you differentiate with respect to $\lambda$? Well, one of the CDFs isn't even dependent on $\lambda$. And second, differentiating the sup of absolute differences doesn't equal the sup of the absolute differences of the derivatives, does it? Commented Sep 14, 2012 at 22:32
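For completeness, the Kolmogorov-distance idea discussed in these comments can also be attacked numerically rather than symbolically. A rough sketch (the helper name kolm is my own; CDF[dist, x] may evaluate slowly for a complicated ProbabilityDistribution, and the search for the sup is truncated to the region where dist carries essentially all its mass):

```mathematica
(* Kolmogorov distance between dist and ExponentialDistribution[λ],
   approximated over the region holding nearly all the probability mass *)
kolm[λ_?NumericQ] :=
 NMaxValue[{Abs[CDF[dist, x] - CDF[ExponentialDistribution[λ], x]],
   0 < x < 0.05}, x]

(* fit λ by minimizing that distance *)
NMinimize[{kolm[λ], λ > 0}, λ]
```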
