1
$\begingroup$

How can I fit the following two sets of data to a left-skewed gamma function, which is what I think should fit the data best?

data 1 is here: https://pastebin.com/X2HTjTP7 data 2 is here: https://pastebin.com/8Rh4BHDT

Is there any other suggestion of what would be the best distribution or equation to fit the data?

EDIT: I tried using FindDistributionParameters[data2, GammaDistribution[\[Alpha], \[Beta]]] but I get a message saying "FindDistributionParameters::ntsprt: One or more data points are not in support of the process or distribution GammaDistribution[\[Alpha], \[Beta]]."

A picture of how data 1 looks is here:

data 1

A picture of how data 2 looks is here:

data 2

Thank you in advance.

$\endgroup$
12
  • 1
    $\begingroup$ Have you already seen FindDistributionParameters[]? $\endgroup$ Commented May 19, 2020 at 2:36
  • $\begingroup$ @J.M. Yes, I get the following "the value of the GammaDistribution is not a recognized distribution" using FindDistributionParameters[data2, GammaDistribution]. Additionally, I am not sure how to make it a left-skewed gamma distribution. $\endgroup$ Commented May 19, 2020 at 2:48
  • 6
    $\begingroup$ Your data has complex numbers. Distributions only handle real data. $\endgroup$ Commented May 19, 2020 at 3:06
  • 4
    $\begingroup$ You have the completely wrong idea. If you had random samples from a "skewed gamma" distribution, then FindDistributionParameters would be appropriate. But what you have is a curve that has a shape like a reverse gamma distribution and therefore you want to perform a regression with NonlinearModelFit. $\endgroup$ Commented May 19, 2020 at 4:10
  • 1
    $\begingroup$ FYI: the second dataset does not fit a gamma distribution curve as well, especially where the peak is located (between 106 and 116 in horizontal units). $\endgroup$ Commented May 19, 2020 at 17:16

1 Answer

8
$\begingroup$

You are fitting a curve that has the shape of a known probability distribution, NOT fitting a probability distribution. This is a regression problem.

After throwing out the complex numbers (as suggested by @BobHanlon) and throwing out the negative response values, one can use NonlinearModelFit. Fitting the log of the curve is more numerically stable when using NonlinearModelFit.

    xmax = Max[data[[All, 1]]] + 0.0001;
    data2 = data;
    data2[[All, 2]] = Log[data[[All, 2]]];
    nlm = NonlinearModelFit[data2,
       {logc - (xmax - x)/b + a Log[xmax - x], b > 0 && a > 0},
       {{a, 0.5}, {b, 2}, {logc, -11}}, x];
    nlm["BestFitParameters"]
    (* {a -> 0.523033, b -> 2.03643, logc -> -11.2393} *)
    Show[
     ListPlot[data, PlotRange -> All, Joined -> True,
      PlotStyle -> {{Yellow, Thickness[0.02]}}],
     Plot[Exp[nlm[x]], {x, Min[data[[All, 1]]], xmax},
      PlotRange -> All, PlotStyle -> Red]]

Data and fit
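
For reference, the expression passed to NonlinearModelFit above can be written out as the logarithm of a gamma-shaped curve reflected about xmax. The symbols $y$ (the response before taking logs), $u$ (the distance to the right edge), $k$ and $\theta$ are introduced here only for illustration and do not appear in the code:

$$
\log y = \text{logc} - \frac{x_{\max}-x}{b} + a\,\log(x_{\max}-x)
\quad\Longleftrightarrow\quad
y = e^{\text{logc}}\, u^{a}\, e^{-u/b}, \qquad u = x_{\max}-x > 0 .
$$

Up to the constant factor $e^{\text{logc}}$, this is the kernel $u^{k-1} e^{-u/\theta}$ of a gamma density with shape $k = a+1$ and scale $\theta = b$, evaluated at $u$ rather than $x$, which is what produces the left skew. The 0.0001 added to xmax presumably keeps $u > 0$ at the last data point so that Log[xmax - x] stays finite there.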

$\endgroup$
6
  • $\begingroup$ I am using data2 with the negative values and the complex numbers removed, as you suggested, and I am running exactly the code you posted, but I do not get what you get. To be precise, the data I am using is this: pastebin.com/qXMbF0FG . Is there something else in the data that needs to be removed? Also, if you don't mind, could you tell me why both data2 and data are needed? You are using only data2 (the one I have pasted here), correct? Or both data sets at the same time? $\endgroup$ Commented May 19, 2020 at 17:51
  • 1
    $\begingroup$ I think I understand what you did. You only used data but then named it data2. That confused me a little bit, but the code works! Thanks, JimB. $\endgroup$ Commented May 19, 2020 at 18:50
  • 1
    $\begingroup$ JimB, one more question: How can I find the area under the curve? I am using: area = NIntegrate[Exp[nlm[x]], {x, Min[data[[All, 1]]], Max[data[[All, 1]]]}] but it does not work. $\endgroup$ Commented May 19, 2020 at 22:32
  • 1
    $\begingroup$ Does the error message suggest what might be wrong? $\endgroup$ Commented May 19, 2020 at 22:48
  • $\begingroup$ I get the following: NIntegrate::inumr: The integrand (E^FittedModel[0.00701413 +3.23571*10^-7 Log[<<1>>]-<<2>>^2])[x] has evaluated to non-numerical values for all sampling points in the region with boundaries {{74.0202,148.342}}. $\endgroup$ Commented May 19, 2020 at 22:59
