0
$\begingroup$

I have a dataset of chromatic and monochromatic galaxy fluxes that has an inverted-V shape, as follows:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [10, 8]
plt.style.use('fivethirtyeight')
%matplotlib inline

df = pd.DataFrame({
    'flux': [7953.371, 9045.019, 10052.73, 11174.9, 12468.98, 13973.44,
             15765.74, 17721.66, 19667.59, 21671.05, 23839.87, 26193.85,
             28826.92, 31833.04, 35277.45, 39350.62, 44119.84, 49781.21,
             56399.18, 64513.93, 75960.82, 92307.63, 116218.9, 148417.9,
             197342.0, 280455.2, 368527.9, 480157.0, 672086.2, 1253551.0],
    'gc': [0.23848699999999998, 0.197701, 0.18892799999999998, 0.180912,
           0.18326900000000002, 0.186448, 0.22009, 0.26702, 0.310159,
           0.36151500000000003, 0.408945, 0.435596, 0.43973100000000004,
           0.44741800000000004, 0.445114, 0.434199, 0.42946899999999993,
           0.39339, 0.373621, 0.364333, 0.34575300000000003, 0.320202,
           0.272527, 0.21973800000000002, 0.18654300000000001, 0.131162,
           0.062214, 0.049783999999999995, 0.047236, 0.059118],
    'gm': [0.23848699999999998, 0.197701, 0.18892799999999998, 0.180912,
           0.18326900000000002, 0.186448, 0.22009, 0.26702, 0.310159,
           0.36151500000000003, 0.408945, 0.435596, 0.43973100000000004,
           0.44741800000000004, 0.445114, 0.434199, 0.42946899999999993,
           0.39339, 0.373621, 0.364333, 0.34575300000000003, 0.320202,
           0.272527, 0.21973800000000002, 0.18654300000000001, 0.131162,
           0.062214, 0.049783999999999995, 0.047236, 0.059118]})

print(df.head())
```

```
        flux        gc        gm
0   7953.371  0.238487  0.238487
1   9045.019  0.197701  0.197701
2  10052.730  0.188928  0.188928
3  11174.900  0.180912  0.180912
4  12468.980  0.183269  0.183269
```

The data looks like this: [figure]

Now I want to fit a model to the data and find its weights. I tried numpy's polyfit with degree 3.

```python
# my attempt
x = df['flux'].values
y = df['gm'].values
degree = 3
z = np.polyfit(x, y, degree)
p = np.poly1d(z)
xp = np.linspace(x.min(), x.max(), 1000)
plt.plot(x, y, '.', xp, p(xp), '-')
```

This does not give a good fit. [figure]

I am looking for suggestions on how to fit the data using Python (statsmodels, scikit-learn) or R. Any model is fine; I just need the parameters of that model.

Ideas and suggestions are very welcome!

$\endgroup$

2 Answers

2
$\begingroup$

After plotting the density of the dependent variable, flux, I would recommend exploring a Generalized Linear Model (GLM).

flux appears to follow a gamma distribution, or perhaps a log-normal one. In R, a GLM can be fit with the built-in glm function; the lme4 package adds random effects through glmer if you need them. GLMs are convenient extensions of the prototypical linear regression and are therefore easy to interpret. They are also flexible and well founded in theory.

Is there any work that might suggest a data-generating process for your problem? You might ask yourself what you already know about the process: is it true that x values are always greater than 0? Is there an upper bound?
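A quick way to start on these questions is to check the sign of the values and how strongly skewed they are before and after a log transform. The sketch below uses only NumPy and the flux values from the question. The `skewness` helper is a hypothetical name introduced here, and comparing skewness is only a rough heuristic, not a formal test for a gamma or log-normal distribution.

```python
import numpy as np

# Flux values copied from the question.
flux = np.array([
    7953.371, 9045.019, 10052.73, 11174.9, 12468.98, 13973.44,
    15765.74, 17721.66, 19667.59, 21671.05, 23839.87, 26193.85,
    28826.92, 31833.04, 35277.45, 39350.62, 44119.84, 49781.21,
    56399.18, 64513.93, 75960.82, 92307.63, 116218.9, 148417.9,
    197342.0, 280455.2, 368527.9, 480157.0, 672086.2, 1253551.0])


def skewness(values):
    """Sample skewness: the mean of the cubed z-scores (hypothetical helper)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))


# Strictly positive values are a precondition for gamma or log-normal models.
all_positive = bool(np.all(flux > 0))

# A large positive skew that shrinks after taking logs hints at a
# heavy right tail; it does not prove any particular distribution.
skew_raw = skewness(flux)
skew_log = skewness(np.log(flux))

print("all flux values > 0:", all_positive)
print("skewness of flux:     ", round(skew_raw, 2))
print("skewness of log(flux):", round(skew_log, 2))
```

If the skew drops sharply after the log transform, working on a log scale (or using a log link in the GLM) is probably worth trying first.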

$\endgroup$
1
$\begingroup$

This looks very much like a task for a generalized additive model (GAM) with regression or smoothing splines. Essentially, a spline is a series of piecewise polynomial regressions joined smoothly one after another along the x-axis. See "An Introduction to Statistical Learning" (Ch. 7). GAMs often work just as well as non-parametric regression, yet they are easy to understand and fast to implement. You can model extreme non-linearity with very little effort.

Here is the Python code to the book: https://github.com/JWarmenhoven/ISLR-python
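To make the spline idea concrete, here is a minimal sketch of a cubic regression spline fitted by ordinary least squares with NumPy only. It is the unpenalized building block of a GAM, not a full GAM, and it is not taken from the linked repository. The `spline_basis` helper, the choice of four knots at quantiles, and fitting against log10(flux) rather than raw flux are all assumptions made for illustration.

```python
import numpy as np

# Data copied from the question.
flux = np.array([
    7953.371, 9045.019, 10052.73, 11174.9, 12468.98, 13973.44,
    15765.74, 17721.66, 19667.59, 21671.05, 23839.87, 26193.85,
    28826.92, 31833.04, 35277.45, 39350.62, 44119.84, 49781.21,
    56399.18, 64513.93, 75960.82, 92307.63, 116218.9, 148417.9,
    197342.0, 280455.2, 368527.9, 480157.0, 672086.2, 1253551.0])
gm = np.array([
    0.23848699999999998, 0.197701, 0.18892799999999998, 0.180912,
    0.18326900000000002, 0.186448, 0.22009, 0.26702, 0.310159,
    0.36151500000000003, 0.408945, 0.435596, 0.43973100000000004,
    0.44741800000000004, 0.445114, 0.434199, 0.42946899999999993,
    0.39339, 0.373621, 0.364333, 0.34575300000000003, 0.320202,
    0.272527, 0.21973800000000002, 0.18654300000000001, 0.131162,
    0.062214, 0.049783999999999995, 0.047236, 0.059118])

# Assumption: use log10(flux), because flux spans over two orders of magnitude.
x = np.log10(flux)
y = gm


def spline_basis(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, then (x - k)_+^3 per knot."""
    columns = [np.ones_like(x), x, x ** 2, x ** 3]
    columns += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)


# Assumption: four interior knots placed at evenly spaced quantiles of x.
knots = np.quantile(x, [0.2, 0.4, 0.6, 0.8])

# Ordinary least squares on the spline basis gives the model parameters.
X = spline_basis(x, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ coef
rss = float(np.sum((y - fitted) ** 2))

print("knots (log10 flux):", np.round(knots, 3))
print("coefficients:", np.round(coef, 4))
print("residual sum of squares:", round(rss, 5))
```

The knot positions together with `coef` fully describe the fitted curve, which covers the request for model parameters. To predict at new flux values, build the basis with the same knots: `spline_basis(np.log10(new_flux), knots) @ coef`.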

$\endgroup$
2
  • $\begingroup$ Thanks a lot. I will definitely take a look at this. $\endgroup$ Commented Jun 25, 2019 at 20:55
  • $\begingroup$ Alternatively, if you want a truly parametric model, just keep adding polynomial terms, e.g. in R with the poly function. This is easy to do in a loop. However, I guess a GAM would be better. Cheers! $\endgroup$ Commented Jun 25, 2019 at 21:23
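
The loop over polynomial degrees suggested in the last comment can be sketched in Python as well. This is a rough illustration using `np.polyfit` rather than R's poly. Fitting against centered log10(flux) is an assumption made here to keep the fit numerically well conditioned, and the residual sum of squares always shrinks as the degree grows, so it cannot choose the degree on its own.

```python
import numpy as np

# Data copied from the question.
flux = np.array([
    7953.371, 9045.019, 10052.73, 11174.9, 12468.98, 13973.44,
    15765.74, 17721.66, 19667.59, 21671.05, 23839.87, 26193.85,
    28826.92, 31833.04, 35277.45, 39350.62, 44119.84, 49781.21,
    56399.18, 64513.93, 75960.82, 92307.63, 116218.9, 148417.9,
    197342.0, 280455.2, 368527.9, 480157.0, 672086.2, 1253551.0])
gm = np.array([
    0.23848699999999998, 0.197701, 0.18892799999999998, 0.180912,
    0.18326900000000002, 0.186448, 0.22009, 0.26702, 0.310159,
    0.36151500000000003, 0.408945, 0.435596, 0.43973100000000004,
    0.44741800000000004, 0.445114, 0.434199, 0.42946899999999993,
    0.39339, 0.373621, 0.364333, 0.34575300000000003, 0.320202,
    0.272527, 0.21973800000000002, 0.18654300000000001, 0.131162,
    0.062214, 0.049783999999999995, 0.047236, 0.059118])

# Assumption: centered log10(flux) as the predictor (not part of the comment).
log_flux = np.log10(flux)
x = log_flux - log_flux.mean()
y = gm

results = {}
for degree in range(1, 7):
    # np.polyfit returns coefficients ordered from the highest power down.
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    results[degree] = (coeffs, float(np.sum(residuals ** 2)))
    print(f"degree {degree}: RSS = {results[degree][1]:.5f}")
```

Each entry of `results` holds the coefficients and the residual sum of squares for one degree. Comparing degrees with cross-validation or an information criterion would be a safer way to pick one than the raw RSS.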
