
I currently have a set of data points (hit counts), which are structured as a time series. The data is something like:

    time   hits
    20     200
    32     439
    57     512

How can I fit a curve to this data or find a formula so that I can predict points in the future? Ideally, I can answer a question like "How many views will there be when the time is 100?"

Thanks for your help!

EDIT: What I've tried so far:

I've tried a variety of methods, including:

  1. Creating a Logistic Regression using sklearn (however, there are no features for the data)

  2. Creating a curve fit using optimize.curve_fit from scipy (however, I don't have a function for the data)

  3. Creating a function from a UnivariateSpline to pass into curve_fit (something went wrong, but I can't pin down what)

I'm trying to model when content goes viral, so I assume that a polynomial or exponential curve is ideal.

I tried the links from @Bill previously, but I have no function for the data. Do you know how I can find one?
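`curve_fit` does not discover a formula on its own; you pass it a candidate model function and it estimates that function's parameters. Below is a minimal sketch, assuming an exponential model `hits = a * exp(b * time)` (an illustrative choice, not something the data confirms) and using the three sample points above. The function names `exp_model` and `fit_exponential` are hypothetical. The starting guess `p0` comes from a straight-line fit to `log(hits)`, which keeps the optimizer away from the overflow a poor default guess can cause.

```python
import numpy as np
from scipy.optimize import curve_fit

# Sample points from the question.
times = np.array([20.0, 32.0, 57.0])
hits = np.array([200.0, 439.0, 512.0])


def exp_model(t, a, b):
    """Hypothetical candidate model: hits = a * exp(b * t)."""
    return a * np.exp(b * t)


def fit_exponential(t, y):
    """Fit exp_model to (t, y) and return the parameters (a, b)."""
    # A line fitted to log(y) gives rough starting values: log(y) ~ log(a) + b * t.
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    b0, log_a0 = np.polyfit(t, np.log(y), 1)
    params, _cov = curve_fit(exp_model, t, y, p0=(np.exp(log_a0), b0), maxfev=10000)
    return params


a, b = fit_exponential(times, hits)
print(f"a={a:.3f}, b={b:.5f}, predicted hits at time 100: {exp_model(100.0, a, b):.1f}")
```

Swapping `exp_model` for any other function of `t` (a polynomial, a saturating curve) is how you would try other model families with the same call.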

EDIT 2:

Here's a sample of about two days of data: The Fox Data

Here is what is expected over time.

  • A few questions: 1.) What have you tried so far? 2.) What kind of curve are you trying to fit: polynomial? exponential? log-linear? 3.) Have you looked at any documentation or related questions on this site, such as this or this? Commented May 12, 2014 at 16:23
  • Thanks for the comment, @Bill. I've edited the post to include what I've tried so far. Commented May 12, 2014 at 16:31
  • Without relevant domain knowledge, it would be difficult to tell which model (logistic, linear, ...) to fit the data with. Commented May 12, 2014 at 17:50
  • In light of your edits, the real question is: how do I know what kind of curve fits my data? The answer is that it varies from dataset to dataset. Your best bet is to try several and see which fits your data best. However, you're not just trying to fit your data; you're using your data to "train" a model that you can use to predict future values. Model training and validation is a huge field, and you're not going to get an easy answer to "which curve fits my data well and also predicts data well." Commented May 12, 2014 at 17:51
  • That said, if you post a plot of hits as a function of time, we can tell you whether there is an obvious answer. Commented May 12, 2014 at 17:53
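The point about training versus validation can be made concrete with a hold-out check: fit each candidate on the earlier points only and measure its error on the later points it never saw. A minimal sketch, assuming polynomial candidates fitted with `np.polyfit`; the helper name `holdout_errors` is hypothetical, and with only the three sample points from the question the comparison is illustrative rather than meaningful.

```python
import numpy as np


def holdout_errors(times, hits, degrees, n_test=1):
    """Fit a polynomial of each degree on all but the last n_test points and
    return {degree: mean absolute error on the held-out later points}."""
    times = np.asarray(times, dtype=float)
    hits = np.asarray(hits, dtype=float)
    # Time series: train on the past, test on the future (no shuffling).
    t_train, t_test = times[:-n_test], times[-n_test:]
    y_train, y_test = hits[:-n_test], hits[-n_test:]
    errors = {}
    for deg in degrees:
        coeffs = np.polyfit(t_train, y_train, deg)
        pred = np.polyval(coeffs, t_test)
        errors[deg] = float(np.mean(np.abs(pred - y_test)))
    return errors


errors = holdout_errors([20, 32, 57], [200, 439, 512], degrees=[0, 1])
print(errors)
print("best degree on held-out data:", min(errors, key=errors.get))
```

On these points the constant model (degree 0) wins, because the straight line through the first two points overshoots the third badly: a small example of a curve that fits the training data perfectly yet predicts poorly.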

1 Answer


As others have said, it is difficult to give an answer with so little information.

I suggest defining some new variables, such as time, time*time and time*time*time, and fitting a LinearRegression model using these as input variables.
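A minimal sketch of this suggestion, assuming scikit-learn and NumPy. The extra columns `time`, `time**2`, `time**3` turn the single input into several features, so `LinearRegression` fits a polynomial in time. The helper names `make_features` and `fit_polynomial_trend` are hypothetical; scikit-learn's `PolynomialFeatures` can build the same columns. A cubic plus intercept has four coefficients and so needs more than the three sample points from the question, which is why the usage below drops to degree 2 on those points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_features(times, degree=3):
    """Build columns time, time**2, ..., time**degree (hypothetical helper)."""
    t = np.asarray(times, dtype=float).reshape(-1, 1)
    return np.hstack([t ** p for p in range(1, degree + 1)])


def fit_polynomial_trend(times, hits, degree=3):
    """Fit hits as a polynomial in time via ordinary linear regression."""
    model = LinearRegression()
    model.fit(make_features(times, degree), np.asarray(hits, dtype=float))
    return model


times = [20, 32, 57]
hits = [200, 439, 512]
model = fit_polynomial_trend(times, hits, degree=2)
print(model.predict(make_features([100], degree=2)))
```

With only these three points the quadratic passes through all of them exactly but curves downward, predicting a negative hit count at time 100: a reminder that polynomial extrapolation far beyond the observed range can go badly wrong.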

I would start with these and then, if needed, move on to something more complex, such as a neural network (not in sklearn) or SVR.
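For the SVR option, here is a minimal sketch assuming scikit-learn's `SVR` with an RBF kernel, wrapped in a pipeline that standardizes time first, since RBF kernels are sensitive to input scale. The hyperparameters `C=1000.0` and `epsilon=1.0` are arbitrary illustrative values, not tuned ones. One caveat: far outside the training range every RBF kernel term goes to zero, so the prediction falls back to the model's intercept. An RBF-kernel SVR therefore does not extrapolate a growth trend by itself.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

times = np.array([20, 32, 57], dtype=float).reshape(-1, 1)  # sklearn expects 2-D X
hits = np.array([200, 439, 512], dtype=float)

# Scale the time feature, then fit an RBF-kernel support vector regressor.
# C and epsilon are illustrative, untuned values.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1000.0, epsilon=1.0))
svr.fit(times, hits)

print(svr.predict(np.array([[100.0]])))
```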

Hope this helps.
