revenue forecast using regression - what is the input for future?

Question

I have a dataset with quarterly revenue for the past 3 years, from Jan 2020 to Dec 2022, covering 4642 customers.

Each customer has 1 row of data, which includes features based on their purchase frequency, avg revenue, total revenue, min revenue, max revenue etc. for each quarter.

So, from Jan 2020 to Dec 2022, we have 12 quarters and data for all those quarters.

So, now our objective is to predict the revenue for 2023Q1.

As most of our customers have zero revenue, I use a zero-inflated regressor (ZIR).

So, I built a model with a training set (2020Q1 to 2022Q3) and a validation set (2022Q4).

But the training was done after some feature selection, which resulted in the selection of important features such as Avg_revenue_2021_Q4, AVG_REVENUE_2022_Q2_trans_rate_2022_Q1, Total_revenue_2022Q3.

So, now that the model is built and validated, I get an R2 of 89% and 87% on the train and test sets respectively, and decent MAE scores.

But my question is: how do I predict the revenue for 2023Q1? 2023Q1 hasn't happened yet (and we need to predict it in advance), so I don't have data for 2023Q1.

So, should I build my final model by combining my train (2020Q1 to 2022Q3) and validation set (2022Q4)?

Once I build this final model, what is the input that I should pass to get predicted values for 2023Q1? There is no test set because 2023Q1 hasn't happened yet.

So, to predict 2023Q1 revenue (for each of the 4642 customers), should I again pass the values of Avg_revenue_2021_Q4, AVG_REVENUE_2022_Q2_trans_rate_2022_Q1, Total_revenue_2022Q3? If I pass the same values again, what is the difference between the prediction for 2022Q4 (test set) and 2023Q1 (inference)?

So, now I am stuck on what I should pass as input to predict for 2023Q1.

Can you help me please?

Have you read this article? timeseriesreasoning.com/contents/…
– Nicolas Martin, Commented Jan 3, 2023 at 18:11
Yes, I have read this article. I read your deleted answer. So, you mean something like this: I can train on q1 to q10, validate on q11 and finally test on q12 (which will give me q13). Am I right?
– The Great, Commented Jan 4, 2023 at 0:08
@TheGreat I saw your comment in the prior question. This is a tough situation, and for me, it is a forecasting problem. The hard part is not predicting that next target value; it is that all of the variables in the regression used to make your target prediction are from the same period, Q1-2023, which you won't have. While it does not address the zero inflation, a VAR (vector autoregression) model accommodates this shortcoming. It generates ALL of the predictors for the next time period based on prior periods and uses them to make a complete regression for the unknown period.
– sconfluentus, Commented Jan 4, 2023 at 22:28

Erwan · Accepted Answer · 2023-01-03 21:28:12Z

I would try to formally design the model without any reference to any specific time period, in order to avoid the problem that you mention. The idea is to make the target period relative to the features period, whatever this period is.

For example, you could consider that the model is trained with 2 years of data and predicts the next quarter. This way, for every customer you build a training set like this:

features      target
Q1 .. Q8      Q9
Q2 .. Q9      Q10

(I'm using Q1=2020Q1 .. Q5=2021Q1 .. Q12=2022Q4)

and as test set (for evaluation) you could have instances like this:

features      target
Q3 .. Q10     Q11
Q4 .. Q11     Q12

After evaluating, the final model could be trained with the whole data, like so:

features      target
Q1 .. Q8      Q9
Q2 .. Q9      Q10
Q3 .. Q10     Q11
Q4 .. Q11     Q12

And finally when you are ready to apply the final model in order to predict future data, i.e. Q13, you build the test set instances like this:

features      target
Q5 .. Q12     ?

Of course you could also consider different settings, like only one year of features so that you have more instances as training data.
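The window construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the `revenue` array is made-up data with one raw revenue value per customer per quarter (the real features in the question are richer), and `make_windows` and `width` are hypothetical names, not part of any library. No model is fitted here; the point is only to show which columns become features and which become the target, and what is passed at inference time.

```python
import numpy as np

def make_windows(revenue, width):
    """Slide a window of `width` quarters over a (customers x quarters) array.

    Returns X with one row per (customer, window) and y holding the quarter
    that immediately follows each window.
    """
    n_customers, n_quarters = revenue.shape
    X_parts, y_parts = [], []
    for start in range(n_quarters - width):
        X_parts.append(revenue[:, start:start + width])  # e.g. Q1..Q8
        y_parts.append(revenue[:, start + width])        # e.g. Q9
    return np.vstack(X_parts), np.concatenate(y_parts)

# Hypothetical data: 5 customers x 12 quarters (Q1=2020Q1 .. Q12=2022Q4).
rng = np.random.default_rng(0)
revenue = rng.poisson(lam=2.0, size=(5, 12)).astype(float)
n_customers = revenue.shape[0]

width = 8  # two years of features
X_all, y_all = make_windows(revenue, width)  # targets Q9, Q10, Q11, Q12

# Evaluation split: fit on targets Q9..Q10, evaluate on targets Q11..Q12.
X_train, y_train = X_all[: 2 * n_customers], y_all[: 2 * n_customers]
X_test, y_test = X_all[2 * n_customers:], y_all[2 * n_customers:]

# Inference input for Q13: the most recent `width` quarters (Q5..Q12).
X_future = revenue[:, -width:]
```

After evaluation, the final model would be fitted on `X_all, y_all`, and `X_future` is what gets passed to its predict step. The feature columns are positional ("1 quarter ago", "2 quarters ago", ...) rather than tied to calendar quarters, which is what lets the same model be applied to Q5..Q12.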

Hi Erwan. Thanks. So, you are suggesting that I build 2 models for different time periods, like you have shown?
– The Great, Commented Jan 3, 2023 at 23:31
Or, to put it simply, I can train on q1 to q10, validate on q11 and finally test on q12 (which will give me q13). Am I right?
– The Great, Commented Jan 4, 2023 at 0:08
@TheGreat Maybe I'm not clear: the idea is that none of the models is specific to a given time. It happens that it's built with the data available from a certain time, but in theory it's meant to predict the next quarter for any set of past data as features. That's why you can, for example, have instances for Q1..Q8 as well as Q2..Q9 in the same model, and potentially any other for which the data is available.
– Erwan, Commented Jan 4, 2023 at 13:24
No, I'm not on these platforms and I'm not interested either, but I'm flattered that you think I deserve it, thanks :)
– Erwan, Commented Jan 4, 2023 at 13:26
