I have a dataset with quarter-wise revenue for the past 3 years, from Jan 2020 to Dec 2022, covering 4642 customers.
Each customer has 1 row of data, which includes features based on their purchase frequency, average revenue, total revenue, min revenue, max revenue, etc. for each quarter.
So, from Jan 2020 to Dec 2022, we have 12 quarters and data for all of them.
Our objective now is to predict the revenue for 2023Q1.
As most of our customers have zero revenue, I use a zero-inflated regressor (ZIR).
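For context, this is roughly how I understand a zero-inflated regressor to work: a classifier first decides whether revenue is zero or not, and a regressor then predicts the amount only for rows flagged as non-zero. The sketch below is not my actual pipeline. The class name `TwoStageZeroInflated`, the choice of estimators and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier


class TwoStageZeroInflated:
    """Hypothetical two-stage model: classify zero vs non-zero, then regress."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        nonzero = y != 0
        # Stage 1: learn which rows have any revenue at all.
        self.classifier.fit(X, nonzero)
        # Stage 2: learn the revenue amount using only the non-zero rows.
        self.regressor.fit(X[nonzero], y[nonzero])
        return self

    def predict(self, X):
        pred = np.zeros(len(X))
        flagged = self.classifier.predict(X).astype(bool)
        if flagged.any():
            pred[flagged] = self.regressor.predict(X[flagged])
        return pred


# Synthetic data (assumption): about 30% of customers buy, the rest have zero revenue.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
is_buyer = X[:, 0] > 0.7
y = np.where(is_buyer, 100 + 50 * X[:, 1], 0.0)

model = TwoStageZeroInflated(
    DecisionTreeClassifier(max_depth=2, random_state=0),
    LinearRegression(),
).fit(X, y)
pred = model.predict(X)
```

Rows that the classifier flags as zero get a prediction of exactly 0, which is the behaviour I want given how many zero-revenue customers we have.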
So, I built a model with a training set (2020Q1 to 2022Q3) and a validation set (2022Q4).
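To make the time windows explicit, here is a small sketch of the quarter layout described above. The helper names `quarter_labels` and `next_quarter` are hypothetical and only illustrate the split. They are not part of my modelling code.

```python
def quarter_labels(start_year, end_year):
    """Return labels like '2020Q1' for every quarter in the year range."""
    return [f"{year}Q{q}" for year in range(start_year, end_year + 1) for q in range(1, 5)]


def next_quarter(label):
    """Return the label of the quarter that follows the given one."""
    year, q = int(label[:4]), int(label[-1])
    return f"{year + 1}Q1" if q == 4 else f"{year}Q{q + 1}"


quarters = quarter_labels(2020, 2022)    # 12 quarters of history
train_quarters = quarters[:-1]           # 2020Q1 .. 2022Q3
validation_quarter = quarters[-1]        # 2022Q4
target_quarter = next_quarter(validation_quarter)  # the quarter I need to predict

print(len(quarters), train_quarters[0], train_quarters[-1])
print(validation_quarter, target_quarter)
```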
But the training was done after some feature selection, which picked important features such as Avg_revenue_2021_Q4, AVG_REVENUE_2022_Q2_trans_rate_2022_Q1 and Total_revenue_2022Q3.
Now that the model is built and validated, I get an R2 of 89% on the training set and 87% on the 2022Q4 validation (test) set, along with decent MAE scores.
But my question is: how do I predict the revenue for 2023Q1? 2023Q1 hasn't happened yet (and we need to predict it in advance), so I don't have any data for 2023Q1.
Should I build my final model by combining my training set (2020Q1 to 2022Q3) and validation set (2022Q4)?
Once I build this final model, what input should I pass to get predicted values for 2023Q1? There is no test set because 2023Q1 hasn't happened yet.
To predict 2023Q1 revenue (for each of the 4642 customers), should I again pass the values of Avg_revenue_2021_Q4, AVG_REVENUE_2022_Q2_trans_rate_2022_Q1 and Total_revenue_2022Q3? If I pass the same values again, what is the difference between the prediction for 2022Q4 (test set) and the one for 2023Q1 (inference)?
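To make my confusion concrete, the sketch below works out how many quarters each selected feature sits behind the quarter being predicted. It assumes that the validation target was 2022Q4 revenue. The helpers `quarter_index`, `parse_quarters` and `lags` are hypothetical names I made up for this illustration.

```python
import re


def quarter_index(year, quarter):
    """Map a (year, quarter) pair to a running integer so quarters can be subtracted."""
    return year * 4 + (quarter - 1)


def parse_quarters(feature_name):
    """Extract every (year, quarter) mentioned in a feature name, e.g. 2021_Q4 or 2022Q3."""
    return [(int(y), int(q)) for y, q in re.findall(r"(\d{4})_?Q([1-4])", feature_name)]


def lags(feature_name, target):
    """Number of quarters between each quarter in the feature name and the target quarter."""
    t = quarter_index(*target)
    return [t - quarter_index(y, q) for y, q in parse_quarters(feature_name)]


features = [
    "Avg_revenue_2021_Q4",
    "AVG_REVENUE_2022_Q2_trans_rate_2022_Q1",
    "Total_revenue_2022Q3",
]

# Lag of each feature relative to the validation target (assumed: 2022Q4) ...
lags_vs_2022q4 = {f: lags(f, (2022, 4)) for f in features}
# ... and relative to the quarter I actually want to predict.
lags_vs_2023q1 = {f: lags(f, (2023, 1)) for f in features}

for f in features:
    print(f, lags_vs_2022q4[f], lags_vs_2023q1[f])
```

For example, Total_revenue_2022Q3 is one quarter behind 2022Q4 but two quarters behind 2023Q1, so the same column does not mean the same thing relative to the two targets.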
So, I am stuck on what I should pass as input to predict 2023Q1.
Can you help me please?