$\begingroup$

I am working on a project to forecast food sales for a corporate restaurant. Sales are heavily influenced by the number of guests per day, along with other factors like seasonality, weather conditions, and special events.

The products sold fall into different categories/groups (e.g., sandwiches, salads, drinks). For now, I am focusing on predicting the total number of products sold per group rather than individual item-level forecasts.

Instead of building a single model to predict sales directly, I am considering a two-phase model approach:

  1. First, train a guest count prediction model (e.g., using time series analysis or regression models). The model will take into account external factors such as weather conditions and vacation periods to improve accuracy.
  2. Use the predicted guest count as an input variable for a product demand prediction model, forecasting the number of products sold per category (e.g., using Random Forest, XGBoost, Prophet or another machine learning model). Additionally, I am exploring stacking or ensembling to combine multiple models and improve prediction accuracy.

My questions:

  1. Is this two-phase approach (predicting guests first, then product demand) a valid and commonly used strategy?
  2. Are there better techniques to model the relationship between guest count and product demand?
  3. Would ensembling or stacking provide significant advantages in this case?
  4. Are there specific models or methodologies that work particularly well for forecasting product demand in grouped categories?

Any insights or suggestions would be greatly appreciated!

$\endgroup$

1 Answer

$\begingroup$

Your two-phase approach is valid and aligns with hierarchical forecasting and causal modeling, where demand is driven by an upstream variable (here, the guest count); many practical use cases follow this pattern. Explicitly modeling the causal driver of demand can improve generalization and makes the pipeline modular, so you can use a different method for each component, such as a time series model for the guest forecast and a machine learning model for sales. Bear in mind, though, that errors in the guest prediction model will cascade into the product demand forecasts.
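The cascading of guest-forecast errors can be checked directly by scoring the demand model twice: once with the actual guest counts and once with the predicted ones. The following is a minimal sketch, not a recommended model. The data are synthetic, both stages are plain least-squares regressions written with NumPy, and all names (`fit_linear`, `is_vacation`, `salads`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily data (illustrative only): 200 working days, one product group.
n_days = 200
day_of_week = np.arange(n_days) % 5                      # 0 = Monday ... 4 = Friday
is_friday = (day_of_week == 4).astype(float)
temperature = (15 + 10 * np.sin(2 * np.pi * np.arange(n_days) / 100)
               + rng.normal(0, 2, n_days))
is_vacation = (rng.random(n_days) < 0.1).astype(float)

guests = 300 - 80 * is_vacation - 20 * is_friday + rng.normal(0, 15, n_days)
salads = 0.2 * guests + 1.5 * temperature + rng.normal(0, 5, n_days)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new feature rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Chronological split: fit on the first 150 days, evaluate on the last 50.
train, test = slice(0, 150), slice(150, n_days)

# Stage 1: guest count from exogenous drivers.
X_guest = np.column_stack([is_vacation, is_friday, temperature])
guest_coef = fit_linear(X_guest[train], guests[train])
guests_hat = predict_linear(guest_coef, X_guest[test])

# Stage 2: product demand from guest count plus weather, trained on actual guests.
sales_coef = fit_linear(
    np.column_stack([guests[train], temperature[train]]), salads[train]
)

# Score stage 2 with actual guests (upper bound) and with stage-1 predictions.
pred_actual = predict_linear(
    sales_coef, np.column_stack([guests[test], temperature[test]])
)
pred_two_phase = predict_linear(
    sales_coef, np.column_stack([guests_hat, temperature[test]])
)

mae_actual = float(np.mean(np.abs(salads[test] - pred_actual)))
mae_two_phase = float(np.mean(np.abs(salads[test] - pred_two_phase)))
print(f"MAE with actual guests:    {mae_actual:.2f}")
print(f"MAE with predicted guests: {mae_two_phase:.2f}")
```

The gap between the two scores gives a rough estimate of how much of the demand error comes from the guest model rather than from the demand model itself, which helps decide where further modeling effort is best spent.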

Ensembling and stacking can also improve performance, but the impact depends on the complexity of your data. Bagging ensembles such as Random Forest help reduce variance when the dataset fluctuates heavily, while stacking is useful when the base models capture different aspects of the data, which can reduce bias; for example, one model captures the trend and another handles special events. Meta's open-source forecasting model Prophet takes a related decomposition approach within a single model: it is easy to use, handles missing data well, and automatically fits trend, seasonality, and holiday effects.
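As a rough illustration of the stacking idea, the sketch below combines one base model that only sees the trend with another that only sees an event flag. It uses a single chronological hold-out block for the meta-model (sometimes called blending) rather than full cross-validated stacking. The data are synthetic, and every name (`ols_fit`, `base_predictions`, `is_event`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily sales for one product group (illustrative only).
n = 300
t = np.arange(n, dtype=float)
is_event = (rng.random(n) < 0.08).astype(float)
sales = 50 + 0.05 * t + 25 * is_event + rng.normal(0, 4, n)


def ols_fit(X, y):
    """Ordinary least squares with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Chronological blocks: base models train on A, the meta-model on B, test on C.
A, B, C = slice(0, 180), slice(180, 240), slice(240, n)

# Base model 1 sees only the trend; base model 2 sees only the event flag.
trend_coef = ols_fit(t[A, None], sales[A])
event_coef = ols_fit(is_event[A, None], sales[A])


def base_predictions(idx):
    """Return the two base-model predictions as columns."""
    return np.column_stack([
        ols_predict(trend_coef, t[idx, None]),
        ols_predict(event_coef, is_event[idx, None]),
    ])


# Meta-model: linear blend of the base predictions, fitted on held-out block B
# so that it never sees the data the base models were trained on.
meta_coef = ols_fit(base_predictions(B), sales[B])

base_C = base_predictions(C)
stacked = ols_predict(meta_coef, base_C)


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


print(f"trend only: {mae(sales[C], base_C[:, 0]):.2f}")
print(f"event only: {mae(sales[C], base_C[:, 1]):.2f}")
print(f"stacked:    {mae(sales[C], stacked):.2f}")
```

Comparing the three scores on the final block shows whether the blend actually adds value over the individual base models; if it does not, the simpler model is usually preferable.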

Alternatively, you can try a single joint model, such as the multivariate time series model VAR, to forecast guest count and product demand simultaneously, and use it as a baseline against which to compare your two-phase model. Note that Prophet itself is a univariate model and does not include VAR; implementations are available in other libraries, for example statsmodels.

Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time... Like the autoregressive model, each variable has an equation modelling its evolution over time. This equation includes the variable's lagged (past) values, the lagged values of the other variables in the model, and an error term... The only prior knowledge required is a list of variables which can be hypothesized to affect each other over time.
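To make the description above concrete, here is a minimal sketch of a first-order VAR fitted by equation-wise least squares with NumPy. The two series are simulated stand-ins for (scaled) guest count and product demand, and the names `A_true`, `c_true`, and `forecast` are hypothetical. In practice a library implementation such as the `VAR` class in `statsmodels.tsa.api`, which also handles lag selection, would normally be used instead.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a stable bivariate VAR(1): y_t = c + A @ y_{t-1} + e_t (illustrative only).
# Column 0 stands in for guest count, column 1 for product demand.
A_true = np.array([[0.6, 0.0],
                   [0.3, 0.4]])
c_true = np.array([1.0, 0.5])
n = 400
y = np.zeros((n, 2))
for i in range(1, n):
    y[i] = c_true + A_true @ y[i - 1] + rng.normal(0, 0.5, 2)

# Each variable is regressed on a constant and the lagged values of both variables.
X = np.column_stack([np.ones(n - 1), y[:-1]])        # rows: [1, y_{t-1}]
coefs, *_ = np.linalg.lstsq(X, y[1:], rcond=None)    # shape (3, 2), one column per equation
c_hat = coefs[0]
A_hat = coefs[1:].T                                  # A_hat[k, j]: effect of variable j on k


def forecast(last_obs, steps):
    """Iterate the fitted equations forward to get multi-step forecasts."""
    out = []
    current = np.asarray(last_obs, dtype=float)
    for _ in range(steps):
        current = c_hat + A_hat @ current
        out.append(current)
    return np.array(out)


fc = forecast(y[-1], steps=5)
print("Estimated coefficient matrix:\n", A_hat.round(2))
print("5-step forecast (guests, demand):\n", fc.round(2))
```

The off-diagonal entry `A_hat[1, 0]` is the estimated effect of yesterday's guest count on today's demand, which is the cross-variable dependence the joint model is meant to capture.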

$\endgroup$
  • $\begingroup$ Thanks for the insights! The hierarchical forecasting approach makes a lot of sense, and I hadn’t considered using a multivariate model like VAR as a baseline. I’ll definitely compare my two-phase model with a joint forecasting approach and see how they perform. $\endgroup$ Commented Mar 6 at 12:09
