I'm currently working on a stock market prediction project using Geometric Brownian Motion (GBM). My goal is to show that GBM, as a stochastic model with only two parameters, needs far less historical data than models like LSTM or ARIMA. This would make GBM suitable for new stocks, indices, or assets like cryptocurrencies, where historical data is limited.
Given the stochastic nature of the model, I'm trying to decide the best way to evaluate its performance on stock price prediction. I have some thoughts and questions I'd like input on:
1. Rolling Window vs. Train-Test Split
- Should I evaluate GBM predictions using a rolling window approach or a traditional train-test split (e.g., 80-20)?
- My concern with train-test split is that it assumes stationarity in the data, which might not hold in financial time series. Rolling windows, on the other hand, allow the model to adapt to changing data, but I am unsure if the results are directly comparable to a train-test split.
2. Evaluation of Rolling Window Results
- How should I evaluate the results from a rolling window? Should I calculate metrics (e.g., RMSE, MAE) across all prediction points and compare them to a single train-test split metric?
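To make the question concrete, here is a minimal sketch of the rolling-window (walk-forward) evaluation I have in mind. It is an illustration under assumptions, not a settled method: the data is a simulated GBM path rather than real prices, the forecast is the one-step-ahead GBM conditional mean, and the function names (`simulate_gbm`, `estimate_gbm`, `rolling_gbm_eval`), the 21-price window, and `dt = 1/252` are my own choices.

```python
import math
import random
import statistics


def simulate_gbm(s0, mu, sigma, n_steps, dt, rng):
    """Simulate one GBM path using the exact log-normal update."""
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        step = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(step))
    return prices


def estimate_gbm(prices, dt):
    """Estimate annualized drift and volatility from log returns."""
    r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    sigma = statistics.stdev(r) / math.sqrt(dt)
    mu = statistics.fmean(r) / dt + 0.5 * sigma**2
    return mu, sigma


def rolling_gbm_eval(prices, window, dt):
    """Refit on each window of prices, forecast the next price, pool the errors."""
    errors = []
    for t in range(window, len(prices)):
        mu, _ = estimate_gbm(prices[t - window:t], dt)
        # GBM conditional mean: E[S_t | S_{t-1}] = S_{t-1} * exp(mu * dt)
        forecast = prices[t - 1] * math.exp(mu * dt)
        errors.append(forecast - prices[t])
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    mae = statistics.fmean([abs(e) for e in errors])
    return {"rmse": rmse, "mae": mae, "n": len(errors)}


# Synthetic example: about one year of daily prices, 21-price window.
rng = random.Random(0)
prices = simulate_gbm(100.0, 0.05, 0.2, 250, 1 / 252, rng)
metrics = rolling_gbm_eval(prices, window=21, dt=1 / 252)
print(metrics)
```

Here the errors from every window are pooled into a single RMSE and MAE. Whether that pooled number is comparable to the metric from one fixed train-test split is exactly what I am unsure about.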
3. Minimal Data Requirement
- Are there any studies or research articles suggesting that GBM performs well with less data?
- What is the minimum amount of data recommended to estimate \( \mu \) and \( \sigma \) reliably? I hypothesize that it depends on the time scale of the data (daily, weekly, etc.), but I haven't found specific guidance.
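For reference, these are the estimators I am assuming, written out as a sketch. The notation is mine: \( n \) equally spaced log returns \( r_i \) with step \( \Delta t \) in years, sample mean \( \bar{r} \), sample standard deviation \( s_r \), and total span \( T = n \, \Delta t \). The standard errors are leading-order approximations that hold only if the data really follow GBM.

```latex
% Log returns under GBM
r_i = \ln\frac{S_i}{S_{i-1}} \sim \mathcal{N}\!\left( \left(\mu - \tfrac{1}{2}\sigma^2\right)\Delta t,\; \sigma^2 \Delta t \right)

% Estimators
\hat{\sigma} = \frac{s_r}{\sqrt{\Delta t}}, \qquad
\hat{\mu} = \frac{\bar{r}}{\Delta t} + \tfrac{1}{2}\hat{\sigma}^2

% Approximate standard errors
\operatorname{SE}(\hat{\sigma}) \approx \frac{\sigma}{\sqrt{2n}}, \qquad
\operatorname{SE}(\hat{\mu}) \approx \frac{\sigma}{\sqrt{n \, \Delta t}} = \frac{\sigma}{\sqrt{T}}
```

If this is right, the precision of \( \hat{\sigma} \) improves with the number of observations, while the precision of \( \hat{\mu} \) depends only on the total time span \( T \), so sampling more frequently within the same period would not help the drift estimate.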
4. My Current Thoughts
- I believe that rolling windows might be better suited for time-varying systems like financial markets, as they allow the model to adapt to new patterns. However, I'm concerned that rolling windows might introduce more noise due to smaller training sets for each window.
- For minimal data, I think GBM might work well with as little as 1 month of daily data (~20 points) for short-term predictions. However, this is just a hypothesis, and I’d like confirmation or suggestions from experts.
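One way I could test the 20-point hypothesis is a small Monte Carlo check. The sketch below assumes the data truly follow GBM and uses illustrative values of my own (`mu=0.05`, `sigma=0.2`, `dt=1/252`). It draws many independent samples of 20 daily log returns, re-estimates the parameters on each, and reports how much the estimates scatter. The helper name `sample_gbm_estimates` is hypothetical.

```python
import math
import random
import statistics


def sample_gbm_estimates(mu, sigma, n_returns, dt, n_trials, seed=0):
    """Return the empirical spread (std dev) of the drift and volatility estimates."""
    rng = random.Random(seed)
    mean_r = (mu - 0.5 * sigma**2) * dt   # mean of one log return under GBM
    sd_r = sigma * math.sqrt(dt)          # std dev of one log return under GBM
    mu_hats, sigma_hats = [], []
    for _ in range(n_trials):
        r = [rng.gauss(mean_r, sd_r) for _ in range(n_returns)]
        sigma_hat = statistics.stdev(r) / math.sqrt(dt)
        mu_hat = statistics.fmean(r) / dt + 0.5 * sigma_hat**2
        mu_hats.append(mu_hat)
        sigma_hats.append(sigma_hat)
    return statistics.stdev(mu_hats), statistics.stdev(sigma_hats)


# Assumed setting: 20 daily returns, 5% annual drift, 20% annual volatility.
sd_mu, sd_sigma = sample_gbm_estimates(
    mu=0.05, sigma=0.2, n_returns=20, dt=1 / 252, n_trials=5000
)
print(f"spread of mu_hat:    {sd_mu:.3f} per year")
print(f"spread of sigma_hat: {sd_sigma:.3f} per year")
```

With these assumed values, the scatter of the volatility estimate should be small relative to the true 0.2. The scatter of the drift estimate should be on the order of 0.7 per year, many times larger than the true drift of 0.05. That would suggest one month of daily data can pin down \( \sigma \) reasonably well but says very little about \( \mu \).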
Any insights, references to relevant papers, or suggestions for best practices would be greatly appreciated!