Backtested Dynamic Confidence Intervals
This notebook demonstrates how to create an expanding confidence interval using conformal intervals and backtesting. It builds on what is demonstrated in the Confidence Intervals Notebook. Requires scalecast>=0.18.1.
We overwrite the static naive intervals that scalecast produces by default with dynamic, expanding intervals obtained from backtesting.
See the article.
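For intuition, the default naive conformal interval applies one fixed width to every forecast step, derived from the absolute test-set residuals. Below is a minimal sketch of that idea; the function name naive_interval_sketch is hypothetical and this is an illustration, not scalecast's exact implementation:

import numpy as np

def naive_interval_sketch(point_fcst, test_resids, alpha=0.05):
    # one fixed width for every step: the (1 - alpha) percentile
    # of the absolute out-of-sample test residuals
    width = np.percentile(np.abs(test_resids), 100 * (1 - alpha))
    point_fcst = np.asarray(point_fcst)
    return point_fcst - width, point_fcst + width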
[1]:
import pandas as pd
import numpy as np
from scalecast.Forecaster import Forecaster
from scalecast.util import (
    metrics,
    backtest_metrics,
    backtest_for_resid_matrix,
    get_backtest_resid_matrix,
    overwrite_forecast_intervals,
)
from scalecast.Pipeline import Pipeline, Transformer, Reverter
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import seaborn as sns
import time

[2]:
val_len = 24
fcst_len = 24

Link to data: https://fred.stlouisfed.org/series/HOUSTNSA
[3]:
housing = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
housing.head()

[3]:
| DATE | HOUSTNSA |
|---|---|
| 1959-01-01 | 96.2 |
| 1959-02-01 | 99.0 |
| 1959-03-01 | 127.7 |
| 1959-04-01 | 150.8 |
| 1959-05-01 | 152.5 |
Hold out a test set separate from the one scalecast manages natively. Scalecast determines interval widths from its native test set, which overfits the interval to that data, so we evaluate the interval on these held-out actuals instead.
[4]:
starts_sep = housing.iloc[-fcst_len:,0]
starts = housing.iloc[:-fcst_len,0]

[5]:
f = Forecaster(
    y=starts,
    current_dates=starts.index,
    future_dates=fcst_len,
    test_length=val_len,
    validation_length=val_len,
    cis=True,
)

Step 1: Build and fit_predict() Pipeline
[6]:
transformer = Transformer(['DiffTransform'])
reverter = Reverter(['DiffRevert'],transformer)

[7]:
def forecaster(f):
    f.add_ar_terms(100)
    f.add_seasonal_regressors('month')
    f.set_estimator('xgboost')
    f.manual_forecast()

[8]:
pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter)
    ]
)

[9]:
f = pipeline.fit_predict(f)

[10]:
f.plot_test_set();
Score the default interval
[11]:
f.plot(ci=True);
[12]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ci=True,models='top_1',order_by='TestSetRMSE',ax=ax)
sns.lineplot(
    y = 'HOUSTNSA',
    x = 'DATE',
    data = starts_sep.reset_index(),
    ax = ax,
    label = 'held-out actuals',
    color = 'green',
    alpha = 0.7,
)
plt.xlim(pd.Timestamp('2000-01-01'),pd.Timestamp('2021-12-01'))
plt.title('Forecast with Naive Interval')
plt.show()
[13]:
print(
    'All confidence intervals for every step'
    ' are {:.2f} units away from the point predictions.'.format(
        f.history['xgboost']['UpperCI'][0] - f.history['xgboost']['Forecast'][0]
    )
)
print(
    'The interval contains {:.2%} of the actual values'.format(
        np.sum(
            [
                1 if a <= uf and a >= lf else 0
                for a, uf, lf in zip(
                    starts_sep,
                    f.history['xgboost']['UpperCI'],
                    f.history['xgboost']['LowerCI']
                )
            ]
        ) / len(starts_sep)
    )
)

All confidence intervals for every step are 37.01 units away from the point predictions.
The interval contains 100.00% of the actual values

Score default interval
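We score the interval with MSIS (mean scaled interval score), which rewards narrow intervals and penalizes actuals that fall outside them, scaled by the in-sample seasonal naive error. Below is a minimal sketch of the standard M4-competition definition; the function name msis_sketch is hypothetical, and scalecast's metrics.msis may differ in implementation details:

import numpy as np

def msis_sketch(a, uf, lf, obs, m, alpha=0.05):
    # a: actuals; uf/lf: upper/lower bounds; obs: observed history; m: seasonal period
    a, uf, lf, obs = (np.asarray(x, dtype=float) for x in (a, uf, lf, obs))
    score = (
        (uf - lf)
        + (2 / alpha) * (lf - a) * (a < lf)  # penalty when actual falls below the interval
        + (2 / alpha) * (a - uf) * (a > uf)  # penalty when actual falls above the interval
    )
    scale = np.mean(np.abs(obs[m:] - obs[:-m]))  # in-sample seasonal naive MAE
    return np.mean(score) / scale

Lower is better.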
[14]:
metrics.msis(
    a = starts_sep,
    uf = f.history['xgboost']['UpperCI'],
    lf = f.history['xgboost']['LowerCI'],
    obs = f.y,
    m = 12,
)

[14]: 4.026554265078705

Step 2: Backtest Pipeline
The number of backtest iterations needs to be at least 20 for 95% intervals: each iteration contributes one residual per forecast step, and an empirical 95th percentile requires at least 1/0.05 = 20 observations.
The length of each prediction in the backtest should match our desired forecast length.
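The minimum iteration count follows directly from the interval's alpha; a quick sanity check (the helper min_backtest_iters is hypothetical, not part of scalecast):

import math

def min_backtest_iters(alpha):
    # an empirical (1 - alpha) percentile needs at least ceil(1/alpha) samples
    return math.ceil(1 / alpha)

min_backtest_iters(0.05)  # 20 iterations for 95% intervals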
[15]:
%%time
backtest_results = backtest_for_resid_matrix(
    f,
    pipeline=pipeline,
    alpha = .05, # default
    jump_back = 1, # default
)

CPU times: total: 1min 40s
Wall time: 28.5 s

Step 3: Build Residual Matrix
The result is a matrix shaped 20x24: each row is a backtest iteration, each column is a forecast step, and each value is a residual.
[16]:
backtest_resid_matrix = get_backtest_resid_matrix(backtest_results)

Residual Analytics
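As a quick sanity check, we can confirm the shape described above (a minimal sketch, assuming each element of the returned list maps model nicknames to a 2-D residual array, which is how it is indexed in the plots below):

import numpy as np

resids = np.array(backtest_resid_matrix[0]['xgboost'])
resids.shape  # expected: (20, 24) -- iterations x forecast steps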
[17]:
pd.options.display.max_columns = None
fig, ax = plt.subplots(figsize=(16,8))
mat = pd.DataFrame(np.abs(backtest_resid_matrix[0]['xgboost']))
sns.heatmap(
    mat.round(1),
    annot = True,
    ax = ax,
    cmap = sns.color_palette("icefire", as_cmap=True)
)
plt.ylabel('Backtest Iteration',size=16)
plt.xlabel('Forecast Step',size = 16)
plt.title('Absolute Residuals from XGBoost Backtest',size=25)
plt.show()
[18]:
fig, ax = plt.subplots(1,2,figsize=(16,8))
sns.heatmap(
    pd.DataFrame({'Mean Residuals':mat.mean().round(1)}),
    annot = True,
    cmap = 'cubehelix_r',
    ax = ax[0],
    annot_kws={"fontsize": 16},
)
cbar = ax[0].collections[0].colorbar
cbar.ax.invert_yaxis()
ax[0].set_title('Mean Absolute Residuals',size=20)
ax[0].set_ylabel('Forecast Step',size=15)
ax[0].set_xlabel('')
sns.heatmap(
    pd.DataFrame({'Residuals 95 Percentile':np.percentile(mat, q=95, axis = 0)}),
    annot = True,
    cmap = 'cubehelix_r',
    ax = ax[1],
    annot_kws={"fontsize": 16},
)
cbar = ax[1].collections[0].colorbar
cbar.ax.invert_yaxis()
ax[1].set_title('Absolute Residual 95 Percentiles',size=20)
ax[1].set_ylabel('Forecast Step',size=15)
ax[1].set_xlabel('')
plt.show()
Each step of the forecast will be plus/minus the 95th percentile of that step's absolute residuals (the plot on the right), so the interval widens at the steps the model has historically predicted less accurately.
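Conceptually, the dynamic interval can be constructed by hand from the residual matrix. A minimal sketch of the idea follows (the names manual_upper/manual_lower are illustrative only; overwrite_forecast_intervals() handles this for us in the next step):

import numpy as np

point_fcst = np.array(f.history['xgboost']['Forecast'])
# one width per forecast step: that step's 95th-percentile absolute residual
widths = np.percentile(np.abs(backtest_resid_matrix[0]['xgboost']), q=95, axis=0)
manual_upper = point_fcst + widths
manual_lower = point_fcst - widths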
Step 4: Overwrite Naive Interval with Dynamic Interval
[19]:
overwrite_forecast_intervals(f,backtest_resid_matrix=backtest_resid_matrix)

[20]:
f.plot(ci=True);
[21]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ci=True,models='top_1',order_by='TestSetRMSE',ax=ax)
sns.lineplot(
    y = 'HOUSTNSA',
    x = 'DATE',
    data = starts_sep.reset_index(),
    ax = ax,
    label = 'held-out actuals',
    color = 'green',
    alpha = 0.7,
)
plt.xlim(pd.Timestamp('2000-01-01'),pd.Timestamp('2021-12-01'))
plt.title('Forecast with Dynamic Interval')
plt.show()
Score dynamic interval
[22]:
metrics.msis(
    a = starts_sep,
    uf = f.history['xgboost']['UpperCI'],
    lf = f.history['xgboost']['LowerCI'],
    obs = f.y,
    m = 12,
)

[22]: 3.919628517087574

It is a small improvement over the naive interval's 4.03, but still an improvement!
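To put a number on it (simple arithmetic on the two MSIS scores reported above):

# relative improvement of the dynamic interval over the naive one
naive_msis = 4.026554265078705
dynamic_msis = 3.919628517087574
(naive_msis - dynamic_msis) / naive_msis  # about 0.027, i.e. a ~2.7% improvement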
[23]:
print(
    'The intervals are on average {:.2f} units away from the point predictions.'.format(
        np.mean(np.percentile(mat, q=95, axis = 0))
    )
)
print(
    'The interval contains {:.2%} of the actual values'.format(
        np.sum(
            [
                1 if a <= uf and a >= lf else 0
                for a, uf, lf in zip(
                    starts_sep,
                    f.history['xgboost']['UpperCI'],
                    f.history['xgboost']['LowerCI']
                )
            ]
        ) / len(starts_sep)
    )
)

The intervals are on average 27.36 units away from the point predictions.
The interval contains 87.50% of the actual values

Other Backtest Uses
We can also use the backtest results to report average error metrics over 20 out-of-sample sets.
[24]:
backtest_metrics(backtest_results,mets=['rmse','mae','bias'])[['Average']]

[24]:
| Model | Metric | Average |
|---|---|---|
| xgboost | rmse | 14.083597 |
| xgboost | mae | 12.144780 |
| xgboost | bias | 239.430706 |
[25]:
# actual rmse on out-of-sample data
metrics.rmse(starts_sep,f.history['xgboost']['Forecast'])

[25]: 16.208569138706174

The averaged backtest RMSE (14.08) is in the same ballpark as the RMSE on the truly held-out data (16.21), suggesting the backtest gives a reasonable picture of out-of-sample accuracy.