GitHub - jmrichardson/tuneta: Intelligently optimizes technical indicators and optionally selects the least intercorrelated for use in machine learning models

TuneTA optimizes technical indicators using a distance correlation measure to a user defined target feature such as next day return. Indicator parameter(s) are selected using clustering techniques to avoid "peak" or "lucky" values. The set of tuned indicators can be pruned by choosing the most correlated with the target while minimizing correlation with each other (based on user defined maximum correlation). TuneTA maintains its state to add all tuned indicators to multiple data sets (train, validation, test).

Features

Given financial prices (OHLCV) and a target feature such as return, TuneTA optimizes the parameter(s) of technical indicator(s) using distance correlation to the target feature. Distance correlation captures both linear and non-linear strength and provides significant benefit over the popular Pearson correlation.
Optimal indicator parameters are selected in a multi-step clustering process to avoid values which are not consistent with neighboring values, providing a more robust parameter selection.
Prune indicators with a maximum correlation to each other. This is helpful for machine learning models which generally perform better with lower feature intercorrelation.
Supports tuning indicator(s) for single or multiple equities. Multiple equities can be combined into a market basket where indicator parameters are optimized across the entire basket of equities.
Supports multiple time ranges (e.g., short, medium, and long)
Supports pruning preexisting features
Persists state to generate identical indicators on multiple datasets (train, validation, test)
Parallel processing for technical indicator optimization as well as correlation pruning
Supports technical indicators produced from the following packages:
- Pandas TA
- TA-Lib
- FinTA
Correlation report of target and features
Early stopping
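To illustrate why distance correlation can be preferable to Pearson correlation, here is a minimal NumPy sketch of the sample distance correlation. It is not TuneTA's internal implementation; the function name `distance_correlation` and the toy data are assumptions made for illustration. A purely quadratic relationship has a Pearson correlation of roughly zero, while its distance correlation is clearly positive.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-distance matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy data (an assumption for illustration): y depends on x, but not linearly
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = float(np.corrcoef(x, y)[0, 1])  # close to zero
dcor = distance_correlation(x, y)         # clearly above zero
print(f"Pearson: {pearson:.4f}  distance correlation: {dcor:.4f}")
```

This version builds full n-by-n distance matrices, so memory grows quadratically with the number of rows; it is meant only to show the idea.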

Overview

TuneTA simplifies the process of optimizing many technical indicators while avoiding "peak" values, and selecting the best indicators with minimal correlation between each other (optional). At a high level, TuneTA performs the following steps:

For each indicator, Optuna searches for parameter(s) which maximize its correlation to a user defined target (for example, next day return).
After the specified Optuna trials are complete, a 3-step KMeans clustering method is used to select the optimal parameter(s):
1. Each trial is placed in its nearest neighbor cluster based on its distance correlation to the target. The optimal number of clusters is determined using the elbow method. The cluster with the highest average correlation is selected with respect to its membership. In other words, a weighted score is used to select the cluster with highest correlation but also with the most trials.
2. After the best correlation cluster is selected, the parameters of the trials within the cluster are also clustered. Again, the best cluster of indicator parameter(s) are selected with respect to its membership.
3. Finally, the centered best trial is selected from the best parameter cluster.
Optionally, the tuned indicators can be pruned by keeping only those whose correlation with every other indicator stays below a user-defined maximum.
Finally, TuneTA generates all optimized indicators.
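The three clustering steps above can be sketched roughly as follows. This is a simplified stand-in rather than TuneTA's code: it uses a tiny hand-written k-means with a fixed number of clusters instead of the elbow method, and the weighted score (mean correlation multiplied by the cluster's share of trials) is an assumption, since the exact weighting is not stated here. The names `kmeans`, `best_cluster`, and `select_trial` are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Tiny Lloyd's k-means with a deterministic init (illustration only)."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(points[:, 0])
    start = order[np.linspace(0, len(points) - 1, k).astype(int)]
    centers = points[start].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = points[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def best_cluster(points, corrs, k):
    """Pick the cluster with the best weighted score (assumed weighting)."""
    k = min(k, len(points))
    labels, centers = kmeans(points, k)
    best_j, best_score = -1, -np.inf
    for j in range(k):
        members = labels == j
        if not members.any():
            continue
        # Assumption: mean correlation weighted by the share of trials
        score = corrs[members].mean() * members.mean()
        if score > best_score:
            best_j, best_score = j, score
    return labels == best_j, centers[best_j]


def select_trial(params, corrs, k=3):
    corrs = np.asarray(corrs, dtype=float)
    params = np.asarray(params, dtype=float).reshape(len(corrs), -1)
    # Step 1: cluster trials by their correlation to the target
    in_corr, _ = best_cluster(corrs[:, None], corrs, k)
    idx = np.flatnonzero(in_corr)
    # Step 2: cluster the parameters of the trials in the winning cluster
    in_param, center = best_cluster(params[idx], corrs[idx], k)
    idx = idx[in_param]
    # Step 3: take the trial closest to the center of the parameter cluster
    closest = np.linalg.norm(params[idx] - center, axis=1).argmin()
    return int(idx[closest])


# Synthetic trials: a smooth bump around length 19 plus one "lucky" spike at 5
lengths = np.arange(4, 31)
corrs = 0.2 * np.exp(-((lengths - 19) / 6.0) ** 2)
corrs[lengths == 5] = 0.3

naive = int(lengths[corrs.argmax()])
robust = int(lengths[select_trial(lengths, corrs)])
print("argmax picks:", naive, "| clustered selection picks:", robust)
```

On this synthetic data a plain argmax lands on the isolated spike, while the clustered selection stays inside the region where neighboring parameter values also score well.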

Installation

Note: TA-Lib is force-reinstalled as the last step to ensure it is compiled correctly for your environment.

pip install -U git+https://github.com/jmrichardson/tuneta
pip install --force-reinstall --no-cache-dir --no-deps TA-Lib

Install the latest release:

pip install -U tuneta
pip install --force-reinstall --no-cache-dir --no-deps TA-Lib

Install using Colab:

!wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz
!tar -xzvf ta-lib-0.4.0-src.tar.gz
%cd ta-lib
!./configure --prefix=/usr
!make
!make install
!pip install Ta-Lib
!pip install -U git+https://github.com/jmrichardson/tuneta
!pip install -U git+https://github.com/DistrictDataLabs/yellowbrick.git
!pip install numpy==1.20.3
!pip install numba==0.54.1
!pip install pandas==1.3.4
!pip install scikit-learn==1.0.1

Examples

Tune RSI Indicator

For simplicity, let's optimize a single indicator:

RSI Indicator
Two time periods (short and long term): 4-30 and 31-180
Maximum of 100 trials per time period to search for the best indicator parameter
Stop after 20 trials per time period without improvement

The following is a snippet of the complete example found in the examples directory:

tt = TuneTA(n_jobs=4, verbose=True)
tt.fit(X_train, y_train,
    indicators=['tta.RSI'],
    ranges=[(4, 30), (31, 180)],
    trials=100,
    early_stop=20,
)

Two studies are created, one for each time period, each with up to 100 trials to test different indicator length values. The correlation values are displayed for each trial parameter. The best trial and its parameter value are saved for each time range.
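The snippet above takes X_train and y_train as given. As a rough sketch of how such inputs might be prepared, the following builds a next-day return target from closing prices and splits the data chronologically. The synthetic prices, the lowercase column names, and the 70/30 split are all assumptions for illustration; the complete example in the examples directory is the authoritative reference.

```python
import numpy as np
import pandas as pd

# Synthetic OHLCV data (an assumption; substitute real prices)
dates = pd.bdate_range("2020-01-01", periods=250, name="Date")
rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates))))
X = pd.DataFrame(
    {
        "open": close,
        "high": close * 1.01,
        "low": close * 0.99,
        "close": close,
        "volume": 1_000_000.0,
    },
    index=dates,
)

# Next-day return: the return realized on the day after each row's date
y = X["close"].pct_change().shift(-1)

# The last row has no next day, so drop it from both X and y
X, y = X.iloc[:-1], y.iloc[:-1]

# Chronological split (no shuffling) to avoid leaking future data
split = int(len(X) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
```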

To view the correlation of both indicators to the target return as well as each other:

tt.report(target_corr=True, features_corr=True)

Indicator Correlation to Target:
                         Correlation
---------------------  -------------
tta_RSI_timeperiod_19       0.23393
tta_RSI_timeperiod_36       0.227434

Indicator Correlation to Each Other:
                         tta_RSI_timeperiod_19    tta_RSI_timeperiod_36
---------------------  -----------------------  -----------------------
tta_RSI_timeperiod_19                  0                        0.93175
tta_RSI_timeperiod_36                  0.93175                  0

To generate both RSI indicators on a data set:

features = tt.transform(X_train)

            tta_RSI_timeperiod_19  tta_RSI_timeperiod_36
Date
2011-10-03                    NaN                    NaN
2011-10-04                    NaN                    NaN
2011-10-05                    NaN                    NaN
2011-10-06                    NaN                    NaN
2011-10-07                    NaN                    NaN
...                           ...                    ...
2018-09-25              62.173261              60.713051
2018-09-26              59.185666              59.362731
2018-09-27              61.026238              60.210235
2018-09-28              61.094793              60.241806
2018-10-01              63.384824              61.305540

Tune Multiple Indicators

Building on the previous example, let's optimize a handful of indicators:

Basket of indicators from 3 different packages (TA-Lib, Pandas-TA, FinTA)
One time period: 4-60

tt.fit(X_train, y_train,
    indicators=['pta.slope', 'pta.stoch', 'tta.MACD', 'tta.MOM', 'fta.SMA'],
    ranges=[(4, 60)],
    trials=100,
    early_stop=20,
)

You can view how long it took to optimize each indicator:

tt.fit_times()

    Indicator      Times
--  -----------  -------
 1  pta.stoch      23.56
 0  tta.MACD       12.03
 2  pta.slope       6.82
 4  fta.SMA         6.42
 3  tta.MOM         5.7

Let's have a look at each indicator's distance correlation to target as well as each other:

tt.report(target_corr=True, features_corr=True)

Indicator Correlation to Target:
                                                       Correlation
---------------------------------------------------  -------------
tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52       0.236575
pta_stoch_k_57_d_29_smooth_k_2                            0.231091
pta_slope_length_15                                       0.215603
tta_MOM_timeperiod_15                                     0.215603
fta_SMA_period_30                                         0.080596

Indicator Correlation to Each Other:
                                                       tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52    pta_stoch_k_57_d_29_smooth_k_2    pta_slope_length_15    tta_MOM_timeperiod_15    fta_SMA_period_30
---------------------------------------------------  -----------------------------------------------------  --------------------------------  ---------------------  -----------------------  -------------------
tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52                                               0                                 0.886265               0.779794                 0.779794             0.2209
pta_stoch_k_57_d_29_smooth_k_2                                                                    0.886265                          0                      0.678311                 0.678311             0.110129
pta_slope_length_15                                                                               0.779794                          0.678311               0                        1                    0.167069
tta_MOM_timeperiod_15                                                                             0.779794                          0.678311               1                        0                    0.167069
fta_SMA_period_30                                                                                 0.2209                            0.110129               0.167069                 0.167069             0

Notice above that slope(15) and mom(15) are perfectly correlated in the intercorrelation report (indicated by a value of 1) and also have the same correlation to the target. Initially, I thought this had to be a bug, but they are indeed identically correlated on a different scale (notice the same heat color coding in the report).
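One way to see why the two indicators can be perfectly correlated: assuming momentum is the n-period price difference and slope is that same difference divided by n (my understanding of TA-Lib's MOM and pandas-ta's slope, stated here as an assumption), one is just a constant multiple of the other. The sketch below uses synthetic prices and plain NumPy rather than either library.

```python
import numpy as np

# Synthetic closing prices (an assumption for illustration)
rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(0, 1, 500))

n = 15
# Assumed definitions, computed by hand instead of calling the libraries:
mom = close[n:] - close[:-n]   # n-period difference
slope = mom / n                # the same difference scaled by 1/n

# A positive constant rescaling leaves the correlation at 1
corr = float(np.corrcoef(mom, slope)[0, 1])
print(f"correlation between mom({n}) and slope({n}): {corr:.6f}")
```

Because the values differ only by scale, both indicators carry the same information for a correlation-based measure, which is why either one can be pruned.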

Let's remove correlated indicators using a maximum threshold of .85 for demonstration purposes. Based on the correlation report above, two indicator pairs have a correlation greater than .85: MACD/Stoch and Slope/Mom. For each pair, the indicator less correlated with the target is removed: Stoch is dropped because MACD is more correlated with the target, and either Slope or Mom can be dropped because they are identically correlated with the target. Notice that all remaining indicators now have an intercorrelation below .85:

tt.prune(max_inter_correlation=.85)

Indicator Correlation to Target:
                                                       Correlation
---------------------------------------------------  -------------
tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52       0.236576
pta_slope_length_15                                       0.215603
fta_SMA_period_6                                          0.099375

Indicator Correlation to Each Other:
                                                       tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52    pta_slope_length_15    fta_SMA_period_6
---------------------------------------------------  -----------------------------------------------------  ---------------------  ------------------
tta_MACD_fastperiod_43_slowperiod_4_signalperiod_52                                               0                      0.779794            0.252834
pta_slope_length_15                                                                               0.779794               0                   0.188658
fta_SMA_period_6                                                                                  0.252834               0.188658            0
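The pruning rule described above (keep the indicator more correlated with the target, drop anything too correlated with an indicator already kept) can be sketched as a greedy pass. This is an illustrative stand-in, not the code behind tt.prune; the function name `greedy_prune` and the shortened indicator names are assumptions. The numbers are taken from the five-indicator correlation report earlier in this example.

```python
import pandas as pd


def greedy_prune(target_corr, inter_corr, max_inter_correlation):
    """Keep features in order of target correlation, skipping any feature
    that is too correlated with one already kept (illustrative sketch)."""
    keep = []
    for name in target_corr.sort_values(ascending=False).index:
        if all(inter_corr.loc[name, kept] <= max_inter_correlation for kept in keep):
            keep.append(name)
    return keep


# Shortened names for the five tuned indicators from the report
names = ["macd", "stoch", "slope", "mom", "sma"]
target_corr = pd.Series(
    [0.236575, 0.231091, 0.215603, 0.215603, 0.080596], index=names
)
inter_corr = pd.DataFrame(
    [
        [0, 0.886265, 0.779794, 0.779794, 0.2209],
        [0.886265, 0, 0.678311, 0.678311, 0.110129],
        [0.779794, 0.678311, 0, 1, 0.167069],
        [0.779794, 0.678311, 1, 0, 0.167069],
        [0.2209, 0.110129, 0.167069, 0.167069, 0],
    ],
    index=names,
    columns=names,
)

kept = greedy_prune(target_corr, inter_corr, max_inter_correlation=0.85)
print(kept)
```

Stoch is dropped because of its correlation with MACD, and only one of slope and mom survives. Which of the two is kept depends on how the tie in target correlation is broken.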

As in the previous example, we can easily create features:

features = tt.transform(X_train)

Tune and Prune all Indicators

Building on the previous examples, let's optimize all available indicators. Note the addition of min_target_correlation, which removes indicators whose correlation to the target falls below the given threshold:

tt.fit(X_train, y_train,
    indicators=['all'],
    ranges=[(4, 30)],
    trials=500,
    early_stop=100,
    min_target_correlation=.05,
)

As in the previous examples we can see the correlation to the target with the report function:

tt.report(target_corr=True, features_corr=False)

For brevity, only showing the top 10 of the many results:

Indicator Correlation to Target:
                                                       Correlation
---------------------------------------------------  -------------
pta_natr_length_4_scalar_27                               0.253049
tta_NATR_timeperiod_6                                     0.247999
tta_MACD_fastperiod_3_slowperiod_29_signalperiod_25       0.240217
pta_macd_fast_3_slow_29_signal_25                         0.240217
pta_pgo_length_26                                         0.239584
pta_tsi_fast_28_slow_2_signal_25_scalar_15                0.238303
pta_smi_fast_29_slow_2_signal_20_scalar_26                0.238294
fta_TSI_long_3_short_29_signal_26                         0.234654
tta_RSI_timeperiod_19                                     0.23393
pta_rsi_length_19_scalar_26                               0.23393
...

Let's prune the indicators to have a maximum of .7 correlation with any of the other indicators:

tt.prune(max_inter_correlation=.7)

Show the correlation for both target and intercorrelation after prune:

tt.report(target_corr=True, features_corr=True)

Again, showing only top 10 rows of each for brevity (intercorrelation omitted as well):

                                                       Correlation
---------------------------------------------------  -------------
pta_natr_length_4_scalar_27                               0.253049
tta_MACD_fastperiod_3_slowperiod_29_signalperiod_25       0.240217
pta_pvol_                                                 0.199302
pta_kc_length_3_scalar_27                                 0.193162
fta_VZO_period_20                                         0.171986
fta_DMI_period_4                                          0.148614
pta_pvo_fast_27_slow_28_signal_29_scalar_15               0.14692
pta_cfo_length_28_scalar_26                               0.141013
fta_IFT_RSI_rsi_period_28_wma_period_4                    0.140977
pta_stc_fast_18_slow_27                                   0.140789
...

Tune Market

TuneTA supports tuning indicators across a market of equities. Simply index the input dataframe by both date and symbol. The dataframe still contains OHLCV columns; only the index changes (see tune_market.py in the examples folder).

Then use TuneTA in the same way as in the previous examples.
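As a rough sketch of what a date-and-symbol indexed dataframe might look like, the following builds one with pandas. The tickers, prices, lowercase column names, and index level names are assumptions for illustration; tune_market.py in the examples folder shows the layout TuneTA actually expects.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2021-01-04", periods=5)
symbols = ["AAA", "BBB"]  # hypothetical tickers

# Two-level index: one row per (date, symbol) pair
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])

rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, len(index))
market = pd.DataFrame(
    {
        "open": close,
        "high": close + 1.0,
        "low": close - 1.0,
        "close": close,
        "volume": 1_000_000.0,
    },
    index=index,
)

# Next-day return computed per symbol so that symbols do not bleed together
y = market.groupby(level="symbol")["close"].transform(
    lambda s: s.pct_change().shift(-1)
)
print(market.head())
```

Computing the target within each symbol matters here: a plain shift over the whole frame would pair one symbol's price with another symbol's next row.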

Prune Existing Features

If you have preexisting features in your dataframe (regardless of whether you use TuneTA to create new ones), the prune_df helper function prunes all of the features based on intercorrelation. This is helpful, for example, if you have custom features that you would like to combine with TuneTA features and keep only those with maximum correlation to the target and minimal intercorrelation. The prune_df helper function takes a dataframe and returns the column names of the features to keep. The column names can then be used to filter your datasets:

# Features to keep
feature_names = tt.prune_df(X_train, y_train, min_target_correlation=.05, max_inter_correlation=.7, report=False)

# Filter datasets
X_train = X_train[feature_names]
X_test = X_test[feature_names]

See prune_dataframe.py in the examples folder

TuneTA fit usage

tt.fit(X, y, indicators, ranges, trials, early_stop)

Parameters:

indicators: List of indicators to optimize
- ['all']: All indicators
- ['pta']: All pandas-ta indicators
- ['tta']: All ta-lib indicators
- ['fta']: All fin-ta indicators
- ['tta.RSI']: RSI indicator from ta-lib
- See config.py for available indicators and the parameters that are optimized
ranges: Time periods to optimize
- [(2, 30)]: Single time period (2 to 30 days)
- [(2, 30), (31, 90)]: Two time periods (short and long term)
trials: Number of trials to search for optimal parameters
early_stop: Max number of trials without improvement
min_target_correlation: Minimum correlation to target required
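The interaction between trials and early_stop can be thought of as a patience counter: the search runs for at most trials evaluations and stops sooner once early_stop consecutive evaluations fail to improve on the best value. The sketch below is a stand-in that walks a fixed list of candidates rather than using Optuna, and the names `run_trials`, `objective`, and `candidates` are hypothetical.

```python
def run_trials(objective, candidates, trials, early_stop):
    """Evaluate up to `trials` candidates, stopping after `early_stop`
    consecutive evaluations without improvement (illustrative sketch)."""
    best_value, best_param = float("-inf"), None
    stale = 0       # evaluations since the last improvement
    evaluated = 0
    for param in candidates[:trials]:
        value = objective(param)
        evaluated += 1
        if value > best_value:
            best_value, best_param, stale = value, param, 0
        else:
            stale += 1
            if stale >= early_stop:
                break
    return best_param, best_value, evaluated


# Toy objective (an assumption): the score peaks at a length of 10
best_param, best_value, evaluated = run_trials(
    objective=lambda length: -(length - 10) ** 2,
    candidates=list(range(4, 31)),
    trials=100,
    early_stop=5,
)
print(best_param, best_value, evaluated)
```

A smaller early_stop saves time but risks stopping before a later, better region of the search space is reached.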
