Using support vector machine with a hybrid feature selection method to the stock trend prediction
Ming-Chi Lee, Expert Systems with Applications, 2009
Presenter: Yu Hsiang Huang
Date: 2012-05-17
Outline
• Introduction
• Feature selection
• Research design
• Experimental results and analysis
• Conclusion
Introduction
• Stock market
  – A highly nonlinear dynamic system
• Applications of AI
  – Expert systems, fuzzy systems, neural networks
  – Back-propagation neural network (BPNN)
    • Its predictive power is better than that of the other methods
    • Requires a large amount of training data to estimate the distribution of the input patterns
    • Prone to over-fitting by nature
    • Fully depends on the researcher's experience and knowledge to preprocess the data: relevant input variables, hidden layer size, learning rate, momentum, etc.
Introduction
• In this paper
  – Support vector machine (SVM)
    • Captures geometric characteristics of the feature space without deriving network weights from the training data
    • Extracts the optimal solution even with a small training set
    • Reaches a global optimum rather than a local one
    • Avoids over-fitting
    • Classification performance is influenced by the dimension, i.e. the number of feature variables
  – Feature selection
    • Addresses the dimensionality-reduction problem by determining the subset of available features that is most essential for classification
    • Hybrid feature selection: filter method + wrapper method → F_SSFS
    • F_SSFS: F-score + supported sequential forward search
    • Optimal parameter search
  – Compare the performance of BPNN and SVM
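The "optimal parameter search" step can be sketched as a standard cross-validated grid search over the RBF kernel's C and gamma. This is a minimal sketch: the grid values below are illustrative assumptions, not the search range the paper actually used.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(X, y, cv=5):
    """Pick (C, gamma) for an RBF-kernel SVM by cross-validated accuracy.

    The grid below is a hypothetical example; the paper searches the SVM
    parameters but does not prescribe this exact grid.
    """
    param_grid = {"C": [1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv)
    search.fit(X, y)
    # best_estimator_ is refit on all of (X, y) with the winning parameters
    return search.best_estimator_, search.best_score_
```

Because SVM training solves a convex optimization problem, each grid point yields the global optimum for that parameter pair, which is the "global vs. local optimum" advantage the slide contrasts with BPNN.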
SVM-based model with F_SSFS
• Original feature variables
• Hybrid feature selection
  – Filter part: feature pruning using the F-score → pre-selected features
  – Wrapper part: the SSFS algorithm finds the best feature variables → best feature variables
• Data + SVM: training, testing, and evaluating the classification accuracy
Feature selection
• Filter method
  – No feedback from the classifier
  – Estimates the classification performance by indirect assessments
    • Distance: reflects how well the classes separate from each other
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – F-score (filter part): original feature variables → calculate the F-score of each feature → sort by F-score → select the top-K features
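The filter step above can be sketched directly: the F-score of a feature compares the between-class scatter of its class means against its within-class variance, so a higher score means better class separation. A minimal sketch, assuming binary +1/−1 labels as in the paper's setup:

```python
import numpy as np

def f_score(X, y):
    """F-score of each feature for a binary-labelled sample matrix.

    X: (n_samples, n_features); y: array of +1 / -1 labels.
    Higher F-score means the feature separates the two classes better.
    """
    pos, neg = X[y == 1], X[y == -1]
    mean_all = X.mean(axis=0)
    # numerator: how far each class mean sits from the overall mean
    num = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    # denominator: within-class variance (unbiased) summed over both classes
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return num / den

def top_k_features(X, y, k):
    """Indices of the K features with the largest F-score (the filter output)."""
    return np.argsort(f_score(X, y))[::-1][:k]
```

The selected indices become the "pre-selected features" handed to the wrapper part.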
Feature selection
• Wrapper method
  – Classifier-dependent: feedback from the classifier
    • Evaluates the "goodness" of the selected feature subset directly from the classifier
    • Should intuitively yield better performance
  – Has limited applications
    • Due to the high computational complexity involved
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS)
    • Plays the role of the wrapper
    • A variation of the sequential forward search (SFS) algorithm, specially tailored to SVM to expedite the feature search
    • Support vectors: training samples other than the support vectors contribute nothing to determining the decision boundary
    • Dynamically maintains an active subset as the candidate support vectors
    • Trains the SVM on this reduced subset rather than on the entire training set, at less computational cost
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS)
    [Schematic of the training matrix: samples r1…rN over features f1…fk, each with a +/− label]
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS)
    [Flowchart of the SSFS loop: iteration 1 → iteration n+1 → termination]
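The wrapper loop can be sketched as a plain sequential forward search: greedily add the feature that most improves the classifier's cross-validated accuracy, and terminate when no remaining feature helps. Note this sketch omits the "supported" part of SSFS, i.e. the paper's trick of training only on a maintained candidate support-vector subset; the function and parameter names are my own.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sequential_forward_search(X, y, candidate_idx, cv=5):
    """Greedy SFS skeleton: grow the feature subset one feature at a time,
    scoring each candidate by cross-validated SVM accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(candidate_idx)
    while remaining:
        # score every remaining candidate when added to the current subset
        trial = {f: cross_val_score(SVC(kernel="rbf"),
                                    X[:, selected + [f]], y, cv=cv).mean()
                 for f in remaining}
        best_f = max(trial, key=trial.get)
        if trial[best_f] <= best_acc:
            break  # termination: no remaining feature improves accuracy
        selected.append(best_f)
        best_acc = trial[best_f]
        remaining.remove(best_f)
    return selected, best_acc
```

In F_SSFS, `candidate_idx` would be the K features surviving the F-score filter, which is what keeps this loop affordable.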
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – F_SSFS
    • Uses the F-score measure to pre-select candidate feature subsets
    • Uses the SSFS algorithm to select the final best feature subset
    • Reduces the number of features that have to be tested through SVM training
    • Reduces the unnecessary computation the wrapper would otherwise spend testing non-informative features
Research design
• Data collection and preprocessing
  – Prediction target: the direction of change of the daily NASDAQ index
  – Index futures lead the spot index
  – 30 technical indices serve as the full feature set: 20 futures contracts, 9 spot indexes, and the 1-day-lagged NASDAQ index
  – Labels "1" and "-1" denote whether the next day's index is higher or lower than today's
  – From Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
  – The original data are scaled into the range (0, 1)
    [Schematic of the data matrix: 1065 samples over features f1…f30, each with a 1 / -1 label]
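The two preprocessing steps above (direction labels and min-max scaling) can be sketched as follows; function names are my own, not the paper's.

```python
import numpy as np

def make_labels(close):
    """+1 if the next day's index closes higher than today's, else -1.

    The last day has no next-day value, so it yields no label and the
    feature row for that day would be dropped.
    """
    diff = np.diff(close)
    return np.where(diff > 0, 1, -1)

def minmax_scale(X):
    """Scale each feature column into [0, 1], as in the paper's preprocessing."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)
```

For example, closes of 100, 101, 100.5, 102 yield labels 1, -1, 1. In practice the scaling bounds should be taken from the training period only, to avoid leaking test-set information.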
SVM-based model with F_SSFS
• Original feature variables
• Hybrid feature selection
  – Filter part: feature pruning using the F-score → pre-selected K features
  – Wrapper part: the SSFS algorithm finds the best feature variables → best feature variables
• Data + SVM: training, testing, and evaluating the classification accuracy
Experimental results and analysis
• Experimental result of F_SSFS
  – The threshold K determines how many features are kept after filtering
    • If K equals the number of original features → the filter part does not contribute at all
    • If K equals 1 → the wrapper method is unnecessary
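The trade-off governed by K can be probed empirically: keep the top-K features by a univariate score, then measure the cross-validated SVM accuracy on that subset. As a stand-in for the paper's F-score, this sketch uses scikit-learn's ANOVA F-statistic (`f_classif`), which ranks features in the same spirit; the helper name is my own.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def filter_then_evaluate(X, y, k, cv=5):
    """Keep the top-K features by a univariate F statistic (a stand-in for
    the paper's F-score), then report cross-validated SVM accuracy on them."""
    keep = SelectKBest(f_classif, k=k).fit(X, y).get_support(indices=True)
    acc = cross_val_score(SVC(kernel="rbf"), X[:, keep], y, cv=cv).mean()
    return keep, acc
```

Sweeping `k` over 1 … n_features and plotting the accuracies is one way to pick a threshold such as the paper's K = 22 before handing the survivors to the wrapper.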
Experimental results and analysis
• Experimental result of F_SSFS – wrapper part
  – With K = 22, after the wrapper stage 17 features remain, with an average accuracy of 81.7%
Experimental results and analysis
• Experimental result of SVM
• Experimental result of BPNN
Experimental results and analysis
• Experimental result of feature selection
  – Key deficiency of neural-network models for stock trend prediction
    • Difficulty in selecting the discriminative features and explaining the rationale for the prediction
  – Relative importance of each feature
Conclusion
• Stock trend prediction with a support vector machine and a hybrid feature selection method (F_SSFS)
• Reduces the high computational cost and the risk of over-fitting
• Future work
  – Investigate how to determine the optimal SVM parameter values for the best prediction performance
  – Study the generalization of SVM with respect to training set size and give a guideline for measuring generalization performance
