Dr. M. Gethsiyal Augasta
Assistant Professor, Kamaraj College, Thoothukudi – 628 003
Presented on: 17-06-2013
Novel algorithms for Knowledge discovery from neural networks in Classification problems
Outline
• Introduction
• A New Meanwise Discretization and Pattern Selection Method for Classification
• A New Discretization Algorithm based on Range Coefficient of Dispersion and Skewness for neural networks classifier
• An Algorithm for Pruning Irrelevant Hidden Neurons of a Feedforward Neural Network (PIHNS)
• A Novel Pruning Algorithm (N2PS) for Optimizing Feedforward Neural Networks
• Reverse Engineering the Neural Networks for Rule Extraction
• Conclusion
Overview
Classification is one of the data mining problems that has received great attention in the database community. This research focuses on proposing novel algorithms for improving the performance of feedforward neural networks on classification problems. The algorithms are proposed in three phases using three approaches: preprocessing the data, pruning & retraining, and rule discovery & extraction.
Phase I: Two discretization algorithms, MDC+PS and DRDS, have been proposed for preprocessing the data.
Overview (Contd.)
Phase II: Two pruning algorithms, PIHNS and N2PS, have been proposed for optimizing the architecture of the neural network.
Phase III: A rule extraction algorithm, RxREN, has been proposed for extracting classification rules of large datasets from the trained neural network.
The efficiency of the proposed methods has been demonstrated by implementing them on various real datasets.
Implementation
• The proposed algorithms are implemented in JDK 1.5.
• All experiments were run on a PC with the Windows XP operating system, a Pentium IV 1.8 GHz CPU and 504 MB SDRAM.
• The training and testing examples are selected using the 10-fold cross validation method or the random selection method.
• The datasets used to test the algorithms are:

Properties                   iris   iono   hea    pid    wav    breastw  Creditg  hepatitis
# of classes                   3      2      2      2      3       2        2         2
# of examples                150    351    270    768   5000     699     1000       155
# of training examples        75    176    135    384   2501     350      550        81
# of testing examples         75    175    135    384   2499     349      450        74
# of attributes                4     34     13      8     40       9       20        19
# of continuous attributes     4     34     13      8     40       9        7         6
Preprocessing the Data - Discretization
Discretization transforms continuous attribute values into a finite number of intervals and associates with each interval a numerical discrete value.
Why essential?
i. Some learning methods do not handle continuous attributes.
ii. Data transformed into a set of intervals is more cognitively relevant for human interpretation.
Main goals of discretization methods
1. Generating a high quality discretization scheme with the least number of intervals without any user supervision.
2. The generated discretization scheme should lead to an improvement in the accuracy and efficiency of the learning algorithm.
3. The discretization process should be as fast as possible.
A New Discretization and Pattern Selection Method For Classification in Data Mining Using Feedforward Neural Networks
Published in: International Journal of Advanced Research in Computer Science, 2(1), Jan.-Feb. 2011, 615-620. ISSN No. 0976-5697
Phases of Proposed Method – MDC+PS
This work consists of two phases:
• In the first phase, a new supervised meanwise discretization method (MDC) is proposed to automatically discretize the continuous attributes of large datasets into discrete intervals using the computed mean value. It is aimed at reducing the discretization time and the number of intervals.
• In the second phase, a novel pattern selection mechanism (PS) is proposed to select, in advance of the training phase, the most informative training patterns from the patterns discretized in the first phase, based on pattern disparity.
MDC Discretization Algorithm
Input: A dataset with N continuous attributes, M patterns and S target classes.
Begin
1. For each continuous attribute
   1.1 Initialize the first interval as d0, i.e., values < min1.
   1.2 Let the dynamic value t be min1.
   1.3 For each target class k
       1.3.1 Find the maximum value maxk, the minimum value mink and the mean value Ek.
       1.3.2 Assign max(k-1) to t for all classes k where k > 1 and max(k-1) > mink.
MDC Discretization Algorithm (Contd.)
       1.3.3 Compute the best interval length as lk = |Ek - t|.
       1.3.4 Compute the number of intervals as n = (maxk - t) / lk.
       1.3.5 Generate n intervals {dki | 1 <= i <= n}.
   1.4 Include additional intervals if mink > max(k-1) to cover all possible values of the continuous attribute for each class k.
   1.5 Set the final interval as dm, i.e., values > the upper bound of the last interval.
2. The discretization scheme D for S classes is D = {d0, dk1, dk2, dk3, ..., dki, ..., dkn, dm}.
Output: The discretization scheme D.
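To make the steps concrete, the following is a minimal Java sketch (a hypothetical helper with simplified handling of step 1.4, not the author's JDK 1.5 implementation) that generates the interval boundaries of one continuous attribute. With the single-class Age data from the MDC example slide it gives a mean of about 27, an interval length of about 19 and three intervals, matching [8-27][27-46][46-65].

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of MDC interval generation for one continuous attribute. */
public class MdcSketch {

    /** valuesPerClass holds the attribute values of each target class, in class order. */
    static List<Double> cutPoints(List<double[]> valuesPerClass) {
        List<Double> cuts = new ArrayList<Double>();
        double t = min(valuesPerClass.get(0));        // step 1.2: t = min1
        cuts.add(t);                                  // d0: values below min1
        double prevMax = Double.NEGATIVE_INFINITY;
        for (double[] vals : valuesPerClass) {
            double mn = min(vals), mx = max(vals), mean = mean(vals);   // mink, maxk, Ek
            if (prevMax > mn) t = prevMax;            // step 1.3.2: t = max(k-1)
            else if (prevMax > Double.NEGATIVE_INFINITY) { t = mn; cuts.add(mn); } // step 1.4
            double len = Math.abs(mean - t);          // step 1.3.3: lk = |Ek - t|
            int n = (int) Math.ceil((mx - t) / len);  // step 1.3.4: n = (maxk - t) / lk
            for (int i = 1; i <= n; i++) cuts.add(t + i * len);   // step 1.3.5: interval bounds
            prevMax = mx;
        }
        return cuts;                                  // boundaries of D = {d0, dk1, ..., dkn, dm}
    }

    static double min(double[] v)  { double r = v[0]; for (double x : v) r = Math.min(r, x); return r; }
    static double max(double[] v)  { double r = v[0]; for (double x : v) r = Math.max(r, x); return r; }
    static double mean(double[] v) { double s = 0;    for (double x : v) s += x;             return s / v.length; }

    public static void main(String[] args) {
        // Single-class "Age" data from the MDC example slide
        List<double[]> byClass = new ArrayList<double[]>();
        byClass.add(new double[]{10, 8, 24, 43, 12, 61, 33});
        System.out.println(cutPoints(byClass));       // ~[8.0, 27.3, 46.6, 65.9]
    }
}
```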
Pattern Selection Method (PS)
Pattern selection
• It is an active learning strategy that selects the most informative patterns for training the network.
• It obtains a good training set to increase the performance of a neural network in terms of convergence speed and generalization.
Proposed pattern selection (PS) method:
Data discretized into intervals by MDC is converted into binary code using the thermometer coding scheme [27]. PS then selects all distinct patterns, based on pattern disparity, for training the feedforward neural network.
Steps of the proposed pattern selection method
1. Let P be the set of discretized patterns, A the number of attributes i and S the number of target classes k.
2. Compute the threshold value η: if (A / S) > S then η = A / S, else η = S.
3. Select a pattern pik from P at random; R = R + {pik}; P = P - {pik};
   3.1 For each pattern pjk, j ≠ i, of P
       3.1.1 Compare pik and pjk and find the number of differing bits e;
       3.1.2 If e <= η then T = T + {pjk}; P = P - {pjk};
   3.2 end
4. end
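The Java fragment below is a rough sketch of this selection loop under two assumptions not stated on the slide: patterns are compared as thermometer-coded bit vectors, and the random selection in step 3 is repeated until P is exhausted. The retained set R is then used for training, while the near-duplicates collected in T are discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of the PS step on thermometer-coded patterns (hypothetical helper, not the original code). */
public class PsSketch {

    /** Returns the retained set R; near-duplicate patterns (set T) are simply dropped. */
    static List<boolean[]> select(List<boolean[]> patterns, int numAttributes, int numClasses) {
        double ratio = (double) numAttributes / numClasses;
        double eta = (ratio > numClasses) ? ratio : numClasses;        // step 2: threshold
        List<boolean[]> p = new ArrayList<boolean[]>(patterns);        // working copy of P
        List<boolean[]> r = new ArrayList<boolean[]>();                // selected set R
        Random rnd = new Random();
        while (!p.isEmpty()) {
            boolean[] pick = p.remove(rnd.nextInt(p.size()));          // step 3: random pattern
            r.add(pick);
            for (int i = p.size() - 1; i >= 0; i--) {
                if (differingBits(pick, p.get(i)) <= eta) p.remove(i); // step 3.1.2: too similar
            }
        }
        return r;                                                      // distinct patterns for training
    }

    static int differingBits(boolean[] a, boolean[] b) {               // number of differing bits e
        int e = 0;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i]) e++;
        return e;
    }
}
```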
Experimental Results
The data are classified with a feedforward neural network trained using the backpropagation algorithm.
Results Comparisons
The comparison of results on six datasets with six other discretization schemes is shown below. The table shows that the number of intervals generated by MDC is comparable with all the other discretization algorithms except CAIM. Also, the discretization time of MDC is smaller than that of all other methods for all datasets.
Results Comparisons – Contd.
MDC+PS achieves higher classification accuracy than the Equal-W and CAIM discretization methods on all datasets.
MDC+PS – Summary
• MDC generates the smallest number of intervals, which implies low computational cost and smaller discretization time.
• The PS method selects the most informative training patterns, which leads to an improvement in the classification performance of neural networks.
• Simulation results show that MDC+PS achieves a significant improvement in classification accuracy with minimum training time on most datasets among the six other discretization algorithms.
• The main drawback of the proposed meanwise discretization method (MDC) is that it has to be combined with the proposed PS to achieve the best classification performance. The MDC algorithm is a very effective and easy-to-use supervised discretization algorithm for any classifier, provided its training data has been selected using the proposed pattern selection (PS) method.
A new Discretization algorithm based on Range coefficient of Dispersion and Skewness for neural networks classifier
Published in: Applied Soft Computing, Elsevier, 2012; 12(2): 619-625
Proposed Discretization Method (DRDS)
• A new static, global, supervised, incremental and bottom-up discretization algorithm based on the coefficient of dispersion and skewness of the data range.
• It automates the discretization process by determining the number of intervals and the stopping criterion.
The DRDS method has two phases:
Phase I: obtains the Initial Discretization Scheme (IDS) by searching globally.
Phase II: refines the intervals. The intervals are further merged, up to the stopping criterion, without affecting the quality of the discretization, and the Final Discretization Scheme (FDS) is obtained.
IDS of DRDS Method
• The degree to which numerical data tend to spread is called dispersion. The range coefficient of dispersion is a relative measure of dispersion based on the value of the range.
• When the dispersion is large, the values are widely scattered; when it is small, they are tightly clustered.
• The j-th minimum value jmink, taken between mink and maxk, is used to get the best interval length.
• For a data series with large dispersion a smaller j value is selected, and for a data series with small dispersion a larger j value is selected.
• The value CDk of the data of the discretizing attribute in class k is estimated as shown below.
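The formula itself is an image that did not survive the slide export. A plausible reconstruction, assuming the standard range coefficient of dispersion and consistent with the DRDS example slide (Age data: CD ≈ (61 - 8) / (61 + 8) ≈ 0.77), is CDk = (maxk - mink) / (maxk + mink).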
IDS of DRDS Method
The value of CDk always lies in [-1, +1]. To decide the value j, the range [-1, +1] is divided into a set of intervals based on the number of distinct values in the discretizing attribute of class k; j is selected according to the interval in which CDk lies. The best interval length lk for a discretizing attribute of class k is then obtained from jmink.
A distribution of data is said to be skewed if it is stretched more to one side than to the other, i.e., it is not symmetrical. Selecting a very small jmink value due to right skewness makes the interval length lk too small and the number of intervals n very high, and vice versa; lk is therefore adjusted to compensate for skewness.
IDS of DRDS Method - The Selection Process of 'j'
IDS of DRDS Method
Let t be a dynamic variable that specifies the value from which the discretization process begins for a discretizing attribute of class k. The number of intervals n for a discretizing attribute of the target class cls(i), i = 1 to S, is calculated from the range (maxk - t) and the interval length lk. The intervals in the Initial Discretization Scheme (IDS) can then be written as a set of intervals dij, where dij represents interval j of a discretizing attribute of class cls(i).
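The equations for n and the IDS are images lost in the export. A plausible reconstruction, following the corresponding MDC formula and the DRDS example slide ((61 - 8) / 16 ≈ 4 intervals), is n = ceil((maxk - t) / lk), with the IDS collecting the intervals over all classes as IDS = {d11, ..., d1n, d21, ..., dSn}.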
FDS of DRDS Method
• The goal of the proposed discretization method is to reduce the number of intervals while maximizing the classification accuracy.
• To achieve this, the number of intervals in the IDS is reduced by merging intervals as follows.
• Let b be the number of intervals in the IDS and, for each interval Ii (i = 2 to b-1, with M the total number of examples):
   i. Calculate the total number of examples qi within the interval Ii.
   ii. Merge the interval Ii with the adjacent smallest interval until the stopping criterion on qi and M is met.
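The stopping criterion is an image lost in the export. One plausible reading, consistent with the DRDS example slide where intervals holding fewer than sqrt(7) ≈ 3 examples are merged, is that Ii is merged with its adjacent smallest interval while qi < sqrt(M).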
Results of DRDS
Discretization: the results obtained by the DRDS algorithm on the six datasets are shown below.
Classification accuracy: computed using a feedforward neural network with conjugate gradient training (MLP-CG) [21] with the help of the KEEL software [25].

Criterion                    iris     iono     heart    pid      wav      breastw
Mean number of intervals     5.75     5.1      5.0      10.8     12.4     4.0
Discretization time (s)      0.09     0.64     0.31     1.74     35.7     0.15

Criterion                    iris     iono     heart    pid      wav      breastw
Topology                     23-5-3   175-5-2  65-5-2   87-5-2   495-5-3  36-5-2
Learning time (s)            0.18     0.53     0.59     0.54     34.5     0.34
Training accuracy (%)        97.9     99.3     96.8     80.4     83.1     99.2
Testing accuracy (%)         96.0     90.1     80.7     74.0     81.3     95.4
Comparison of Discretization Methods
• DRDS is compared with other discretization methods: Equal-W, Equal-F, ChiMerge, Ex-Chi2, CACC and CAIM.

Mean number of intervals
Method      iris     iono     heart    pid      wav      breastw
Equal-W     4.0      20.0     10.0     14.0     20.0     14.0
Equal-F     4.0      20.0     10.0     14.0     20.0     14.0
DRDS        5.75     5.1      5.0      10.8     12.4     4.0
ChiMerge    3.5      21.4     7.8      25.6     28.5     4.6
Ex-Chi2     7.5      8.8      2.3      20.0     12.2     3.3
CACC        3.0      4.3      6.4      11.2     18.1     2.0
CAIM        3.0      2.0      2.0      2.0      3.0      2.0

Discretization time (s)
Method      iris     iono     heart    pid      wav      breastw
Equal-W     0.02     1.72     0.12     0.33     9.06     0.26
Equal-F     0.03     1.84     0.12     0.33     9.33     0.27
DRDS        0.09     0.64     0.31     1.74     35.7     0.15
ChiMerge    0.09     4.28     0.39     0.94     64.33    0.66
Ex-Chi2     0.11     11.11    1.68     3.23     136.0    1.91
CACC        0.08     3.62     0.22     0.90     61.41    0.58
CAIM        0.08     3.43     0.20     0.80     52.38    0.58
Comparison of Discretization Methods
The figure compares the discretization time of DRDS with only those algorithms that require no parameters. DRDS requires less discretization time due to its low computational cost.
Comparison of Discretization Methods
DRDS achieves the highest or near-highest accuracy on all datasets. The accuracies obtained by the neural network (MLP-CG) for DRDS are compared with the accuracies obtained for the other six discretization schemes on all datasets in the following table.

Method      iris     iono     heart    pid      wav      breastw
Equal-W     96.6     89.7     77.4     74.1     74.3     94.1
Equal-F     95.3     84.6     73.7     71.9     79.1     95.7
ChiMerge    96.0     89.4     57.8     65.1     78.3     96.3
Ex-Chi2     93.3     64.1     55.5     72.6     77.4     95.1
DRDS        96.0     90.1     80.7     74.9     81.3     95.4
CACC        93.0     90.3     79.3     72.9     80.2     95.1
CAIM        94.6     89.5     77.0     72.1     78.1     94.9
DRDS - Summary The proposed the DRDS algorithm handles continuous and mixed mode attributes.  It does not require any user interaction in both phases and performs automatic selection of the number of discrete intervals based on coefficient of dispersion and skewness of data range.  The results show that our DRDS method discretizes an attribute into smallest number of intervals within less amount of time.  The discretization time of DRDS is smaller than the other bottom-up methods for maximum datasets. Also our proposed algorithm DRDS achieves highest classification accuracy among the other six discretization algorithms.
Pruning
• Pruning is defined as network trimming within the assumed initial architecture. The trimmed network is smaller and is likely to give higher accuracy than before trimming.
Why pruning?
• An ANN with a large number of hidden nodes is able to learn fast but generalizes poorly.
• Better generalization performance can be achieved only by a small network.
• Small trained networks are easier to interpret, and their knowledge can easily be extracted in the form of simple rules.
A Novel Method for Pruning Irrelevant Hidden Neurons of Feedforward Neural Network
Published in: Proceedings of the International Conference on Emerging Trends in Mathematics and Computer Applications, MEPCO Schlenk Engineering College, Sivakasi, India, Dec 16-18, 2010, pp. 579-584.
Proposed Method (PIHNS)
• Prunes the irrelevant hidden neurons of a single-hidden-layer neural network by sensitivity.
• The sensitivity of the global error to changes in each individual hidden node is computed after the training process using the Euclidean distance.
• Named PIHNS as it Prunes Irrelevant Hidden Neurons by Sensitivity.
PIHNS Algorithm
Input: A feedforward neural network with l input neurons, m hidden neurons and n output neurons, and a dataset with np patterns and q attributes.
Begin
1. Train the network until a predetermined accuracy rate is achieved using the backpropagation algorithm with momentum.
2. For each hidden node j,
   2.1 Compute the total net value of node j over all the patterns in the dataset.
PIHNS Algorithm – contd.
   2.2 Compute the sensitivity measure sj for the hidden neuron j. (sj is calculated as the squared Euclidean distance between the node value hj and the weights vjk of all its outgoing connections, where k = 1, 2, ..., n.)
   2.3 Eliminate hidden neuron j if sj <= α, α ∈ {1, 2, ..., n}.
3. Retrain the currently pruned network.
4. If the classification rate of the network falls below an acceptable level then stop pruning, otherwise go to step 2.
Output: The pruned multilayer feedforward neural network.
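A minimal Java sketch of steps 2.2-2.3 follows, assuming hj is the total net value computed in step 2.1 and vj holds the outgoing weights of node j (hypothetical names, one reading of the slide's description, not the author's implementation).

```java
/** Sketch of the PIHNS sensitivity test for one hidden node. */
public class PihnsSketch {

    /** hj: total net value of hidden node j over all patterns (step 2.1);
     *  vj: weights of its outgoing connections to the n output nodes. */
    static double sensitivity(double hj, double[] vj) {
        double sj = 0.0;
        for (double vjk : vj) {
            double d = hj - vjk;                 // squared Euclidean distance between hj and vjk
            sj += d * d;
        }
        return sj;                               // step 2.2
    }

    static boolean shouldPrune(double hj, double[] vj, double alpha) {
        return sensitivity(hj, vj) <= alpha;     // step 2.3: eliminate node j if sj <= alpha
    }
}
```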
Experimental Results
• The datasets used to test the algorithm are Iris, Wisconsin Breast Cancer, Hepatitis domain and Waveform-5000.
• The pruning parameter α is selected depending on the problem.

Dataset    Initial arch.  Acc-test (%)  MSE    Time (s)  Final arch.  Acc-test (%)  Time (s)  α   Pruning steps
iris       4-10-3         95.9          0.016  0.17      4-3-3        98.67         0.28      8   2
cancer     9-10-2         96.4          0.01   1.41      9-2-2        97.1          1.93      10  3
hepatitis  19-25-2        78.2          0.08   0.63      19-2-2       83.95         0.76      4   3
wave       40-10-3        80.5          0.03   8.42      40-3-3       84.6          8.81      10  1

The pruned network for the iris dataset achieves a classification accuracy of 98.7% with a 4-3-3 architecture.
Hepatitis Pruning Results

Step  Current arch.  Acc-test (%)  Epochs  Pruned neurons
1     19-25-2        78.2          200     18 hidden neurons
2     19-7-2         80.5          50      5 hidden neurons
3     19-2-2         83.95         50      Pruning stops

• The original network with architecture 19-25-2 and accuracy 78.2% is reduced to architecture 19-2-2 with accuracy 83.95%.
• It requires 0.76 seconds to obtain the pruned network.
Comparison of Pruning Methods
The proposed method PIHNS is compared with five other pruning methods: MBP, OBS, OBD, VNP and Xing-Hu's method.
• Better architecture with a minimum number of hidden nodes.
• Accuracy is similar to or better than the other pruning methods.
(Bar chart: classification accuracy of each pruning method on the iris, breast-w and hepatitis datasets.)
Comparing Hidden Node Removal with Other Methods
The PIHNS method removes more hidden neurons for the hepatitis and cancer datasets than all the other pruning methods.
(Bar chart: number of pruned hidden neurons per method on the iris, cancer and hepatitis datasets.)
PIHNS - Summary
• Determines the best architecture for a feedforward neural network based on sensitivity analysis (SA) using the squared Euclidean distance.
• Efficient in identifying irrelevant hidden neurons.
• The pruned neural network is more accurate than the original neural network used in the training phase.
• Large decrease in the number of hidden nodes without affecting the classification accuracy, which leads to a high degree of generalization and lower computational time.
• It prunes nodes directly instead of removing the unwanted connections associated with those nodes, and hence reduces computational time.
A Novel Pruning Algorithm for Optimizing Feedforward Neural Network of Classification Problems Published in: Neural Processing Letters, Springer Publications 2011; 34(3):241-258
Proposed N2PS Algorithm
This work deals with a new approach that determines the insignificant input and hidden neurons in order to detect the optimum structure of a feedforward neural network. The proposed pruning algorithm, called Neural Network Pruning by Significance (N2PS), is based on a new significance measure which is calculated from the sigmoidal activation value of a node and all the weights of its outgoing connections.
Pruning by Significance
N2PS considers all nodes with a significance value below the threshold as insignificant and eliminates them.
Steps of the N2PS Method
1. Train the network T until a predetermined accuracy rate is achieved using the backpropagation algorithm with momentum.
2. Compute the significance of each hidden neuron from its sigmoidal activation value and its outgoing weights, and eliminate the neurons whose significance falls below the threshold.
Steps of the N2PS Method (Contd.)
3. Compute the significance of each input neuron in the same way and eliminate the neurons whose significance is below the threshold value α.
4. Retrain the pruned network and compute its classification accuracy on the testing dataset.
5. If the classification accuracy of the pruned network P falls below an acceptable level then stop pruning, otherwise repeat the process.
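The significance formula and threshold are images missing from these slides, so the following Java sketch only illustrates the idea under explicit assumptions: the significance of a node is taken as its sigmoidal activation multiplied by the sum of the absolute values of its outgoing weights, and the threshold α as the mean significance of the layer. The published N2PS measure may differ in detail.

```java
/** Illustrative sketch of the N2PS significance test (assumed formula, not the published one). */
public class N2psSketch {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** act[j]: net activation of node j; w[j][k]: weight of its k-th outgoing connection. */
    static boolean[] insignificant(double[] act, double[][] w) {
        int m = act.length;
        double[] s = new double[m];
        double alpha = 0.0;
        for (int j = 0; j < m; j++) {
            double wsum = 0.0;
            for (double wjk : w[j]) wsum += Math.abs(wjk);
            s[j] = sigmoid(act[j]) * wsum;                    // assumed significance measure
            alpha += s[j];
        }
        alpha /= m;                                           // assumed threshold: layer mean
        boolean[] drop = new boolean[m];
        for (int j = 0; j < m; j++) drop[j] = s[j] < alpha;   // nodes marked for elimination
        return drop;
    }
}
```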
Experimental Results
The performance of the N2PS algorithm on six datasets is shown in the table.
• The algorithm does not require many iterations to prune the network; it needs at most three pruning steps.
• The pruned network achieves higher accuracy than the initially selected network.
Results Comparisons • Classification accuracy of N2PS is compared with other pruning methods such as VNP, Xing-Hu’s method, MBP, OBD and OBS.
Results Comparisons (Contd.)
• Comparing hidden node removal of N2PS with five other pruning methods.
• Comparing input node removal of N2PS with the VNP and Xing-Hu methods.
N2PS Summary
• A new pruning algorithm to determine the optimal architecture of a feedforward neural network has been proposed, based on a new significance measure estimated using the sigmoidal function and the weights.
• Results indicate that the proposed algorithm is very efficient in identifying insignificant input and hidden neurons, and also confirm that the pruned neural network yields more accurate results than the original neural network used in the training phase.
• The main advantages of this algorithm are:
   – no user-defined parameters need to be set
   – large decrease in the number of nodes without affecting the classification accuracy
   – a small number of pruning steps and a small number of iterations for retraining the pruned network.
Rule Extraction
Why rule extraction?
An important drawback of neural networks is their lack of explanation capability, i.e., it is very difficult to understand how an ANN has solved a problem. To overcome this problem, various rule extraction algorithms have been developed.
Rule extraction:
• It turns a black-box system into a white-box system by translating the internal knowledge of a neural network into a set of symbolic rules.
• It is the process of developing a natural-language-like syntax that describes the behaviour of a neural network.
Reverse Engineering the Neural Networks for Rule Extraction in Classification Problems Published in: Neural Processing Letters, Springer Publications, 2012; vol.35 no.2, pp:131-150.
Proposed RxREN Algorithm
• Following the pedagogical approach, the proposed algorithm extracts rules by mapping the input-output relationships as closely as possible to the way the neural network understands the relationship.
• Reverse engineering is a method of analyzing a product in which the finished item is studied to determine its makeup or component parts. The algorithm relies on a reverse engineering technique since neural networks are black boxes, i.e., how they solve a problem is not interpretable.
• The novelty of this algorithm lies in the simplicity of the extracted rules, and the conditions in a rule may involve both discrete and continuous attributes.
Phases of the RxREN Algorithm
The proposed algorithm consists of two phases.
• The first phase removes the insignificant input neurons from the trained neural network and finds the mandatory data range of each significant input neuron for classifying the given test data into a particular class. It learns the importance of each input connection of the trained neural network by analyzing the misclassifications that occur in its absence.
• The second phase constructs the classification rules for each class using the data ranges obtained in phase 1 and refines the generated rules by the processes of rule pruning and rule updation.
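As an illustration of the first phase, the Java sketch below counts, for each input neuron, how many examples the trained network misclassifies when that input is absent; inputs whose removal causes few misclassifications are treated as insignificant. The Network interface is hypothetical, and masking an input by setting it to zero is an assumption made for illustration, not a detail given on the slide.

```java
/** Sketch of the RxREN phase-1 input-significance analysis. */
public class RxrenSketch {

    interface Network { int classify(double[] x); }        // assumed trained classifier

    /** err[i] = number of test examples misclassified when input neuron i is removed. */
    static int[] errorsWithoutInput(Network net, double[][] data, int[] labels) {
        int nInputs = data[0].length;
        int[] err = new int[nInputs];
        for (int i = 0; i < nInputs; i++) {
            for (int p = 0; p < data.length; p++) {
                double[] x = data[p].clone();
                x[i] = 0.0;                                 // mask input neuron i
                if (net.classify(x) != labels[p]) err[i]++;
            }
        }
        return err;   // inputs with small err[i] are insignificant and can be pruned
    }
}
```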
Summarized steps of proposed algorithm
Experimental Results Status of neural network at the removal of each neuron for PID dataset.
Various steps of Rule Pruning and Rule Updation of neural network for PID dataset.
Extracted Rules of 6 real datasets.
Performance of RxREN on 6 real datasets. Random 10-fold cross validation
Comparison of Proposed algorithm with various rule extraction algorithms on WBC dataset. • RxREN obtains minimum number of rules with high accuracy.
RxREN - Summary
• A new pedagogical rule extraction algorithm, RxREN, has been proposed to determine the best classification rules from trained neural networks by the technique of reverse engineering.
• RxREN requires minimum time to search for the rules, since its search space consists only of misclassified data.
• It does not require retraining after pruning.
• It extracts the rules with low computational cost but with high accuracy, and it extracts a more comprehensible set of rules.
• It improves the generalization of a rule by the process of rule pruning, and it increases the classification accuracy of the obtained ruleset by updating the rules based on the misclassifications of the ruleset.
Conclusion
This research provides novel algorithms for preprocessing data for classification in data mining, for identifying the optimal architecture of neural networks for generalization, and for extracting classification rules of large datasets from neural networks.
In the MDC+PS method, MDC discretizes the continuous attributes into many intervals using the computed mean value, but with only nominal accuracy. PS increases the performance of the discretized data on a neural network in terms of classification accuracy, convergence speed and generalization by obtaining a good training set based on pattern disparity. The results show that the discretization method MDC has to be combined with PS to achieve the best performance.
To overcome this drawback, a new static, global, incremental, supervised and bottom-up discretization method, DRDS, based on the coefficient of dispersion and skewness of the data range, has been proposed. The results obtained using this discretization algorithm show that the generated discretization scheme almost always has the minimum number of intervals, requires the smallest discretization time and leads to the highest classification accuracy.
Conclusion (Contd.)
The pruning method PIHNS prunes irrelevant hidden neurons by sensitivity using the Euclidean distance. The main advantages of this algorithm are a large decrease in the number of hidden nodes without affecting the classification accuracy, which leads to a high degree of generalization, and a large decrease in the computational time of the pruning procedure compared with traditional pruning methods. The main drawbacks of this algorithm are that irrelevant input neurons cannot be pruned and that the user has to specify pruning parameters.
N2PS overcomes these drawbacks by automatically pruning both irrelevant input neurons and hidden neurons based on the significance of a node. The main advantages of this algorithm are that no user-defined parameters need to be set, a large decrease in the number of nodes without affecting the classification accuracy, a small number of pruning steps, a small number of iterations for retraining the pruned network compared with other pruning methods, and better generalization ability on all datasets. The experimental results demonstrate that the proposed N2PS algorithm is a very promising method for determining the optimal architecture of neural networks of arbitrary topology for classifying large datasets.
Conclusion (Contd.)
The rule extraction algorithm RxREN, proposed in this research, extracts rules from neural networks using the pedagogical approach. The algorithm relies on a reverse engineering technique to prune the insignificant input neurons and to discover the technological principles of each significant input neuron of the neural network in classification. The results show that RxREN is quite efficient, extracting a smaller set of rules with higher classification accuracy than those generated by other neural network rule extraction methods. In summary, the proposed rule extraction algorithm RxREN is a very promising method for discovering knowledge from neural networks and for interpreting the behaviour of neurons in a human-understandable format from large datasets with mixed-mode attributes.
In a nutshell, the various algorithms proposed in this research are very effective and easy-to-use supervised knowledge discovery algorithms which can be applied to problems that require classification of large datasets.
List of Publications
• A New Discretization and Pattern Selection Method For Classification in Data Mining Using Feedforward Neural Networks, International Journal of Advanced Research in Computer Science, 2(1), Jan.-Feb. 2011, 615-620. ISSN No. 0976-5697.
• A Novel Method for Pruning Irrelevant Hidden Neurons of Feedforward Neural Network, Proceedings of the International Conference on Emerging Trends in Mathematics and Computer Applications, MEPCO Schlenk Engineering College, Sivakasi, India, Dec 16-18, 2010, pp. 579-584.
• A Novel Pruning Algorithm for Optimizing Feedforward Neural Network of Classification Problems, Neural Processing Letters, Springer, 2011; 34(3):241-258. Impact Factor: 0.75.
• Reverse Engineering the Neural Networks for Rule Extraction in Classification Problems, Neural Processing Letters, Springer, 2012; 35(2):131-150. Impact Factor: 0.75.
• A new Discretization algorithm based on Range coefficient of Dispersion and Skewness for neural networks classifier, Applied Soft Computing, Elsevier, 2012; 12(2):619-625. Impact Factor: 2.61.
• M. Gethsiyal Augasta, T. Kathirvalavakumar, Rule extraction from neural networks – A comparative study, Proceedings of the IEEE International Conference on Pattern Recognition, Informatics and Medical Engineering (IEEE-PRIME 2012), Periyar University, India.
References
• Kaikhah K., Doddmeti S., Discovering trends in large datasets using neural networks, Applied Intelligence 29 (2006) 51-60.
• Xing H.J., Hu B.G., Two-phase construction of multilayer perceptrons using information theory, IEEE Transactions on Neural Networks 20(4) (2009) 715-721.
• Castellano G., Fanelli A.M., Pelillo M., An iterative pruning algorithm for feedforward neural networks, IEEE Transactions on Neural Networks 8(3) (1997) 519-530.
• Han J., Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
• Tsai C.J., Lee C.I., Yang W.P., A discretization algorithm based on Class-Attribute Contingency Coefficient, Information Sciences 178 (2008) 714-731.
• Saad E.W., Wunsch II D.C., Neural network explanation using inversion, Neural Networks 20 (2007) 78-93.
• Kurgan L.A., Cios K.J., CAIM discretization algorithm, IEEE Transactions on Knowledge and Data Engineering 16 (2004) 145-152.
• Odajima K., Hayashi Y., Tianxia G., Setiono R., Greedy rule generation from discrete data and its use in neural network rule extraction, Neural Networks 21 (2008) 1020-1028.
MDC - Example
Age: 10, 8, 24, 43, 12, 61, 33
Mean: 27
Interval length: 27 - 8 = 19
No. of intervals: (61 - 8) / 19 ≈ 3
Intervals: [8-27] [27-46] [46-65]
Thermometer coding for age 12: 100
DRDS - Example
Age: 10, 8, 24, 43, 12, 61, 33
CD ≈ 0.8, so j = 2 (CD in [2/3, 3/3]), jmin = 10
Interval length = 2 (10 - 8), which is < 11 (53/5)
Interval length = 4, still < 11 (53/5)
Interval length = 16
No. of intervals: (61 - 8) / 16 ≈ 4
Intervals: [8-23] [24-39] [40-55] [56-61]
After merging: [8-23] [24-61] (intervals with fewer than sqrt(7) ≈ 3 examples merged)
Thermometer coding for age 24: 01
