Predicting students’ performance using classification techniques in data mining

INTERNATIONAL JOURNAL OF TECHNOLOGY AND COMPUTING (IJTC) ISSN-2455-099X, Volume 2, Issue 10 October 2016 IJTC201610001 www.ijtc.org 489

Predicting Students’ Performance Using Classification Techniques in Data Mining

Mukesh Kumar1, Prof (Dr.) A. J. Singh2
1 (PhD Scholar, CS Department, HPU-Shimla HP, mukesh.kumarphd2014@gmail.com)
2 (Professor, CS Department, HPU-Shimla HP, aj_singh@yahoo.uk.in)

Abstract: Education plays a critical role in the development of any country, so everyone shares the responsibility to improve it. With this in mind we turned our attention to the education system, which ranges from basic to higher education. Nowadays the education system generates a large amount of data about students. If that data is not analysed properly, it is useless. With data mining techniques we can find hidden information in the data collected from different educational settings, and use that information to review our educational process or to improve the education system. In this article we consider the case of students of an engineering college and try to predict their final result in advance. The prediction makes it possible to give timely help to students who are at risk of failing the final examination. Many data mining techniques are available; we use J48, RandomForest, REPTree and LADTree to predict the performance of students in their final examination. On the basis of this prediction we can decide whether a student is likely to be promoted to the next year or not. With the help of the result we can improve the performance of students who are at risk of a Fail or Promoted result. After the final results are declared, they are fed back into the system and analysed for the next semester.
The comparative result shows that prediction helps to improve the overall result of the weaker students.

Keywords- Data Mining, EDM, Decision Tree Algorithm, J48, RandomForest, ADTree.

I. INTRODUCTION

Data mining is one of the most important fields of study. Its concepts, techniques and algorithms are applied in fields such as education, medicine, business, retail management, hospitals and the hospitality industry. With data mining techniques we can predict the future of a business or improve it. Our study covers several data mining techniques, such as association rule mining, clustering and classification. Data mining techniques fall into two types: supervised and unsupervised learning. In supervised learning a model is built from records whose class labels are already known and is then used to predict the labels of new records [2]. In unsupervised learning no class labels are given, and the algorithm looks for structure, such as clusters, in the data itself.

Here we discuss educational data mining. As already mentioned, we chose the educational field because education is one of the most important factors in the development of a nation. Data mining is used to find hidden information in a data set, so by analysing educational data we want to find information that helps to improve education further. Which data mining algorithm is applied depends on the type of data set and on what you want to find from it. We have studied different algorithms applied to different data sets. Algorithms such as neural networks, Naïve Bayes, K-Nearest Neighbour and decision trees, covering both classification and clustering, have been applied to educational data sets [3]. With data mining techniques we can predict, classify or cluster students according to their academic performance. Examination marks play a most important role in the life of a student.
If we can predict the result of a student before the examination, we can put in extra effort to improve that student’s performance in the final examination. In other words, prediction lets us give timely help to students who are at risk of academic failure.

II. PROBLEM RELATED TO THE HIGHER EDUCATION SYSTEM

At present most institutions in India face a problem with student admissions. Most engineering colleges and universities face low admission in the engineering stream. There are many reasons for this, such as a poor placement record, inadequate infrastructure, an outdated syllabus, under-qualified staff and poor teaching methodology. To increase admissions, a college needs to meet these basic needs. A college that does not will not sustain itself in the near future and will face failure [1]. To remain in competition with other colleges, a college needs to provide something extra that helps students in their studies. Educational data mining offers a way to address these problems, because it lets us analyse all the data produced by the educational setting. From that analysis we can predict a student’s result, dropout, placement, behaviour and so on. If a student is at risk of failure and we can predict that risk in advance, we can give that student timely help. Educational data mining techniques can be applied to any type of educational data. Many data mining techniques, such as classification and clustering algorithms, are applied to educational data. In this article we consider the case of an engineering college in which we want to predict the result of students in the final examination of their next semester.
For that purpose we collect information about first-year students with attributes such as branch, sex, category, father’s occupation and mother’s occupation [1]. With data mining we can predict a student’s result in advance and then give timely help to students who are at risk of failure. The motive behind this article is to help administrators of educational institutions by creating a model that helps students and hence improves their future results. The steps we take to achieve this are listed below:

1. Choose the sources from which information about the students can be collected for the selected attributes.
2. From the collected data, select the attributes that best help to predict the students’ results, behaviour and academic achievement.
3. Select the data mining algorithm that gives the most accurate result for the dataset. We apply different classification algorithms in our analysis.
4. Finally, validate the presented model for students of different engineering institutions and universities of India.

III. DIFFERENT SOFTWARE AVAILABLE FOR THE DATA MINING ANALYSIS

Data is one of the most important resources in today’s world, because by analysing it we can find information that will be helpful in future. Different types of data mining software are available for analysis. Every organisation deals with different types of real-life data, such as data related to education, business, sales, marketing, hospitals and hospitality. Each software package has its own features and properties, and which package is suitable depends on the data to be analysed [6].
The ten most important tools used for data analysis are presented in tabular form below:

Table 1: List of different software available for the purpose of data mining analysis

S.No  Software         Language used   Developed by
1     RapidMiner       Java            Technical University of Dortmund
2     SAS Data Mining  C               North Carolina State University
3     WEKA             Java            University of Waikato, New Zealand
4     R-Software       C, Fortran, R   University of Auckland, New Zealand
5     Orange           Python          University of Ljubljana
6     KNIME            Java            University of Konstanz
7     NLTK             Python          University of Pennsylvania
8     DataMelt         Jython, Groovy  jWork.ORG community
9     Pentaho          Java            Hitachi Data Systems
10    Tanagra          DELPHI 6        Lumière University Lyon, France

After reading research papers on educational data mining, we found that RapidMiner and WEKA are the most widely used software for analysis. We therefore chose the WEKA software tool for our analysis. WEKA is open source software and is freely available to users under the GNU General Public License. We can also implement our own algorithms in it. Most data mining algorithms are available in WEKA, which is a complete package of data mining and machine learning algorithms. It supports classification, clustering, regression, association rule and feature selection algorithms. It can also show relationships within data sets through clustering, visualization, predictive modelling and association rules.

IV. CLASSIFICATION ALGORITHM TAKEN INTO CONSIDERATION FOR ANALYSIS

Different types of data mining algorithms, such as clustering, classification and association rule mining, are available to analyse our data. Which algorithm is suitable depends on the type of information you want to extract and the type of data set you have. Before selecting an algorithm, be clear about what information you want to obtain from the dataset [7].
Every data mining model is created with a specific algorithm. A data mining problem can often be solved best by using more than one algorithm. In this article we want to predict the final result of students in the coming semester. You can be successful in data mining even if you are not very familiar with the inner workings of each algorithm, but it is important to understand the general features of each algorithm and its suitability for different datasets. Data mining functions are of two types, supervised and unsupervised. Our dataset falls into the category of supervised learning, and we apply a classification function because we want to predict students’ results according to predefined classes [4]. Many algorithms fall into the classification category, such as Decision tree, Naive Bayes, Generalized Linear Models (GLM) and Support Vector Machine (SVM). In this article we apply decision tree algorithms because they present predictive information in a human-readable, easily understood form. The rules they generate take the form of if-else expressions that lead to the prediction. Many classification algorithms are available, but we apply only a few of them, namely J48, RandomForest, REPTree and LADTree, and then compare their predictive results.

V. DATA COLLECTION AND PROCESSING PHASE

Predicting the academic progress of students at an early stage of higher education is very important, because early prediction helps students to perform well in the final examination. To predict academic progress we need many parameters of the students [1]. A prediction model takes into account personal, family, psychological and social information. The students’ educational background is also taken into consideration [8]; it may contain data such as grades, attendance, behaviour and attitude toward study. The dataset used for this article was taken from a reputed engineering college under Punjab Technical University, Jalandhar. This university produces many engineers every year, but not all registered students get their degree in time because of backlogs. In this study we therefore want to predict students’ results in advance, so that if the predicted result is unfavourable we can give them timely help to improve their result in the final examination. For that analysis we collect information from the students, apply data mining algorithms to the dataset and predict the final result. Students have many attributes during their period of study, but we need to collect only those that help to predict the result. We selected only eleven attributes, which we consider the most important.
We selected the student’s grades in high school and senior secondary school, gender, family size, family status, parents’ qualifications, parents’ occupations and previous semester result [1]. Most of this information comes from the students’ previous records, which are available in the database of the institution concerned. All the selected attributes and their possible values are listed in the table below:

Table 2: Selected attributes of students considered for the analysis purpose

Attribute      Description                  Possible values
Branch         Student’s branch             {CS, ECE, ME, CE}
Gender         Student’s gender             {Male, Female}
Grade_HS       High school grade            {E – Above 90%, A – 81-90%, B – 71-80%, C – 61-70%, D – 51-60%, E – 40-50%, F – < 40%}
Grade_SS       Senior secondary grade       {E – Above 90%, A – 81-90%, B – 71-80%, C – 61-70%, D – 51-60%, E – 40-50%, F – < 40%}
Family_Size    Student’s family size        {1, 2, 3, >3}
Family_Status  Student’s family status      {Joint, Individual}
Father_Qual    Father’s qualification       {no-education, elementary, secondary, UG, PG, Ph.D., NA}
Mother_Qual    Mother’s qualification       {no-education, elementary, secondary, UG, PG, Ph.D., NA}
Father_Occ     Father’s occupation          {Service, Business, Agriculture, Retired, NA}
Mother_Occ     Mother’s occupation          {House-wife (HW), Service, Retired, NA}
Result         Result in B.Tech first year  {Pass, Promoted, Fail}

All the attributes listed above are used for prediction with data mining techniques. We started with twenty attributes but found some of them irrelevant to predicting the result, so we left them out of the final dataset.

VI. IMPLEMENTATION OF DATA MINING MODEL FOR PREDICTION

As already discussed, we use the WEKA tool for our implementation.
We chose it because it is open source and implements most classification algorithms. After collecting all the information above, we put it in the file STUDENTDATA.csv. Before loading this file into the WEKA Explorer, make sure that all the information is correct and follows the data collection format. After loading STUDENTDATA.csv into the Explorer, apply the different classification algorithms to the data. More than sixteen decision tree algorithms are available for the analysis [2]. In WEKA we apply J48, RandomForest, REPTree and LADTree. After selecting these algorithms, the next step is to select 10-fold cross-validation under “Test options”. Because there is no separate data set for testing, cross-validation is needed to get a reasonable estimate of the accuracy of the generated models. The predictions tell us whether or not a student is likely to perform well in the examination.

VII. RESULTS AND DISCUSSION

We build four decision tree models to predict the final result from the student dataset, using four machine learning algorithms: J48, RandomForest, REPTree and LADTree. All of these are important algorithms for prediction.
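The 10-fold cross-validation selected under “Test options” can be illustrated outside WEKA with a minimal Python sketch. It is not the WEKA implementation: the function names, the toy records and the majority-class learner are hypothetical stand-ins chosen only to show how each record is predicted by a model that was trained without it. The sketch also does not stratify the folds by class, which a tool such as WEKA may do.

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k=10, seed=1):
    """Shuffle the row indices once and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, k=10, seed=1):
    """Estimate accuracy: every row is predicted by a model trained without it."""
    correct = 0
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(rows[i]) == labels[i] for i in held_out)
    return correct / len(rows)


def train_majority(train_rows, train_labels):
    """Hypothetical stand-in learner: always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


# Made-up toy records (not the study's data): 14 Pass and 6 Promoted.
toy_rows = [{"Branch": "CS", "Gender": "Female"}] * 20
toy_labels = ["Pass"] * 14 + ["Promoted"] * 6
toy_accuracy = cross_validate(toy_rows, toy_labels, train_majority)
print(f"10-fold cross-validated accuracy: {toy_accuracy:.3f}")
```

Any function that takes training rows and labels and returns a predictor can replace `train_majority`, so the same loop would serve to compare several learners on identical folds.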

Fig 1: Tree generated by LADTree decision tree algorithm by WEKA tool.

Fig 2: Tree generated by REPTree decision tree algorithm by WEKA tool.

Table III shows the classification accuracy of the J48, RandomForest, REPTree and LADTree algorithms applied to the educational data set with 10-fold cross-validation under test options in the Weka tool:

Table III: Classifiers accuracy with Weka tool

Algorithm     Correctly Classified Instances  Incorrectly Classified Instances
J48           62.6068%                        37.3932%
RandomForest  51.4957%                        48.5043%
REPTree       58.3333%                        41.6667%
LADTree       57.906%                         42.094%

Table III shows that J48 has the highest accuracy, 62.6068%, of the decision tree techniques compared. REPTree follows with 58.3333%, then LADTree with 57.906% and RandomForest with 51.4957%. Table IV gives the accuracy of the four decision tree algorithms in more detail, broken down by class.
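The overall accuracy and the per-class TP and FP rates are all derived from the actual and predicted class of each instance. The following Python sketch shows one way to compute them; the function names and the label lists are made up for illustration and are not the study's data.

```python
def accuracy(actual, predicted):
    """Share of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def tp_fp_rate(actual, predicted, target):
    """Return (TP rate, FP rate) for one class, treating it as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == target and p == target for a, p in pairs)
    fn = sum(a == target and p != target for a, p in pairs)
    fp = sum(a != target and p == target for a, p in pairs)
    tn = sum(a != target and p != target for a, p in pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


# Made-up labels: a classifier that answers "Pass" for every student.
actual = ["Pass"] * 5 + ["Promoted"] * 3
predicted = ["Pass"] * 8

print(accuracy(actual, predicted))                # 0.625
print(tp_fp_rate(actual, predicted, "Pass"))      # (1.0, 1.0)
print(tp_fp_rate(actual, predicted, "Promoted"))  # (0.0, 0.0)
```

A classifier that assigns every instance to one class gets a TP rate and an FP rate of 1.0 for that class and 0.0 for the other, and its accuracy equals the share of that class in the data. The J48 row of Table IV has this shape, so its TP rate of 1.000 for Pass should be read together with its FP rate.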

Table IV: Detailed J48, RandomForest, REPTree and LADTree algorithms accuracy by class

Algorithm     Class     TP Rate  FP Rate
J48           Pass      1.000    1.000
J48           Promoted  0.000    0.000
RandomForest  Pass      0.666    0.737
RandomForest  Promoted  0.263    0.334
REPTree       Pass      0.901    0.949
REPTree       Promoted  0.051    0.099
LADTree       Pass      0.894    0.949
LADTree       Promoted  0.051    0.106

Table V lists the time taken to build the J48, RandomForest, REPTree and LADTree models, in seconds.

Table V: Execution time to build the J48, RandomForest, REPTree and LADTree models

Algorithm     Execution Time (Sec)
J48           0.00
RandomForest  0.09
REPTree       0.00
LADTree       0.04

VIII. CONCLUSION

Classification is one of the most interesting and important topics in data mining. Most researchers in this field use classification algorithms for knowledge discovery from datasets. Many classification techniques exist in data mining, such as decision trees and Bayesian methods. Here we use decision tree algorithms to predict the results of first-year engineering students. From the above analysis, the TP rates of J48 and REPTree for the Pass class are 1.000 and 0.901 respectively. This means that these two algorithms identify almost all of the students who are likely to pass the final examination. The remaining students, who are not predicted to pass, may need counselling to improve their result. In future studies we can apply more algorithms to the dataset and so obtain better accuracy. We think this is one of the best ways to improve students’ performance in their final examination.

REFERENCES

1. B. K. Bharadwaj and S. Pal, “Data Mining: A prediction for performance improvement using classification”, International Journal of Computer Science and Information Security (IJCSIS), Vol. 9, No. 4, pp. 136-140, 2011.
2. S. K. Yadav and S. Pal et al., “Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification”, World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 2, pp. 51-56, 2012.
3. Galit et al., “Examining online learning processes based on log files analysis: a case study”, Research, Reflection and Innovations in Integrating ICT in Education, 2007.
4. Z. J. Kovacic, “Early prediction of student success: Mining student enrollment data”, Proceedings of Informing Science & IT Education Conference, 2010.
5. Dr. S. B. Jagtap and Dr. Kodge B. G., “Census Data Mining and Data Analysis using WEKA”, International Conference in Emerging Trends in Science, Technology and Management (ICETSTM-2013), Singapore, 2013.
6. Z. N. Khan, “Scholastic achievement of higher secondary students in science stream”, Journal of Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.
7. Bhise R. B., Thorat S. S., Supekar A. K., “Importance of Data Mining in Higher Education System”, IOSR Journal of Humanities and Social Science (IOSR-JHSS), ISSN: 2279-0837, ISBN: 2279-0845, Volume 6, Issue 6 (Jan.-Feb. 2013), pp. 18-21.
8. Komal S. Sahedani and Prof. B. Supriya Reddy, “A Review: Mining Educational Data to Forecast Failure of Engineering Students”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 12, December 2013, ISSN: 2277 128X.
9. U. K. Pandey and S. Pal, “Data Mining: A prediction of performer or underperformer using classification”, (IJCSIT) International Journal of Computer Science and Information Technology, Vol. 2(2), pp. 686-690, ISSN: 0975-9646, 2011.
10. S. T. Hijazi and R. S. M. M. Naqvi, “Factors affecting student’s performance: A Case of Private Colleges”, Bangladesh e-Journal of Sociology, Vol. 3, No. 1, 2006.
11. Q. A. Al-Radaideh, E. W. Al-Shawakfa, and M. I. Al-Najjar, “Mining student data using decision trees”, International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.
12. Connolly T., C. Begg et al. (1999), Database System: A practical approach to design, implementation and management (3rd edition), Harlow: Addison-Wesley, 687.
13. Erdogan and Timor (2005), “A data mining application in a student database”, Journal of Aeronautic and Space Technologies, July 2005, Volume 2, Number 2, pp. 53-57.
14. Han, J. and Kamber, M. (2006), “Data Mining: Concepts and Techniques”, 2nd edition, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor.
15. Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, San Francisco, 2006.
16. George M. Marakas, Modern Data Warehousing, Mining, and Visualization, Pearson Education, New Delhi, 2005.
17. Michael J. A. Berry and Gordon S. Linoff, Data Mining Techniques, 2nd ed., Wiley Publishing Inc., USA, 2004.
18. Margaret H. Dunham, Data Mining Introductory and Advanced Topics, Pearson Education, New Delhi, 2009.
19. Sinha, A. P., & Zhao, H. (2008). Incorporating domain knowledge into data mining classifiers: An application in indirect lending. Decision Support System, 46(1), 287-299.
20. Wang, H., & Wang, S. (2008). A knowledge management approach to data mining process for business intelligence. Industrial Management & Data Systems, 108(5).
21. Yuan, J. L., & Fine, T. (1998). Neural-network design for small training sets of high dimension. IEEE Transactions on Neural Networks, 9.
22. Andonie, R. (2010). Extreme Data Mining: Inference from Small Datasets. Int. J. of Computers, Communications & Control, 5(3).
23. Becerra-Fernandez, I., Gonzales, A., & Sabherwal, R. (2004). Knowledge Management, Challenges, Solutions, and Technologies. Pearson Prentice Hall.
24. Berry, M., & Linoff, G. (2000). Mastering Data Mining. The Art and Science of Customer Relationship Management. Wiley.
25. Jiawei Han and Micheline Kamber, “Data Mining Concepts and Techniques”, 2nd Edition, 2000.
26. J. R. Quinlan, “Induction of decision trees”, Machine Learning, pp. 81-106, 1986.
27. Yoav Freund and Llew Mason, “The Alternating Decision Tree Algorithm”, Proceedings of the 16th International Conference on Machine Learning, pp. 124-133, 1999.
28. Saurabh Pal, “Mining Educational Data to Reduce Dropout Rates of Engineering Students”, IJIEEB, April 2012, Vol. 2, pp. 1-7.
29. M. Ramaswami and R. Bhaskaran, “A CHAID Based Performance Prediction Model in Educational Data Mining”, IJCSI, Vol. 7, Issue 1, No. 1, January 2010, pp. 10-18.
