Comparative Analysis of Classification Algorithms for Student Performance Prediction

Authors

  • Tushar Singh, Maharshi Dayanand University

DOI:

https://doi.org/10.5281/ijurd.v2i2.13

Keywords:

Educational Data Mining, Classification Algorithms, K-Nearest Neighbors (KNN), Naive Bayes

Abstract

Student performance prediction is a significant application of data mining in education, aimed at identifying students at risk of poor academic outcomes. Early prediction lets institutions intervene in time and tailor support to individual students. This study compares four widely used classification algorithms (Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes) for predicting student academic performance from historical academic and demographic data. The dataset includes attributes such as previous grades, attendance, study time, and socio-economic factors. Standard preprocessing steps (data cleaning, normalization, and encoding) were applied before model training. Each classifier was evaluated using Accuracy, Precision, Recall, and F1-Score, together with its confusion matrix, and k-fold cross-validation was used so that the reported scores do not depend on a single train/test split. Experimental results indicate that Random Forest, the only ensemble method among the four, achieves higher predictive accuracy than the single-model classifiers, while the simpler models are computationally cheaper. The findings highlight the importance of algorithm selection in educational data mining and demonstrate the potential of machine learning models to support data-driven academic decision-making.
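
As an illustration of the evaluation protocol named in the abstract, the following is a minimal sketch, not the study's implementation, of how Accuracy, Precision, Recall, and F1-Score follow from the confusion matrix of a binary classifier, and how k-fold cross-validation partitions the data. The function names, the convention that label 1 means "at risk", and the toy labels are assumptions made for this example only; they are not taken from the paper or its dataset.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (assumed: "at risk") as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive the four scalar metrics from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns one (train_indices, test_indices) pair per fold. Shuffling and
    stratification are deliberately left out to keep the sketch short.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Toy labels, invented for illustration (1 = at risk, 0 = not at risk).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
folds = k_fold_indices(10, 3)
```

In a full comparison, each classifier would be trained on the training indices of every fold and scored on the held-out indices, and the per-fold metrics would then be averaged per algorithm. With real student data the records would normally be shuffled, and ideally stratified by class, before splitting, since at-risk students are usually the minority class.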

Published

2026-02-28

How to Cite

Singh, T. (2026). Comparative Analysis of Classification Algorithms for Student Performance Prediction. International Journal of Unified Research & Development (IJURD), 2(2). https://doi.org/10.5281/ijurd.v2i2.13