Comparative Analysis of KNN and Naive Bayes for Adolescent Mental Health Detection

Authors

  • Mahesa Darma Satria, Institut Informatika dan Bisnis Darmajaya
  • Yogi Saputra, Institut Informatika dan Bisnis Darmajaya

Keywords:

machine learning, mental health detection, k-nearest neighbor, naive bayes, imbalanced dataset

Abstract

Social media use among adolescents has risen rapidly and may harm mental health by increasing the risk of stress, anxiety, and depression, so early detection is essential to prevent more severe outcomes. This study compares the performance of K-Nearest Neighbor (KNN) and Naive Bayes in detecting adolescent mental health risks and evaluates the effect of balancing the data with the Synthetic Minority Over-sampling Technique (SMOTE). A quantitative experimental design was applied, comprising data preprocessing, model implementation, and evaluation by 10-fold cross-validation, with accuracy, precision, recall, F1-score, and AUC as performance metrics. The results show that Naive Bayes performs more stably, with higher accuracy and precision, whereas KNN combined with SMOTE significantly improves recall, particularly for minority classes. Model selection therefore involves a trade-off between precision and recall. The study contributes a comprehensive analysis of how data balancing affects classification performance in mental health contexts. Future work should explore ensemble and deep learning approaches and use larger, more diverse datasets to improve generalizability.
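For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines of standard-library Python: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is only an illustrative sketch, not the authors' implementation. The function names `smote_sketch` and `balance`, the default `k=5`, and the choice to oversample until the minority class matches the largest other class are all assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature lists from ONE class.
    Names and defaults are hypothetical, chosen for illustration only.
    """
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean distance),
        # excluding the base sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample minority_label until it matches the largest other class."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    target = max(y.count(label) for label in set(y) if label != minority_label)
    n_new = max(0, target - len(minority))
    new_points = smote_sketch(minority, n_new, k=k, seed=seed)
    # New lists are returned; the inputs are left unchanged.
    return X + new_points, y + [minority_label] * n_new
```

When oversampling is combined with cross-validation, a common precaution is to call something like `balance` on the training portion of each fold only, so that synthetic samples never appear in the held-out fold. The abstract does not say how the authors handled this. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would normally be preferred over a hand-written version.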

Published

2026-06-30

How to Cite

Darma Satria, M., & Saputra, Y. (2026). Comparative Analysis of KNN and Naive Bayes for Adolescent Mental Health Detection. Journal of Electrical Engineering and Informatics (JEEI), 1(2). Retrieved from https://jurnal.sttnlampung.ac.id/index.php/jeei/article/view/189
