Diabetes Mellitus Disease Prediction Using Logistic Regression (LR) and Support Vector Machine (SVM) Methods

Authors

  • Akbar Febrian Dwi Hastono, Informatic Engineering Study Program, Dr. Soetomo University
  • Anik Vega Vitianingsih, Informatic Engineering Study Program, Dr. Soetomo University
  • Pamudi Pamudi, Informatic Engineering Study Program, Dr. Soetomo University
  • Anastasia Lidya Maukar, Department of Industrial Engineering, President University
  • Seftin Fitri Ana Wati, Information System Department, Universitas Pembangunan Nasional Veteran Jawa Timur

DOI:

https://doi.org/10.51454/decode.v5i1.1039

Keywords:

Confusion Matrix, Diabetes Mellitus, Logistic Regression, Prediction, Support Vector Machine

Abstract

Diabetes Mellitus (DM), also known as diabetes or sugar disease, is marked by high blood sugar levels and poses a major health issue in Indonesia, where the number of cases increases every year. Often referred to as the silent killer, DM frequently goes unnoticed because of its subtle symptoms, which raises the risk of severe complications if it is not treated promptly. Early detection is hampered by a lack of information or awareness about the early symptoms of DM, limited time and money for health checks, and limited access to health services. To address this problem, a prediction model is needed to support early detection and help prevent serious complications. This study aims to build a predictive model using the Logistic Regression (LR) and Support Vector Machine (SVM) methods based on parameters such as pregnancy, glucose level, blood pressure, skin thickness, insulin, BMI, diabetes pedigree, age, and outcome. The dataset is DM risk data obtained from Kaggle and originating from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). Hyperparameter tuning with Grid Search is applied to find the best combination of hyperparameters. The results show that LR is more accurate than SVM: the LR model reaches an accuracy of 79.31%, while the SVM model reaches 77.24%, a difference of 2.07 percentage points.
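The workflow summarized in the abstract (train LR and SVM, tune each with Grid Search, compare test accuracy and confusion matrices) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data is synthetic (generated with `make_classification`, with eight predictors to mirror the number of input features named in the abstract), and the train/test split, scaling step, and hyperparameter grids are all assumptions, since the abstract does not state them. The printed accuracies therefore have no relation to the reported 79.31% and 77.24%.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in data with 8 numeric predictors and a binary
# outcome. Swap in the real diabetes table to obtain meaningful results.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_redundant=1, random_state=0
)

# ASSUMPTION: an 80/20 stratified split; the abstract does not give the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# ASSUMPTION: hypothetical search grids chosen only for illustration.
candidates = {
    "LR": (LogisticRegression(max_iter=1000), {"clf__C": [0.01, 0.1, 1, 10]}),
    "SVM": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scale features inside the pipeline so each CV fold is scaled independently.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    # Grid Search: try every parameter combination with 5-fold cross-validation.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # Evaluate the best configuration on the held-out test set.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(name, search.best_params_, f"accuracy={results[name]['accuracy']:.4f}")
    print(results[name]["confusion_matrix"])
```

Accuracy here is the sum of the confusion matrix diagonal divided by the number of test samples, so the two models can be compared directly on the same held-out set.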

Published

2025-03-14

How to Cite

Akbar Febrian Dwi Hastono, Anik Vega Vitianingsih, Pamudi Pamudi, Anastasia Lidya Maukar, & Seftin Fitri Ana Wati. (2025). Diabetes Mellitus Disease Prediction Using Logistic Regression (LR) and Support Vector Machine (SVM) Methods. Decode: Jurnal Pendidikan Teknologi Informasi, 5(1), 54–64. https://doi.org/10.51454/decode.v5i1.1039

Section

Articles