IMPLEMENTASI KOMBINASI METODE RESAMPLING PADA KLASIFIKASI PENYAKIT STROKE DENGAN ALGORITMA K-NEAREST NEIGHBOR DAN SELEKSI FITUR INFORMATION GAIN

Muhammad Fathurrahman (2023) IMPLEMENTASI KOMBINASI METODE RESAMPLING PADA KLASIFIKASI PENYAKIT STROKE DENGAN ALGORITMA K-NEAREST NEIGHBOR DAN SELEKSI FITUR INFORMATION GAIN. Skripsi thesis, Universitas Pembangunan Nasional Veteran Jakarta.

Abstract

Stroke is one of the main problems in medicine and the second leading cause of death worldwide. According to the 2018 Basic Health Research survey (Riskesdas), 713,783 people in Indonesia suffer from stroke every year. Diagnosing a stroke nevertheless takes a long time, even though brain cells die every minute that blood flow to the brain is blocked. Data mining can be used to predict disease. When building data mining models, class imbalance is a problem because it degrades classification results: the machine learning model pays more attention to the majority class and ignores the minority class. In this study, stroke prediction was carried out with the K-Nearest Neighbor (K-NN) algorithm combined with the resampling techniques SMOTE, Tomek Links, and ENN. The study also examined the effect of information gain feature selection on the model. With 10-fold cross-validation, the K-NN model with SMOTE and Tomek Links predicted stroke with an accuracy of 83.5%, an f1-score of 12.5%, and a recall of 24.7%, while K-NN with SMOTE and ENN obtained 78% accuracy, 16.8% f1-score, and 45% recall. With information gain feature selection, f1-score and recall increased for both methods, at some cost in accuracy: SMOTE and Tomek Links produced 79.9% accuracy, 18.3% f1-score, and 46.6% recall, and SMOTE and ENN obtained 76% accuracy, 20% f1-score, and 59% recall. The experiments show that resampling can improve model performance on imbalanced data, raising recall and f1-score by 54% and 7%, respectively.
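
To make the resampling idea in the abstract concrete, the following is a minimal pure-Python sketch of SMOTE followed by Tomek Links removal on a toy two-dimensional dataset. It is not the thesis's implementation, which this record does not show. The function names (`smote`, `remove_tomek_links`, `nearest`), the toy points, and the choice to drop both members of each Tomek link are assumptions made for illustration only.

```python
import math
import random


def nearest(idx, points, candidates):
    """Return the index in `candidates` closest to points[idx], excluding idx."""
    return min((j for j in candidates if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def remove_tomek_links(points, labels):
    """Drop every pair of opposite-class points that are each other's nearest
    neighbour. Assumption: both members of the pair are removed."""
    all_idx = range(len(points))
    nn = [nearest(i, points, all_idx) for i in all_idx]
    drop = {i for i in all_idx
            if nn[nn[i]] == i and labels[i] != labels[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): label 0 = majority class, label 1 = minority class.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (0.4, 0.2), (0.3, 0.4), (1.0, 1.0)]
minority = [(1.1, 1.0), (1.5, 1.4), (1.4, 1.6)]

# Step 1: oversample the minority class until both classes have equal size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
points = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# Step 2: clean the class boundary by removing Tomek links.
clean_points, clean_labels = remove_tomek_links(points, labels)
print(len(points), "points after SMOTE,", len(clean_points), "after Tomek Links")
```

In practice a library such as imbalanced-learn would normally be used instead of hand-written code, with resampling applied only to the training folds of the cross-validation so that synthetic points never leak into the test fold.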

Item Type: Thesis (Skripsi)
Additional Information: [Call No.: 1910511058] [Supervisor: Nur Hafifah Matondang] [Examiner 1: Bayu Hananto] [Examiner 2: Theresia Wati]
Uncontrolled Keywords: Synthetic Minority Over-sampling, K-Nearest-Neighbor, Stroke, Tomek Links, Edited Nearest Neighbor, Information gain
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > T Technology (General)
Divisions: Fakultas Ilmu Komputer > Program Studi Informatika (S1)
Depositing User: Muhammad Fathurrahman
Date Deposited: 26 Jul 2023 03:17
Last Modified: 26 Jul 2023 03:17
URI: http://repository.upnvj.ac.id/id/eprint/25249
