An Improved Adaptive Synthetic Sampling Technique and Machine Learning Model for Enhanced Imbalance Medical Data Classification

Abdullahi, Hafiz; Bashir, Sulaimon Adebayo; Aminu, Enesi Femi

Please use this identifier to cite or link to this item: http://repository.futminna.edu.ng:8080/jspui/handle/123456789/27575

Title:	An Improved Adaptive Synthetic Sampling Technique and Machine Learning Model for Enhanced Imbalance Medical Data Classification
Authors:	Abdullahi, Hafiz; Bashir, Sulaimon Adebayo; Aminu, Enesi Femi
Keywords:	Imbalance, Datasets, Adaptive, Synthetic, Data mining, Machine learning, Oversampling, Undersampling
Issue Date:	Nov-2023
Publisher:	Federal University of Technology, Akure
Citation:	Abdullahi, H., Bashir, S.A., & Aminu, E.F. (2023). An Improved Adaptive Synthetic Sampling Technique and Machine Learning Model for Enhanced Imbalance Medical Data Classification. Proceedings of the 2023 School of Engineering and Engineering Technology (SEET) Annual Conference, FUTA, Nigeria.
Abstract:	Medical data classification plays a pivotal role in healthcare decision-making, and accurate classification in this domain depends on addressing the challenges posed by imbalanced datasets. This paper presents an enhancement of the Adaptive Synthetic Sampling (ADASYN) algorithm tailored to medical data classification. The proposed Improved ADASYN algorithm integrates ADASYN with k-means clustering to do two things: generate synthetic minority samples and eliminate potential outliers introduced by ADASYN. By doing so, it aims to mitigate the loss of accuracy in the majority class and ultimately improve classification performance. The pre-processed medical data first undergoes an estimation step that determines the required number of synthetic samples, which are then generated using ADASYN and merged with the original minority data. Next, k-means clustering is used to identify and filter out misclustered data, which removes outliers. If the data remain imbalanced, the algorithm iterates, recalculating the number of additional minority samples needed, and this process continues until a balanced dataset is achieved. The resulting balanced dataset is then ready for use by machine learning algorithms for classification. The proposed algorithm was implemented in MATLAB R2023a, supporting reproducibility and applicability in practical medical data classification scenarios. This research is a step towards improving the robustness and accuracy of medical data classification, thereby contributing to better healthcare decision support systems.
URI:	http://repository.futminna.edu.ng:8080/jspui/handle/123456789/27575
ISBN:	978-978-785-579-9
Appears in Collections:	Computer Science
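
The iterative procedure outlined in the abstract (estimate the shortfall, oversample with ADASYN, filter with k-means, repeat until balanced) can be illustrated with a minimal sketch. This is not the authors' MATLAB R2023a implementation: it is written in Python (3.8 or later, standard library only), and every function name and parameter here (`improved_adasyn`, `adasyn`, `kmeans_labels`, `k`, `max_rounds`, `seed`) is hypothetical. Two points are assumptions rather than details from the abstract: a synthetic sample is treated as "misclustered" when k-means, started from the two class means, assigns it to the cluster seeded by the majority mean; and base points are drawn at random in proportion to their ADASYN weight instead of receiving ADASYN's per-point sample counts.

```python
import math
import random

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))

def adasyn(minority, majority, n_new, k, rng):
    """Create n_new synthetic minority points by interpolation.

    Base points are drawn with probability proportional to the share of
    majority points among their k nearest neighbours (a simplification of
    ADASYN's per-point sample counts). Needs at least two minority points.
    """
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for p in minority:
        near = sorted((e for e in labelled if e[0] is not p),
                      key=lambda e: math.dist(p, e[0]))[:k]
        ratios.append(sum(1 for _, lab in near if lab == 0) / k)
    # If no minority point has majority neighbours, fall back to uniform weights.
    weights = ratios if sum(ratios) > 0 else [1.0] * len(minority)
    synthetic = []
    for base in rng.choices(minority, weights=weights, k=n_new):
        mates = sorted((q for q in minority if q is not base),
                       key=lambda q: math.dist(base, q))[:k]
        mate, lam = rng.choice(mates), rng.random()
        # New point lies on the segment between base and a minority neighbour.
        synthetic.append(tuple(b + lam * (m - b) for b, m in zip(base, mate)))
    return synthetic

def kmeans_labels(points, centers, iters=20):
    """Lloyd's k-means from the given starting centers.

    Returns the cluster index of each point from the final assignment step.
    """
    centers = list(centers)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels

def improved_adasyn(minority, majority, k=5, max_rounds=10, seed=0):
    """Oversample, filter with k-means, and repeat until the classes balance."""
    rng = random.Random(seed)
    minority, majority = list(minority), list(majority)
    for _ in range(max_rounds):  # guard against filtering that never converges
        deficit = len(majority) - len(minority)  # samples still required
        if deficit <= 0:
            break
        synthetic = adasyn(minority, majority, deficit, k, rng)
        points = majority + minority + synthetic
        # Assumption: cluster 0 starts at the majority mean, cluster 1 at the
        # minority mean; synthetic points that end up in cluster 0 are dropped.
        labels = kmeans_labels(points, [mean(majority), mean(minority)])
        syn_labels = labels[len(majority) + len(minority):]
        minority += [s for s, lab in zip(synthetic, syn_labels) if lab == 1]
    return minority

# Usage on made-up, well-separated 2-D data (40 majority vs 8 minority points).
data_rng = random.Random(1)
majority = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(40)]
minority = [(data_rng.uniform(9, 11), data_rng.uniform(9, 11)) for _ in range(8)]
balanced = improved_adasyn(minority, majority)
```

In this toy case the classes are far apart, so no synthetic point is filtered and one round is enough. On overlapping classes some synthetic points would be discarded and the loop would recompute the shortfall, which is why the sketch caps the number of rounds with the hypothetical `max_rounds` parameter.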

Files in This Item:

File	Description	Size	Format
Afiz_FUTA_Conf.pdf		2.28 MB	Adobe PDF
