A Comparison of Strategies for Missing Values in Data on Machine Learning Classification Algorithms

dc.contributor.author: Makaba, T.
dc.contributor.author: Dogo, E.
dc.date.accessioned: 2025-04-25T19:10:05Z
dc.date.issued: 2019
dc.description.abstract: Dealing with missing values in data is an important feature engineering task in data science to prevent negative impacts on machine learning classification models in terms of accurate prediction. However, it is often unclear what the underlying cause of the missing values in real-life data is, that is, which missing data mechanism is causing the missingness. Thus, it becomes necessary to evaluate several missing data approaches for a given dataset. In this paper, we perform a comparative study of several approaches for handling missing values in data, namely listwise deletion, mean, mode, k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations. The comparison is performed on two real-world datasets, using the following evaluation metrics: accuracy, root mean squared error, receiver operating characteristics, and the F1 score. Most classifiers performed well across the missing data strategies. However, based on the results obtained, the support vector classifier overall performed marginally better for the numerical data, and the naïve Bayes classifier for the categorical data, when compared to the other evaluated missing value methods.
dc.identifier.citation: T. Makaba and E. Dogo, "A Comparison of Strategies for Missing Values in Data on Machine Learning Classification Algorithms," 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Vanderbijlpark, South Africa, 2019, pp. 1-7, doi: 10.1109/IMITEC45504.2019.9015889.
dc.identifier.other: doi: 10.1109/IMITEC45504.2019.9015889
dc.identifier.uri: http://repository.futminna.edu.ng:4000/handle/123456789/1083
dc.language.iso: en
dc.publisher: IEEE
dc.subject: Measurement
dc.subject: Mice
dc.subject: Classification algorithms
dc.subject: Support vector machines
dc.subject: Data models
dc.subject: Radio frequency
dc.subject: Machine learning
dc.subject: missing data
dc.subject: imputation methods
dc.subject: performance metrics
dc.subject: machine learning
dc.subject: classification
dc.title: A Comparison of Strategies for Missing Values in Data on Machine Learning Classification Algorithms
dc.type: Article
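
Illustrative note: the abstract names listwise deletion, mean imputation, and mode imputation among the strategies compared. The following is a minimal sketch of those three only, not the authors' implementation. The toy rows, column meanings, and the helper names `listwise_deletion` and `impute_column` are hypothetical and chosen purely for illustration; the paper's datasets and code are not part of this record.

```python
from statistics import mean, mode

# Hypothetical toy data: each row is (age, colour); None marks a missing value.
rows = [
    (25.0, "red"),
    (None, "blue"),
    (35.0, None),
    (30.0, "red"),
]

def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [row for row in data if None not in row]

def impute_column(data, col, fill):
    """Replace missing entries in column `col` with fill(observed values)."""
    observed = [row[col] for row in data if row[col] is not None]
    value = fill(observed)
    return [
        tuple(value if (i == col and v is None) else v for i, v in enumerate(row))
        for row in data
    ]

# Listwise deletion keeps only the fully observed rows.
deleted = listwise_deletion(rows)

# Mean imputation for the numerical column, then mode imputation
# for the categorical column.
mean_imputed = impute_column(rows, 0, mean)
mode_imputed = impute_column(mean_imputed, 1, mode)
```

Listwise deletion shrinks the dataset, while mean and mode imputation keep every row at the cost of reducing variance in the filled column. The other strategies listed in the abstract (k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations) use the remaining columns to estimate each missing entry and are not shown here.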

Files

Original bundle

Name: A Comparative Analysis of Gradient Descent-Based Optimization Algorithms_CNN.pdf
Size: 58.48 KB
Format: Adobe Portable Document Format