Please use this identifier to cite or link to this item: http://repository.futminna.edu.ng:8080/jspui/handle/123456789/6776
Title: A Review of Informative Data Level Resampling Approaches for Solving Class Imbalanced Problem
Authors: Dickson, Dako Apaleokhai
Alhassan, John Kolo
Adepoju, Solomon Adelowo
Keywords: machine learning
imbalance data
preprocessing
data difficulty factors
Issue Date: May-2021
Publisher: Cyber Nigeria/IEEE
Abstract: In machine learning, imbalanced learning is one of the most challenging classification problems and is also very common in application datasets. It has received increasing attention over the years because of the need to handle real-world datasets, which are usually skewed and possess various data difficulty factors. The goal of this work is to review resampling techniques and determine whether data intrinsic characteristics were considered in their design. The techniques are categorised into distance-based, cluster-based and evolutionary-based methods; the advantages and disadvantages of each category are presented, followed by the general achievements and drawbacks of resampling approaches. The search conducted for this work yielded 227 papers published within the last two decades, with emphasis on the last decade. These articles from imbalanced data domains went through several filtering steps before being reduced to 52. The review shows that distance-based methods have received more attention than cluster-based and evolutionary-based methods, which may be due to their merits, as presented in this work. Several previous works have found data intrinsic characteristics to be more problematic for learning classifiers than class imbalance itself. Nevertheless, the findings of this work establish that most existing resampling techniques do not take data intrinsic characteristics into account in their design, possibly because of the greater popularity of, and attention given to, the imbalance problem in publications.
There are also limiting factors that need to be resolved across all resampling methods, such as the lack of consideration of possibly relevant examples in the undersampling process and the lack of outstanding methods for evaluating the interrelationships and similarities among examples. For future work, a robust resampling technique is needed that critically considers data difficulty factors when evaluating the regions and the examples to oversample and undersample. Resampling techniques should also be evaluated against the different types of difficulty factors to ascertain the difficulty type on which each performs best.
URI: http://repository.futminna.edu.ng:8080/jspui/handle/123456789/6776
Appears in Collections: Computer Science

Files in This Item:
File: Imbalance dataset abs.pdf
Size: 1.26 MB
Format: Adobe PDF

