Performing Data Augmentation Experiment to Enhance Model Accuracy: A Case Study of BBC News’ Data

Ugwuoke, Uchenna Cosmas; Aminu, Enesi Femi; Ekundayo, Ayobami

Please use this identifier to cite or link to this item: http://repository.futminna.edu.ng:8080/jspui/handle/123456789/18898

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ugwuoke, Uchenna Cosmas	-
dc.contributor.author	Aminu, Enesi Femi	-
dc.contributor.author	Ekundayo, Ayobami	-
dc.date.accessioned	2023-05-12T20:46:41Z	-
dc.date.available	2023-05-12T20:46:41Z	-
dc.date.issued	2022-10	-
dc.identifier.issn	ELSEVIER-SSRN - ISSN-1556-5068	-
dc.identifier.uri	http://repository.futminna.edu.ng:8080/jspui/handle/123456789/18898	-
dc.description	Proceedings of International Conference on Information systems and Emerging Technologies, 2022.	en_US
dc.description.abstract	In natural language processing, text classification is an essential task, and machine learning algorithms have become indispensable and significant to research in the area. However, text classification with traditional models becomes more challenging because of the ambiguities associated with natural languages. A typical example is synonym concept mismatch, along with related issues that make it hard to attribute text accurately to its context. While a more robust model with an increased number of hidden layers, such as LSTM, is essential because of the volume of data involved, exploring strategies for data augmentation is also highly significant. To this end, this research aims to employ a semantic lexical database called WordNet as a strategy to augment the BBC news textual data obtained from the Kaggle repository. This paves the way for more efficient news data classification based on the proposed LSTM model. The BBC news dataset contains 2,225 data points in total, and each data point belongs to one of five news categories: technology, business, sport, entertainment, and politics. Experimental evaluations are carried out using the benchmark BBC news dataset and the newly augmented dataset within the scope of this study. The accuracy of the LSTM classification model is 90% on the original news dataset and 95% on the augmented dataset. Therefore, the proposed data augmentation strategy is promising for textual datasets.	en_US
dc.language.iso	en	en_US
dc.publisher	ELSEVIER-SSRN	en_US
dc.relation.ispartofseries	ISSN-1556-5068;	-
dc.subject	Data augmentation	en_US
dc.subject	WordNet	en_US
dc.subject	BBC news data	en_US
dc.subject	LSTM model	en_US
dc.title	Performing Data Augmentation Experiment to Enhance Model Accuracy: A Case Study of BBC News’ Data	en_US
dc.type	Article	en_US
Appears in Collections:	Computer Science
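
The abstract describes augmenting news text by drawing on WordNet synonyms. A minimal sketch of that general idea (synonym replacement) follows. It is not the authors' implementation: the small TOY_SYNONYMS dictionary is a hypothetical stand-in for WordNet lookups, and the function names, the replace_prob parameter, and the one-variant-per-sample policy are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in for WordNet synsets; a real run would query
# WordNet for each word instead of using this hand-written table.
TOY_SYNONYMS = {
    "profit": ["gain", "earnings"],
    "film": ["movie", "picture"],
    "win": ["victory", "triumph"],
}


def augment(text, synonyms, rng, replace_prob=0.5):
    """Return a copy of text with some words swapped for a listed synonym."""
    out = []
    for word in text.split():
        options = synonyms.get(word.lower())
        # Replace only words that have synonyms, and only some of the time.
        if options and rng.random() < replace_prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def augment_dataset(samples, synonyms, seed=0):
    """Append at most one augmented variant per (text, label) pair.

    The label is carried over unchanged, on the assumption that swapping
    a word for a synonym does not change the news category.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        variant = augment(text, synonyms, rng)
        if variant != text:  # skip samples where nothing was replaced
            augmented.append((variant, label))
    return augmented
```

Calling augment_dataset on a list of (text, category) pairs returns the original pairs followed by any new variants, which could then be fed to a classifier in place of the original data.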

Files in This Item:

File	Description	Size	Format
BBC.pdf	Performing Data Augmentation Experiment to Enhance Model Accuracy	561.68 kB	Adobe PDF
