Jimma University Open access Institutional Repository

Afaan Oromo News Text Classification Using A Deep Learning Approach

dc.contributor.author Lalisa Tadesa
dc.contributor.author Getachew Mamo
dc.contributor.author Amanuel Asseffa
dc.date.accessioned 2022-07-13T11:29:48Z
dc.date.available 2022-07-13T11:29:48Z
dc.date.issued 2022-06
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/7434
dc.description.abstract The growth of the internet has made Afaan Oromo text easily available and widespread online. With the ever-increasing volume of information resources, there is growing interest in better ways of finding, filtering, and organizing these resources, and automatic text classification is an essential solution to this problem. Text classification (TC), also called text categorization, is the task of assigning a text or document to one of a set of predefined labels. This study proposes using deep learning algorithms with word embeddings for Afaan Oromo news text classification, because extracting informative feature values from news text is difficult. Earlier approaches represented text data as a bag of words, which ignores word order, an important source of information for classifying news text. Although these earlier models have low time complexity, they do not fully capture the context and latent semantic relationships between words, and their accuracy decreased as the number of features and classes increased. The objective of this thesis is to apply deep learning approaches to Afaan Oromo news text categorization using CNN, LSTM, and BiLSTM (LSTM and BiLSTM being variants of RNN) with word embeddings, and to recommend the best-performing model for the problem at hand. To develop these models, 6,110 newly collected and annotated news texts were used to build the classifiers for the Afaan Oromo language, and around 1,731,856 unannotated words were scraped from the Afaan Oromo news domain to train a pre-trained word embedding model. Several natural language processing tasks were performed as part of text preprocessing, including normalization, tokenization, text cleaning, and stop-word removal.
For word representation, word2vec, which learns word embeddings by predicting words from their context, was selected because it gave higher accuracy than fastText and the embedding-layer alternative. Finally, the results of the models were compared: CNN achieved the best performance, with 98.4% accuracy and 98.4% precision, while LSTM achieved 95% accuracy and 94% precision, and BiLSTM achieved 97.28% accuracy and 97.36% precision. en_US
dc.language.iso en_US en_US
dc.subject text classification, CNN, Deep learning, Afaan Oromo, word embedding en_US
dc.title Afaan Oromo News Text Classification Using A Deep Learning Approach en_US
dc.type Thesis en_US
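The abstract lists normalization, tokenization, text cleaning, and stop-word removal as preprocessing steps without giving their details. The following is a minimal Python sketch of such a pipeline, assuming that normalization means Unicode normalization, unifying apostrophe variants (the apostrophe marks a glottal stop in Afaan Oromo spelling, so it is kept), and lowercasing. The function names and the five-word stop-word set are hypothetical illustrations, not the thesis's implementation or stop-word list.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word sample for illustration only;
# it is NOT the stop-word list used in the thesis.
STOP_WORDS = {"fi", "kan", "akka", "keessatti", "irratti"}

# Apostrophe look-alikes that are mapped to a plain ASCII apostrophe.
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc`"


def normalize(text: str) -> str:
    """Unicode-normalize, unify apostrophe variants, and lowercase."""
    text = unicodedata.normalize("NFC", text)
    for ch in APOSTROPHE_VARIANTS:
        text = text.replace(ch, "'")
    return text.lower()


def clean(text: str) -> str:
    """Replace everything except Latin letters, apostrophes, and whitespace with a space."""
    return re.sub(r"[^a-z'\s]", " ", text)


def tokenize(text: str) -> list[str]:
    """Split on whitespace and strip apostrophes left dangling at token edges."""
    tokens = (tok.strip("'") for tok in text.split())
    return [tok for tok in tokens if tok]


def preprocess(text: str) -> list[str]:
    """Run normalization, cleaning, tokenization, then stop-word removal."""
    tokens = tokenize(clean(normalize(text)))
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Barattoonni 120 fi barsiisonni Jimmaa keessatti walga'an!"))
    # prints ['barattoonni', 'barsiisonni', 'jimmaa', "walga'an"]
```

Digits and punctuation are dropped during cleaning, while the in-word apostrophe in "walga'an" survives because it is part of the spelling.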

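The abstract's point that a bag-of-words representation discards word order can be shown with a small sketch. The toy English tokens and helper names below are hypothetical; the order-preserving integer sequence mirrors the kind of input commonly fed to an embedding layer in front of a CNN or LSTM, not necessarily the exact input format used in the thesis.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for tokens not seen when the vocabulary was built


def bag_of_words(tokens):
    """Order-free representation: token -> count."""
    return Counter(tokens)


def build_vocab(corpus):
    """Assign integer ids to tokens in first-seen order, starting after the reserved ids."""
    vocab = {}
    for tokens in corpus:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def to_sequence(tokens, vocab, max_len):
    """Order-preserving representation: ids truncated or right-padded to max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    a = ["dog", "bites", "man"]
    b = ["man", "bites", "dog"]
    vocab = build_vocab([a, b])
    print(bag_of_words(a) == bag_of_words(b))  # True: same counts
    print(to_sequence(a, vocab, 5))  # [2, 3, 4, 0, 0]
    print(to_sequence(b, vocab, 5))  # [4, 3, 2, 0, 0]
```

Both token lists produce identical bag-of-words counts, while their id sequences differ; that difference is the word-order information the abstract says earlier approaches did not use.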

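The results are reported as accuracy and precision. The abstract does not state how precision was averaged across news categories, so the sketch below assumes macro-averaging (the unweighted mean of per-class precision) and scores a class that is never predicted as 0.0. The category labels in the example are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean over classes of TP / (TP + FP)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        # True labels of every item the model assigned to this class.
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        if predicted:
            per_class.append(sum(t == label for t in predicted) / len(predicted))
        else:
            per_class.append(0.0)  # class never predicted
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    y_true = ["sport", "sport", "politics", "business", "business"]
    y_pred = ["sport", "politics", "politics", "business", "sport"]
    print(accuracy(y_true, y_pred))  # 0.6
    print(round(macro_precision(y_true, y_pred), 4))  # 0.6667
```

Macro-averaging weights every category equally; a weighted average would instead weight each class by its number of true examples, which can give a different figure on an imbalanced news dataset.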