Jimma University Open access Institutional Repository

Detection And Classification Of Offensive Nuances Of Afaan Oromo Text On Social Media Using Supervised Machine Learning Approach

dc.contributor.author Dejene Wakuma
dc.contributor.author Getachew Mamo
dc.contributor.author Hailu Beshada
dc.date.accessioned 2022-07-13T12:39:32Z
dc.date.available 2022-07-13T12:39:32Z
dc.date.issued 2022-07
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/7439
dc.description.abstract Users of social media can share and consume information freely. This freedom also lets them disseminate toxic content, which we refer to as offensive language. In a country like Ethiopia, where many nations and nationalities live together, sharing offensive language on social media can negatively affect the welfare of ethnic groups, political parties and the religious views of society. We therefore aimed to develop an offensive language detection and categorization model for Afaan Oromo text available on social media, such as Facebook and Twitter pages, using supervised machine learning techniques. To evaluate the performance of our models, we manually collected 1051 posts, comments and tweets from the Facebook and Twitter pages of different users. Lawyers and linguistic experts were involved in data annotation. To obtain an appropriate version of the dataset, preprocessing tasks such as tokenization, normalization, stop word removal and special character removal were applied to the data collected from the different sources. For classification, five machine learning techniques were used: Support Vector Machine (SVM), Multinomial naïve Bayes (MNB), Decision Tree (DT), K-Nearest Neighbors (KNN) and Logistic Regression (LR). We developed two automatic classification systems: an offensive language detection system and an offensive language categorization system. In the detection of offensive language, the best-performing technique was MNB, which achieved 86% precision, 83% accuracy and an 85% micro-averaged F1-score. Similarly, in the categorization of offensive language, the best-performing technique was SVM, which achieved 82% precision, 56% accuracy and a 61% micro-averaged F1-score. en_US
dc.language.iso en_US en_US
dc.subject Afaan Oromo, Offensive Language, Machine Learning, Sentiment Analysis en_US
dc.title Detection And Classification Of Offensive Nuances Of Afaan Oromo Text On Social Media Using Supervised Machine Learning Approach en_US
dc.type Thesis en_US
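
The abstract lists four preprocessing tasks (tokenization, normalization, stop word removal and special character removal) without giving their details. The following is a minimal sketch of what such a pipeline might look like, not the authors' implementation. The function names, the order of the steps, the URL stripping, the apostrophe handling and the three-word stop list are all assumptions made for illustration; the thesis's actual stop word list and normalization rules are not stated in this record.

```python
import re
import unicodedata

# Hypothetical placeholder stop list; the stop words used in the thesis
# are not given in this record.
STOP_WORDS = {"fi", "kan", "akka"}


def normalize(text):
    """Lowercase the text and map apostrophe variants to a plain "'".

    Assumption: the apostrophe is kept because it occurs inside
    Afaan Oromo words, so stripping it would split or alter tokens.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    return text.replace("\u2019", "'").replace("`", "'")


def remove_special_characters(text):
    """Drop URLs, then replace everything except a-z, "'" and whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z'\s]", " ", text)


def tokenize(text):
    """Split on whitespace (assumed tokenization rule)."""
    return text.split()


def preprocess(text):
    """Run the assumed pipeline and return the remaining tokens."""
    tokens = tokenize(remove_special_characters(normalize(text)))
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    sample = "Akka KAN fi!!! Oromoo 2022 https://example.com #barreeffama"
    print(preprocess(sample))
```

The token lists produced this way could then be turned into feature vectors for any of the five classifiers named in the abstract; the feature representation is likewise not specified in this record.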

