Detection And Classification Of Offensive Nuances Of Afaan Oromo Text On Social Media Using Supervised Machine Learning Approach

Dejene Wakuma; Getachew Mamo; Hailu Beshada

URI: https://repository.ju.edu.et//handle/123456789/7439

Date: 2022-07

Abstract:

Users of social media can share and consume information freely. This freedom also lets them disseminate toxic content, which we refer to as offensive language. In a country like Ethiopia, where many nations and nationalities live together, sharing offensive language on social media can negatively affect the welfare of ethnic groups, political parties and the religious views of society. We therefore aimed to develop an offensive language detection and categorization model for Afaan Oromo text on social media platforms such as Facebook and Twitter, using supervised machine learning techniques. To evaluate the performance of our models, we manually collected 1051 posts, comments and tweets from the Facebook and Twitter pages of different users. Legal and linguistic experts were involved in annotating the data. To obtain a suitable version of the dataset, preprocessing tasks such as tokenization, normalization, stop word removal and special character removal were applied to the data collected from the different sources. Five machine learning techniques were used for classification: Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Decision Tree (DT), K-Nearest Neighbors (KNN) and Logistic Regression (LR). We developed two automatic classification systems: an offensive language detection system and an offensive language categorization system. In offensive language detection, the best performing technique was MNB, which achieved 86% precision, 83% accuracy and a micro-averaged F1-score of 85%. In offensive language categorization, the best performing technique was SVM, which achieved 82% precision, 56% accuracy and a micro-averaged F1-score of 61%.
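
The preprocessing steps named in the abstract (normalization, special character removal, tokenization and stop word removal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the order of the steps, the decision to drop digits and URLs, and the very small stop word list are all assumptions made for illustration. Apostrophes inside words are kept because Afaan Oromo orthography uses them (for example in "ba'e").

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# it is NOT the list used in the study.
STOP_WORDS = {"fi", "kan", "akka", "irraa", "keessa", "garuu"}

# Assumption: URLs are treated as noise and removed before other cleaning.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: everything except lowercase Latin letters, apostrophes and
# spaces (digits, punctuation, emoji, ...) counts as a special character.
NON_WORD_RE = re.compile(r"[^a-z' ]+")


def normalize(text):
    """Lowercase, unify apostrophes, and strip URLs and special characters."""
    text = text.lower()
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = URL_RE.sub(" ", text)
    return NON_WORD_RE.sub(" ", text)


def tokenize(text):
    """Split on whitespace and trim stray apostrophes at token edges."""
    tokens = (tok.strip("'") for tok in text.split())
    return [tok for tok in tokens if tok]


def preprocess(text):
    """Normalize, tokenize, then drop stop words."""
    return [tok for tok in tokenize(normalize(text)) if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Akka  kana!!! https://example.com Fi BA'E 123"))
```

The resulting token lists would then be turned into feature vectors (for example word counts) before being passed to classifiers such as those listed above; that step is outside the scope of this sketch.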
