Offensive And Hate Speech Detection For Amharic Language On Social Media Using Deep Learning Algorithm

Miftah Adem; Kinde Anlay; Fetulhak Abdurahman

URI: https://repository.ju.edu.et//handle/123456789/6692

Date: 2021-06

Abstract:

The number of social media users is increasing rapidly, both worldwide and in Ethiopia, and social media has become an essential communication tool whose use has grown tremendously in recent years. This growth also opens the door to trolls who poison these platforms with offensive and hate speech directed at others. As a solution to this problem, this research proposes offensive and hate speech detection for Amharic text using a deep learning model. Offensive and hate speech data were collected from public Facebook and YouTube pages and manually labeled as hate speech (including its target), offensive language, or not hate speech. The final dataset consists of 10,125 posts and comments. Recently, deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been applied to offensive and hate speech detection with impressive results. CNNs are good at extracting local information but are weaker at capturing context information. RNNs, on the other hand, can extract context dependencies and classify well, but they take a long time to train. In this research, we used a combined CNN-RNN structure to exploit the strengths of both CNN and RNN. The convolution layer extracts local features, and the Gated Recurrent Unit (GRU) layer uses the sequence of those features to learn about the input. The feature maps extracted and learned by the CNN and GRU are passed to a SoftMax layer and to machine learning classifiers such as Support Vector Machine (SVM) and Random Forest (RF) to generate the final classification. We used word2vec and FastText word embeddings with CBOW and skip-gram model architectures to represent words as vectors.
The best results were obtained with FastText (skip-gram) embeddings and the CNN-GRU-SVM model, which achieved an accuracy of 95.56%, a precision of 95.33%, a recall of 95.44%, and an F1 score of 95.37% when classifying comments and posts into religious hate speech, ethnic hate speech, offensive language, and not hate speech. However, the models tend to misclassify offensive language as not hate speech. In general, replacing the SoftMax layer with an SVM classifier achieves good performance for offensive and hate speech detection, including the target of hate speech, for the Amharic language.
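
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 could be computed for the four classes named above. It assumes macro averaging (an unweighted mean over classes); the abstract does not state which averaging scheme was used. The label names, the `macro_scores` helper, and the toy labels are hypothetical and are not taken from the thesis data.

```python
from typing import Dict, List

# Hypothetical label names for the four classes named in the abstract.
LABELS = ["religious_hate", "ethnic_hate", "offensive", "not_hate"]


def macro_scores(y_true: List[str], y_pred: List[str],
                 labels: List[str]) -> Dict[str, float]:
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example (invented labels): one "offensive" comment is predicted as
# "not_hate", the kind of confusion the abstract reports.
y_true = ["religious_hate", "religious_hate", "ethnic_hate", "ethnic_hate",
          "offensive", "offensive", "not_hate", "not_hate"]
y_pred = ["religious_hate", "religious_hate", "ethnic_hate", "ethnic_hate",
          "offensive", "not_hate", "not_hate", "not_hate"]

scores = macro_scores(y_true, y_pred, LABELS)
print({name: round(value, 4) for name, value in scores.items()})
```

In this toy example, the single offensive comment predicted as not hate speech lowers recall for the offensive class and precision for the not hate speech class, which is how such a confusion shows up in the per-class figures.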
