Offensive And Hate Speech Detection For Amharic Language On Social Media Using Deep Learning Algorithm

Miftah Adem; Kinde Anlay; Fetulhak Abdurahman

dc.contributor.author	Miftah Adem
dc.contributor.author	Kinde Anlay
dc.contributor.author	Fetulhak Abdurahman
dc.date.accessioned	2022-03-11T11:44:11Z
dc.date.available	2022-03-11T11:44:11Z
dc.date.issued	2021-06
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/6692
dc.description.abstract	The number of social media users is growing rapidly, both worldwide and in Ethiopia. Social media has become an essential communication tool, and its use has increased tremendously in recent years. However, this growth also opens the door to trolls who poison these platforms with offensive and hate speech directed at others. To address this problem, this research proposes offensive and hate speech detection for Amharic text using a deep learning model. Offensive and hate speech data were collected from public Facebook and YouTube pages and manually labeled as hate speech (subdivided by its target), offensive language, or not hate speech. The final dataset consists of 10,125 posts and comments. Deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have recently been applied to offensive and hate speech detection with impressive results. CNNs are good at extracting local information but are weaker at capturing contextual information. RNNs, on the other hand, can capture contextual dependencies and classify well, but they take a long time to train. In this research, we used a combined CNN-RNN structure to exploit the strengths of both CNN and RNN. The convolution layer extracts local features, and the GRU layer uses the sequence of those features to learn about the input. The features extracted and learned by the CNN and GRU layers are passed to a SoftMax layer and to machine learning classifiers, namely Support Vector Machine (SVM) and Random Forest (RF), to produce the final classification. We used word2vec and fastText word embeddings with both the CBOW and skip-gram architectures to represent words as vectors. 
The best results were obtained with fastText (skip-gram) embeddings and the CNN-GRU-SVM model: an accuracy of 95.56%, precision of 95.33%, recall of 95.44%, and F1 score of 95.37% when classifying comments and posts into religious hate speech, ethnic hate speech, offensive language, and not hate speech. However, the models tend to misclassify offensive language as not hate speech. Overall, replacing the SoftMax layer with an SVM classifier achieves good performance for offensive and hate speech detection, including identifying the target of hate speech, in the Amharic language.	en_US
dc.language.iso	en_US	en_US
dc.subject	Hate speech, Offensive language, word2vec, fastText, Deep learning, Amharic text	en_US
dc.title	Offensive And Hate Speech Detection For Amharic Language On Social Media Using Deep Learning Algorithm	en_US
dc.type	Thesis	en_US
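The abstract describes a pipeline with three stages. A convolution layer extracts local features from word embeddings. A GRU layer reads the resulting feature sequence. The learned representation then goes to a SoftMax layer or to an SVM/RF classifier. The following minimal sketch, in plain Python, illustrates only the forward pass of such a CNN-GRU encoder. It is not the authors' implementation. Several choices here are assumptions made for illustration:

- the filter width, number of filters, hidden size and random weights;
- the absence of pooling;
- every function name (conv1d_relu, gru, cnn_gru_forward).

Training is omitted. The four outputs stand in for the four classes reported in the abstract.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def conv1d_relu(seq, filters, bias, k):
    """Slide each filter over windows of k consecutive embeddings (no padding), then apply ReLU.
    seq: T vectors of size d; filters: F flattened weight vectors of size k*d.
    Returns T-k+1 local feature vectors of size F."""
    out = []
    for t in range(len(seq) - k + 1):
        window = [x for vec in seq[t:t + k] for x in vec]  # flatten the k x d window
        out.append([max(0.0, s + b) for s, b in zip(matvec(filters, window), bias)])
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru(features, params, hidden):
    """Run a GRU over the convolutional feature sequence and return the final hidden state.
    Uses one common formulation: h_t = (1 - z) * h_{t-1} + z * candidate."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = [0.0] * hidden
    for x in features:
        z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]  # update gate
        r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(Wh, x), matvec(Uh, rh), bh)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def cnn_gru_forward(seq, k=3, n_filters=4, hidden=5, n_classes=4, seed=0):
    """Toy forward pass: embeddings -> Conv1D + ReLU -> GRU -> SoftMax head.
    Returns (h, probs): h is the learned feature vector, probs the SoftMax output.
    All weights are random (hypothetical); nothing here is trained."""
    rng = random.Random(seed)
    d = len(seq[0])
    filters = rand_matrix(n_filters, k * d, rng)
    feats = conv1d_relu(seq, filters, [0.0] * n_filters, k)
    params = []
    for _ in range(3):  # weights for the update gate, reset gate and candidate state
        params += [rand_matrix(hidden, n_filters, rng), rand_matrix(hidden, hidden, rng), [0.0] * hidden]
    h = gru(feats, params, hidden)
    W_out = rand_matrix(n_classes, hidden, rng)
    return h, softmax(matvec(W_out, h))

if __name__ == "__main__":
    rng = random.Random(42)
    # 10 tokens with hypothetical 8-dimensional embeddings (stand-ins for word2vec/fastText vectors)
    sentence = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
    h, probs = cnn_gru_forward(sentence)
    print("GRU feature vector:", [round(v, 3) for v in h])
    print("class probabilities:", [round(p, 3) for p in probs])
```

The abstract reports a CNN-GRU-SVM variant. In that variant, the SoftMax head would presumably be replaced by an SVM trained on vectors like h. Fitting that SVM, and training the convolution and GRU weights, is outside this sketch.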
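The reported accuracy, precision, recall and F1 figures are standard classification metrics. The sketch below shows one way to compute them over the four classes. It also counts confusions such as offensive language predicted as not hate speech, which is the error pattern the abstract notes. The abstract does not say how per-class scores were averaged, so macro averaging is an assumption here. The label strings and toy data are hypothetical.

```python
from collections import Counter

# Hypothetical label names for the four classes described in the abstract.
LABELS = ["religious_hate", "ethnic_hate", "offensive", "not_hate"]

def evaluate(gold, pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F1 (macro averaging is assumed)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(g == c and p == c for g, p in pairs)
        fp = sum(g != c and p == c for g, p in pairs)
        fn = sum(g == c and p != c for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

def confusions(gold, pred):
    """Count (gold, predicted) label pairs for every misclassified example."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)

if __name__ == "__main__":
    # Toy data (not from the thesis): one offensive comment is predicted as not_hate.
    gold = ["offensive", "offensive", "not_hate", "ethnic_hate", "religious_hate", "not_hate"]
    pred = ["not_hate", "offensive", "not_hate", "ethnic_hate", "religious_hate", "not_hate"]
    print(evaluate(gold, pred))
    print(confusions(gold, pred))
```

Macro averaging weights every class equally. Weighted averaging, an alternative the abstract may have used instead, weights each class by its support. The two can differ noticeably when classes are imbalanced.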