Abstract:
The number of social media users is increasing rapidly, both worldwide and in Ethiopia.
Social media has become an essential tool for communication, and its use has increased tremendously
in recent years. However, this growth also opens the door to trolls who poison these platforms
with offensive language and hate speech directed at others. To address this problem, this research
proposes offensive and hate speech detection for Amharic text using a deep learning model.
Offensive and hate speech data were collected from public Facebook and YouTube pages
and manually labeled as hate speech (together with its target), offensive language, or not hate
speech. The final dataset consists of 10,125 posts and comments. In recent years, deep
learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have
been applied to offensive and hate speech detection with impressive results. CNNs
are good at extracting local information but are less able to capture contextual information. RNNs, on the other hand, can capture contextual dependencies and
classify well, but they take a long time to train. In this research, we used a combined
CNN-RNN structure to exploit the strengths of both CNNs and RNNs. The convolution layer
extracts local features, and the Gated Recurrent Unit (GRU) layer uses the sequence of those features to learn about
the input. The features extracted and learned by the CNN and GRU layers are passed to a softmax layer and to
machine learning classifiers such as Support Vector Machine (SVM) and Random Forest (RF) classifiers to generate the final classification.
We used word2vec and fastText word embeddings with the CBOW and skip-gram model architectures
to represent words as vectors. The best results were obtained with fastText (skip-gram) embeddings and the CNN-GRU-SVM model, which achieved an accuracy of 95.56%, a precision of 95.33%, a recall of 95.44%, and
an F1 score of 95.37% in classifying comments and posts into religious hate speech, ethnic hate
speech, offensive language, and not hate speech. However, the models tend to misclassify
offensive language as not hate speech. Overall, replacing the softmax layer with an
SVM classifier achieves good performance for offensive and hate speech detection, including
the target of hate speech, for the Amharic language.
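
The feature-extraction path summarized above (convolution over word embeddings, then a GRU over the resulting feature maps) can be illustrated with a minimal sketch. This is not the implementation used in this research: it uses only the Python standard library, random untrained weights, and toy dimensions, and every name in it (conv1d_relu, gru_last_state, extract_features, rand_matrix) is hypothetical. The GRU update follows one common convention, h = (1 - z) * h + z * candidate. In the setting described in the abstract, the resulting vector would be handed to a softmax layer or to an external classifier such as an SVM; that step is omitted here.

```python
import math
import random


def conv1d_relu(seq, kernels, biases):
    """Valid 1-D convolution over time, followed by ReLU.

    seq:     list of T embedding vectors, each of length D
    kernels: list of F filters, each a list of K weight vectors of length D
    returns: list of T-K+1 feature vectors, each of length F
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        feats = []
        for kern, b in zip(kernels, biases):
            s = b + sum(w * x
                        for w_row, x_row in zip(kern, window)
                        for w, x in zip(w_row, x_row))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(w_row, x))
            + sum(u * hi for u, hi in zip(u_row, h))
            + bi
            for w_row, u_row, bi in zip(W, U, b)]


def gru_last_state(seq, p, hidden):
    """Run a GRU over seq and return the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
        r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], rh, p["bh"])]
        h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h


def extract_features(embedded, kernels, biases, gru_params, hidden):
    """Convolution extracts local features; the GRU reads them in order."""
    feature_maps = conv1d_relu(embedded, kernels, biases)
    return gru_last_state(feature_maps, gru_params, hidden)


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions): T tokens, D embedding dims, F filters,
# K kernel width, H GRU units. Weights are random, not trained.
rng = random.Random(0)
T, D, F, K, H = 6, 4, 3, 2, 5
embedded = rand_matrix(rng, T, D)            # stands in for fastText vectors
kernels = [rand_matrix(rng, K, D) for _ in range(F)]
biases = [0.0] * F
gru_params = {}
for gate in "zrh":
    gru_params["W" + gate] = rand_matrix(rng, H, F)
    gru_params["U" + gate] = rand_matrix(rng, H, H)
    gru_params["b" + gate] = [0.0] * H

features = extract_features(embedded, kernels, biases, gru_params, H)
print(len(features))  # one value per GRU unit
```

In a trained model, one such feature vector per post or comment would form a row of the training matrix for the downstream classifier, which is what makes it possible to swap the softmax layer for an SVM or RF without changing the feature extractor.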