Abstract:
The exponential growth of social media platforms such as Twitter and Facebook has revolutionized
communication and content publishing, but these platforms are also increasingly exploited for the
propagation of hate speech and the organization of hate-based activities. Hate speech is a serious
and growing problem in Ethiopia, both online and offline. It contributes substantially to the
growing ethnic tensions and conflicts in Ethiopia, which created more than 1.4 million new
internally displaced people in the first half of 2018 alone, and it serves as fuel for continued
ethnic crimes in the country.
Previous work on Amharic hate speech detection ignored both the context in which social media
comments appear and the sub-word information that could improve detection on platforms where
users pay little attention to spelling. This paper employs deep recurrent neural networks to
capture the context of a social media comment and FastText word embeddings to capture sub-word
information. The proposed approach investigates the importance of sub-word and context
information for Amharic hate speech detection on social media platforms. The post text, the
previous comment, and the post metadata are treated as context for predicting whether a target
comment in a social media post is hateful.
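As a rough illustration of why sub-word information helps with careless spelling, the sketch
below extracts FastText-style character n-grams and measures how many of them a word shares with
a misspelled variant. It is not the model used in this paper: the function names, the n-gram range
of 3 to 6, and the example words are assumptions chosen for illustration, and the hashing of
n-grams into buckets that FastText performs is omitted.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of a word.

    The word is wrapped in '<' and '>' boundary markers, as FastText does,
    so that prefixes and suffixes yield distinct n-grams.
    The range 3..6 is an assumed setting, not one reported in the paper.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard similarity between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # A misspelled word still shares several n-grams with the intended word,
    # whereas a purely word-level vocabulary treats it as unrelated.
    print(ngram_overlap("speech", "speach"))
    print(ngram_overlap("speech", "xyz"))
```

Because a misspelled token keeps part of the n-gram set of the intended word, an embedding built
from n-gram vectors can place the two close together, while a purely word-level model has no
vector for a spelling it has never seen.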
Our experiments show that using a feature that captures sub-word information, such as FastText,
improves the accuracy of Amharic hate speech detection from 81.58% (with the word2vec feature) to
84.78%. Additionally, incorporating context information improves the accuracy of the hate speech
detection system from 81.73% to 85.87% and the F-score from 82.83% to 86.45%, compared with using
only the target comments.