Detection and Identification of Harassment and Hate Speech on Social Media Based on Protected Characteristics for the Afaan Oromo Language Using Deep Learning.

G/EYESUS, ASMELASH; Sahle, Geletaw; Mitiku, Hambisa

dc.contributor.advisor
dc.contributor.author	G/EYESUS, ASMELASH
dc.contributor.author	Sahle, Geletaw
dc.contributor.author	Mitiku, Hambisa
dc.date.accessioned	2024-01-15T06:32:26Z
dc.date.available	2024-01-15T06:32:26Z
dc.date.issued	2023-12-12
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/9135
dc.description.abstract	Social media affects a nation's social, political, and economic life in both positive and negative ways. Positive effects include easier exchange of opinions online and the rapid, broad dissemination of information. A negative effect is the spread of hate speech, which disparages individuals on the basis of shared traits such as gender (sexism), race, religion, color, disability, and nationality. Protected characteristics are traits, such as gender, race, religion, color, disability, or nationality, on the basis of which it is against the law to discriminate against someone. The use of social media platforms such as Facebook and Twitter to organize hateful events and spread hate speech has become more common, and the unstructured nature of social media data makes manual tracking difficult. This motivates further work on detecting hate speech and identifying harassment based on protected characteristics. The study aims to develop a method for detecting and identifying harassment and hate speech on social media, based on protected characteristics, for the Afaan Oromo language using deep learning. We used an experimental research design. Facepager and Google Forms were used for data collection. Normalization, data cleaning, and tokenization were used for data preprocessing. We conducted the experiments in two steps. The primary dataset was used for experimentation with the pretrained BERT model. To identify the best-performing deep learning techniques on our dataset, a convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), and gated recurrent unit (GRU) were trained and evaluated. However, overfitting occurred because of the limited size of our dataset. To address the overfitting, cross-validation and L2 regularization were employed.
To address the scarcity of training data, the second approach, the pretrained BERT model, was applied. Model performance was evaluated using accuracy and loss. After preprocessing and training, the convolutional neural network (CNN) achieved an accuracy of 98.44% and a loss of 0.0396, and the Bidirectional Encoder Representations from Transformers (BERT) model achieved an accuracy of 98.83% and a loss of 0.0952. In these experiments, the BERT model outperformed the other algorithms, with 98.83% accuracy. The study used Afaan Oromo language features to detect harassment and hate speech on social media. Future research could use social media data to create unique word embeddings and assess the CapsNet model's effectiveness on non-textual data.	en_US
dc.language.iso	en_US	en_US
dc.subject	BERT	en_US
dc.subject	Deep learning	en_US
dc.subject	Harassment	en_US
dc.subject	Hate speech	en_US
dc.subject	Protected characteristics	en_US
dc.title	Detection and Identification of Harassment and Hate Speech on Social Media Based on Protected Characteristics for the Afaan Oromo Language Using Deep Learning.	en_US
dc.type	Thesis	en_US