Jimma University Open access Institutional Repository

Detection and Identification of Harassment and Hate Speech on Social Media Based on Protected Characteristics for the Afaan Oromo Language Using Deep Learning.

dc.contributor.author G/EYESUS, ASMELASH
dc.contributor.author Sahle, Geletaw
dc.contributor.author Mitiku, Hambisa
dc.date.accessioned 2024-01-15T06:32:26Z
dc.date.available 2024-01-15T06:32:26Z
dc.date.issued 2023-12-12
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/9135
dc.description.abstract Social media affects the social, political, and economic life of a nation in both positive and negative ways. Positive effects include the facilitation of digital opinion exchanges and the rapid and broad dissemination of information. A negative effect is the spread of hate speech, which disparages individuals on the basis of shared traits such as gender (sexism), race, religion, color, disability, and nationality. These traits are protected characteristics: it is against the law to discriminate against someone because of them. The use of social media platforms such as Facebook and Twitter to organize hateful events and spread hate speech has become more common, and the unstructured nature of social media data makes manual tracking difficult. This motivates further work on detecting hate speech and identifying harassment based on protected characteristics. The study aims to develop a deep learning method for detecting and identifying harassment and hate speech on social media, based on protected characteristics, for the Afaan Oromo language. We used an experimental research design. Facepager and Google Forms were used for data collection. Normalization, data cleaning, and tokenization were used for data preprocessing. The experiments followed a two-step approach. The primary dataset was used for experimentation with the pretrained BERT model. To identify the best-performing deep learning technique on our dataset, we trained a convolutional neural network (CNN), a long short-term memory (LSTM) network, a bidirectional long short-term memory (BiLSTM) network, and a gated recurrent unit (GRU) network. However, these models overfit because of the limited size of our dataset. To address the overfitting, we applied cross-validation and L2 regularization.
To address the scarcity of training data, the second approach applied the pretrained BERT model. We evaluated each model by its accuracy and loss. After preprocessing and training, the CNN reached an accuracy of 98.44% with a loss of 0.0396, and Bidirectional Encoder Representations from Transformers (BERT) reached an accuracy of 98.83% with a loss of 0.0952. In these experiments, the BERT model outperformed the other algorithms on accuracy, at 98.83%. The study used Afaan Oromo language features to detect harassment and hate speech on social media. Future research could use social media data to create unique word embeddings and assess the CapsNet model's effectiveness on non-textual data. en_US
dc.language.iso en_US en_US
dc.subject BERT en_US
dc.subject Deep learning en_US
dc.subject Harassment en_US
dc.subject Hate speech en_US
dc.subject Protected characteristics en_US
dc.title Detection and Identification of Harassment and Hate Speech on Social Media Based on Protected Characteristics for the Afaan Oromo Language Using Deep Learning. en_US
dc.type Thesis en_US