Afaan Oromo News Stance Detection using Machine Learning and Deep Learning Approach

Fikadu, Tadese; Urgessa, Teklu; Zellalem, Mizanu

URI: https://repository.ju.edu.et//handle/123456789/7631

Date: 2022-12-12

Abstract:

In the current age of technology, a tremendous amount of data is generated online and offline every day, and many people rely on the internet and telecommunications devices to keep up with public affairs and other global developments. Meanwhile, producers of misinformation and fake news work to build a reputation among their followers, monetizing their content through splashy headlines and viral sharing of article bodies. As a result, identifying fake news is an extremely difficult challenge for the news industry and journalists, and tools for doing so have become critically important. Stance detection helps address this problem by supporting verification of the veracity of news. Stance detection is the task of automatically identifying whether a particular news headline “Agrees” with, “Disagrees” with, “Discusses,” or is “Unrelated” to a particular news article. If the headline is found to be “Unrelated” to the article, there is a high probability that the news is “Fake”. In this study, we propose applying deep learning models (LSTM and BiLSTM) and machine learning models (Logistic Regression and Random Forest classifiers) with different feature extraction methods to Afaan Oromo news stance detection. Feature extraction reduces dimensionality and obtains the best features from the dataset by selecting and combining variables, which effectively reduces the amount of data. We use a newly collected and annotated Afaan Oromo news dataset of eight hundred and eighty (880) headline-body pairs, together with around 2M words of unannotated Afaan Oromo news text used to develop a pre-trained word embedding model. Normalization, tokenization, text cleaning, and stop word removal were among the NLP preprocessing tasks carried out in this work.
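The preprocessing steps named above (normalization, tokenization, text cleaning, and stop word removal) could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the cleaning rules, and the short `STOP_WORDS` set are assumptions made for the example.

```python
import re

# Illustrative placeholder stop words; NOT the stop word list used in the study.
STOP_WORDS = {"fi", "kan", "akka", "irraa", "keessa"}


def normalize(text):
    """Lowercase the text and unify curly apostrophes to a plain one."""
    text = text.lower()
    return text.replace("\u2019", "'").replace("\u2018", "'")


def clean(text):
    """Drop URLs, then anything that is not a Latin letter, apostrophe, or space."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split on whitespace (assumed sufficient for this sketch)."""
    return text.split()


def preprocess(text):
    """Run normalization, cleaning, tokenization, and stop word removal in order."""
    tokens = [t.strip("'") for t in tokenize(clean(normalize(text)))]
    return [t for t in tokens if t and t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Oduu Haaraa: Bu\u2019aa fi kasaaraa 2022! https://example.com"))
```

The apostrophe is kept inside words because Afaan Oromo orthography uses it within words such as "bu'aa"; a real pipeline would need a vetted stop word list and normalization rules suited to the corpus.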
In our experiments, we built a FastText embedding model from the collected corpus, and it achieved higher accuracy than one-hot encoding. Finally, we compared the results of our deep learning and machine learning models: BiLSTM achieved 75% accuracy, 76% precision, and a 75% F1-score, while the Random Forest classifier achieved 70% accuracy, 68% precision, and a 67% F1-score.
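For readers unfamiliar with the reported metrics, the sketch below shows one way accuracy, precision, and F1-score can be computed over the four stance labels. The abstract does not state how the per-class scores were averaged, so macro averaging is an assumption here, and the `evaluate` function and lowercase label names are hypothetical.

```python
# The four stance classes described in the abstract (label spelling is assumed).
LABELS = ["agree", "disagree", "discuss", "unrelated"]


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision and F1 (averaging is assumed)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


if __name__ == "__main__":
    gold = ["agree", "disagree", "discuss", "unrelated"]
    predicted = ["agree", "disagree", "discuss", "agree"]
    print(evaluate(gold, predicted))
```

With weighted rather than macro averaging the precision and F1 values would differ, so the figures in the abstract should be read with the study's own averaging choice in mind.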
