Abstract:
A tremendous amount of data is generated online and offline every day. Likewise, many people
now rely on the internet, through telecommunications devices, to maintain their social
connections and follow global events. Meanwhile, producers of misinformation and fake news
work to build their reputation among followers and to monetize their content through splashy
headlines and article bodies that are shared virally. As a result, identifying fake news is an
extremely difficult challenge for the news industry and journalists, and tools for doing so have
become critically important. To address this problem, stance detection is needed to help verify
the veracity of news. Stance detection is the task of automatically identifying whether a
particular news headline
“Agrees” with, “Disagrees” with, “Discusses,” or is “Unrelated” to a particular news article.
If the news headline is classified as “Unrelated” to the news article, this indicates a high
probability that the news is fake. In this study, we propose applying deep learning models (LSTM
and BiLSTM) and machine learning models (Logistic Regression and Random Forest classifiers)
with different feature extraction methods to Afaan Oromo news stance detection.
Feature extraction is useful because it reduces dimensionality and obtains the best features from
the dataset by selecting and combining variables into features, thus effectively reducing the
amount of data. We use a newly collected and annotated Afaan Oromo news dataset of eight
hundred and eighty (880) headline-body pairs, and around 2M unannotated Afaan Oromo
news-domain words to develop a pre-trained word embedding model. Normalization, tokenization,
text cleaning, and stop word removal were among the NLP preprocessing tasks carried out in this
work. In our experiments, we built a FastText embedding model from the collected corpus, and
it shows higher accuracy than one-hot encoding. Finally, we compare the results of our deep and
machine learning models: BiLSTM achieves 75% accuracy, 76% precision, and a 75% F1-score,
while the Random Forest classifier achieves 70% accuracy, 68% precision, and a 67% F1-score.
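
To illustrate the preprocessing steps named above (normalization, tokenization, text cleaning, and stop word removal) and the intuition that an unrelated headline shares little content with its article body, a minimal Python sketch follows. It is not the implementation used in this study: the stop word set, the function names, and the bag-of-words cosine score are assumptions made only for illustration, whereas the study itself relies on FastText embeddings with LSTM, BiLSTM, Logistic Regression, and Random Forest models.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder stop words, for illustration only;
# this is NOT the stop word list used in the study.
STOP_WORDS = {"fi", "kan", "akka", "irraa", "keessa"}


def preprocess(text):
    """Normalize, clean, tokenize, and remove stop words (assumed steps)."""
    text = text.lower()                      # normalization: case folding only
    text = re.sub(r"[^\w\s']", " ", text)    # cleaning: drop punctuation, keep apostrophes
    tokens = text.split()                    # tokenization: split on whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                           # empty text: define similarity as 0
    return dot / (norm_a * norm_b)


def relatedness(headline, body):
    """Score in [0, 1]; low values hint that the headline may be unrelated."""
    return cosine_similarity(preprocess(headline), preprocess(body))


if __name__ == "__main__":
    # Toy strings chosen for illustration; they are not from the study's dataset.
    print(relatedness("Oduu haaraa", "Oduu haaraa fi seenaa"))
```

A score like this could serve as one hand-crafted feature next to embedding-based features; a trained classifier would still be needed to separate the “Agrees”, “Disagrees”, and “Discusses” classes.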