Abstract:
Word prediction is one of the most extensively utilized approaches for increasing the
communication pace in augmentative and alternative communication. Next-word
prediction entails guessing the word that follows a given sequence of words. A variety of word-sequence prediction algorithms
are available in many languages to help users enter text. Given a sequence of words drawn from
a corpus, the goal is to predict the next word with the highest likelihood of occurrence;
it is thus a predictive modelling problem for languages, also known as language modelling. Word
sequence prediction benefits physically challenged individuals who have typing issues, increases
typing speed by minimizing keystrokes, aids in spelling and error detection, and aids in speech and
handwriting recognition.
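
To make the language modelling formulation concrete, the following is a minimal sketch of next-word prediction by maximum likelihood. It uses simple bigram counts, not the neural models studied in this work, and the toy corpus is a hypothetical English placeholder rather than Afaan Oromo data. The function names `train_bigram_model` and `predict_next` are illustrative only.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            follow_counts[prev_word][next_word] += 1
    return follow_counts


def predict_next(follow_counts, prev_word, k=3):
    """Return up to k (word, probability) pairs, most likely first."""
    counts = follow_counts.get(prev_word.lower())
    if not counts:
        return []  # unseen context: no suggestion
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(k)]


# Hypothetical toy corpus (placeholder text, not the corpus used in this work).
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(toy_corpus)
suggestions = predict_next(model, "the")
```

A recurrent model replaces the count table with a learned function of the whole preceding context, but the prediction step is the same: rank candidate words by estimated probability and offer the top few to the user.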
Although Afaan Oromo is one of the most widely spoken and written languages in Ethiopia, no
significant research has been undertaken on word sequence prediction for the language. Word sequence
prediction is especially important for Afaan Oromo because entering its words requires
pressing consonant keys together with long and short vowels, combinations of
vowels, and special keys. We therefore designed and implemented a deep learning-based network
model for Afaan Oromo word sequence prediction. To this end, corpus data was
collected from different sources and split into a training set (70% of the dataset) for training the
models and a test set (30%) for testing the designed models. To identify the best performing
model for Afaan Oromo word sequence prediction, we conducted a total of six experiments
using advanced RNN variants, both individually and as hybrids: LSTM, BLSTM,
GRU, BGRU, BLSTM-GRU, and BLSTM-BGRU. All models were trained on the collected and preprocessed Afaan Oromo
dataset with similar layers and hyperparameters.
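
The data preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the shuffling seed, the `<unk>` token, the context window size, and the helper names `split_corpus`, `build_vocab`, and `make_examples` are all hypothetical.

```python
import random


def split_corpus(sentences, train_fraction=0.7, seed=0):
    """Shuffle the sentences and split them into training and test sets."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_vocab(sentences):
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def make_examples(sentences, vocab, context_size=3):
    """Turn each sentence into (context ids, next-word id) training pairs."""
    examples = []
    for sentence in sentences:
        ids = [vocab.get(token, vocab["<unk>"]) for token in sentence.split()]
        for i in range(1, len(ids)):
            context = ids[max(0, i - context_size):i]
            examples.append((context, ids[i]))
    return examples
```

Each pair holds the ids of the preceding words and the id of the word to predict, which is the input and target form a recurrent model such as an LSTM or GRU is typically trained on. Building the vocabulary from the training set only keeps test-set words from leaking into training.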
We evaluated the designed models using accuracy and the categorical cross-entropy loss as
evaluation metrics. The proposed models were trained and tested on 42,575 Afaan Oromo
sentences, and we obtained accuracies of 93.5% for LSTM, 83% for GRU, 97.4% for BLSTM, 80.8% for
BGRU, 89.9% for BLSTM-GRU, and 88.9% for BLSTM-BGRU. The experimental results
show that the designed stacked BLSTM network model outperforms all the other models
tested. We therefore recommend it for Afaan Oromo word sequence prediction tasks, where it
yields promising results.
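
For reference, the two evaluation metrics can be computed as in the sketch below. It assumes each prediction is a probability distribution over the vocabulary and each target is the index of the true next word. The example values are made up for illustration and are not results from this work.

```python
import math


def categorical_cross_entropy(predicted_probs, true_indices, eps=1e-12):
    """Mean negative log-probability assigned to the true next word."""
    losses = [
        -math.log(max(probs[target], eps))  # eps guards against log(0)
        for probs, target in zip(predicted_probs, true_indices)
    ]
    return sum(losses) / len(losses)


def accuracy(predicted_probs, true_indices):
    """Fraction of examples whose most probable word is the true next word."""
    correct = 0
    for probs, target in zip(predicted_probs, true_indices):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == target:
            correct += 1
    return correct / len(true_indices)


# Made-up predictions over a three-word vocabulary (illustration only).
example_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
example_targets = [0, 1, 0]
example_accuracy = accuracy(example_probs, example_targets)
example_loss = categorical_cross_entropy(example_probs, example_targets)
```

Accuracy only checks whether the top-ranked word is correct, while the cross-entropy loss also penalizes low confidence in the correct word, so the two metrics can rank models differently.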