Word sequence prediction for Afaan Oromo using Deep Learning

Ahmed Indris; Admasu. A; Tefery. K

URI: https://repository.ju.edu.et//handle/123456789/8660

Date: 2023-07

Abstract:

Word prediction is one of the most widely used approaches for increasing the pace of communication in augmentative and alternative communication. Next-word prediction entails guessing the word or words that follow the text already entered. Word sequence prediction algorithms are available in many languages to help users enter text. Given a sequence of words drawn from a corpus, the goal is to predict the following word with the highest likelihood of occurrence; it is therefore a predictive modelling problem for languages, also known as language modelling. Word sequence prediction benefits physically challenged individuals who have difficulty typing, increases typing speed by reducing keystrokes, aids in spelling and error detection, and supports speech and handwriting recognition. Although Afaan Oromo is one of the most widely spoken and written languages in Ethiopia, no significant research has been undertaken on word sequence prediction for it. Word sequence prediction is particularly important for Afaan Oromo because typing the language requires many keystrokes: consonants are typed together with long and short vowels, combinations of vowels, and special keys. We therefore designed and implemented a deep learning-based network model for Afaan Oromo word sequence prediction. Corpus data was collected from different sources and split into a training set (70% of the total dataset) for training the models and a testing set (30%) for testing them. To identify the best-performing model, we conducted six experiments using advanced RNN variants, both individually and as hybrids: LSTM, BLSTM, GRU, BGRU, BLSTM-GRU, and BLSTM-BGRU. All were run on the collected and preprocessed Afaan Oromo dataset with similar layers and hyperparameters. We evaluated the models using accuracy as the evaluation metric and categorical cross-entropy as the loss function.
The proposed models were trained and tested with 42,575 Afaan Oromo sentences, and we obtained accuracies of 93.5% for LSTM, 83% for GRU, 97.4% for BLSTM, 80.8% for BGRU, 89.9% for BLSTM-GRU, and 88.9% for BLSTM-BGRU. The experimental results show that the designed stacked BLSTM network model outperforms the models in all the other experiments; it yields promising results and is therefore suggested for Afaan Oromo word sequence prediction tasks.
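
The data preparation and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the context size of three words, the shuffling seed, the choice to split the pairs rather than the sentences, and the two toy sentences are all assumptions made for illustration. It shows how sentences can be turned into (context, next word) pairs, how a 70/30 split can be made, and how categorical cross-entropy scores a single prediction.

```python
import math
import random

def make_pairs(sentences, context_size=3):
    """Turn each sentence into (context words, next word) pairs.

    context_size is an assumed value; the abstract does not state one.
    """
    pairs = []
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            context = tuple(words[max(0, i - context_size):i])
            pairs.append((context, words[i]))
    return pairs

def split_70_30(items, seed=0):
    """Shuffle, then split into 70% training and 30% testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

def categorical_cross_entropy(predicted, target_word):
    """Loss for one example: negative log of the probability
    that the model assigned to the true next word."""
    return -math.log(max(predicted.get(target_word, 0.0), 1e-12))

# Hypothetical toy sentences, not taken from the thesis corpus.
sentences = [
    "afaan oromoo afaan guddaa dha",
    "barattoonni afaan oromoo baratu",
]
pairs = make_pairs(sentences)
train, test = split_70_30(pairs)

# A made-up predicted distribution over next words, for illustration only.
loss = categorical_cross_entropy({"oromoo": 0.5, "guddaa": 0.5}, "oromoo")
```

In a real experiment the predicted distribution would come from the softmax output of a trained LSTM, GRU, or bidirectional network rather than from a hand-written dictionary, and accuracy would be the share of test pairs whose most probable word matches the true next word.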
