Abstract:
Natural Language Processing (NLP) is a branch of Artificial Intelligence focused on the analysis
and understanding of natural language. One primary application of NLP is next-word prediction,
which involves predicting the next word in a sentence by presenting a list of the most likely
candidates for that position. Siltigna is a Semitic language spoken by the Silte people in the
Central Ethiopia Regional State. The language is
characterized by unique syntactic and semantic structures and requires specialized models for
effective language processing.
The lack of a next-word prediction model makes writing in Siltigna more time-consuming and more
prone to spelling errors, and it makes the language harder to use for people with physical
disabilities who have difficulty typing. This study addresses this problem by proposing an
approach to next-word prediction for the Siltigna language based on recurrent neural networks
(RNNs).
The objective of this study is to investigate the feasibility of building a next-word prediction
model for the Siltigna language using RNN-based architectures. To achieve this objective, a corpus
of 70,434 sentences was collected from different sources. The corpus was divided into a training
set (80%) for training the models and a testing set (20%) for evaluating them.
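
As a minimal sketch of the data preparation described above, the following Python code splits a list of sentences 80/20 and turns each training sentence into (context, next word) pairs. The whitespace tokenizer, the fixed random seed, the function names, and the placeholder sentences are illustrative assumptions and are not taken from the study.

```python
import random

def split_corpus(sentences, train_fraction=0.8, seed=0):
    """Shuffle the sentences and split them into training and testing sets."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # assumption: random split with a fixed seed
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def next_word_pairs(sentence):
    """Turn one sentence into (context words, next word) training pairs."""
    words = sentence.split()  # assumption: whitespace tokenization
    return [(words[:i], words[i]) for i in range(1, len(words))]

# Placeholder sentences standing in for the real Siltigna corpus.
corpus = [f"w{i} w{i + 1} w{i + 2}" for i in range(10)]
train, test = split_corpus(corpus)
pairs = [pair for sentence in train for pair in next_word_pairs(sentence)]
```

Under these assumptions, an 80/20 split of 70,434 sentences corresponds to about 56,347 training sentences and 14,087 testing sentences.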
To find the best-performing model, we ran six experiments using several RNN variants, both
individually and as hybrids, namely LSTM, BLSTM, GRU, BGRU, BLSTM-GRU, and BLSTM-BGRU, on the
collected and preprocessed Siltigna dataset with different numbers of layers and hyperparameter
settings.
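
To illustrate what the "B" (bidirectional) in BGRU adds, here is a minimal pure-Python sketch of a GRU cell that is run over a context once left to right and once right to left, with the two final states concatenated. It is not the study's implementation: the weights are random and untrained, the sizes are toy values, all function names are hypothetical, and the state update follows one common GRU convention. In a full model, the concatenated state would typically feed a softmax layer over the vocabulary.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, x, h):
    """Compute W @ x + U @ h + b using plain Python lists."""
    W, U, b = weights
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]

def init_gru(input_size, hidden_size, rng):
    """Random (untrained) weights for the update (z), reset (r), and candidate (c) parts."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        name: (matrix(hidden_size, input_size),
               matrix(hidden_size, hidden_size),
               [0.0] * hidden_size)
        for name in ("z", "r", "c")
    }

def gru_step(params, x, h):
    """One GRU time step: the gates decide how much of the old state to keep."""
    z = [sigmoid(v) for v in affine(params["z"], x, h)]   # update gate
    r = [sigmoid(v) for v in affine(params["r"], x, h)]   # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(v) for v in affine(params["c"], x, reset_h)]  # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

def run_gru(params, sequence, hidden_size):
    """Run the cell over the whole sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(params, x, h)
    return h

def run_bgru(forward_params, backward_params, sequence, hidden_size):
    """Bidirectional GRU: read the sequence in both directions and concatenate."""
    forward = run_gru(forward_params, sequence, hidden_size)
    backward = run_gru(backward_params, sequence[::-1], hidden_size)
    return forward + backward

rng = random.Random(0)
embedding_size, hidden_size = 3, 4  # toy sizes, chosen only for illustration
fwd = init_gru(embedding_size, hidden_size, rng)
bwd = init_gru(embedding_size, hidden_size, rng)
# Five random vectors standing in for the embeddings of five context words.
context = [[rng.uniform(-1, 1) for _ in range(embedding_size)] for _ in range(5)]
state = run_bgru(fwd, bwd, context, hidden_size)
```

The unidirectional GRU corresponds to `run_gru` alone. The bidirectional variant doubles the size of the state because it keeps one summary per reading direction.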
We evaluated the models using accuracy and the categorical cross-entropy loss function. The models
were trained and evaluated on Siltigna sentences and achieved accuracies of 94.4% (BGRU), 92.14%
(BLSTM), 88.5% (LSTM), 86.35% (GRU), 83.54% (BLSTM-BGRU), and 81.89% (BLSTM-GRU). The experimental
results show that the BGRU model outperforms all the other models tested, and it is therefore
recommended for Siltigna next-word prediction, where it yields encouraging results.
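
The two evaluation measures named above can be sketched in a few lines of Python. The example assumes that a model outputs one probability per vocabulary word and that the true next word is given as an index. The vocabulary, the probabilities, and the function names are made up for illustration and are not results from the study.

```python
import math

def categorical_cross_entropy(predicted, true_index):
    """Loss for one example: negative log-probability assigned to the correct word."""
    return -math.log(predicted[true_index])

def top_candidates(predicted, vocabulary, k=3):
    """Return the k most probable next words, most likely first."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return [vocabulary[i] for i in ranked[:k]]

def accuracy(predictions, true_indices):
    """Fraction of examples whose most probable word is the correct one."""
    hits = sum(
        max(range(len(p)), key=lambda i: p[i]) == t
        for p, t in zip(predictions, true_indices)
    )
    return hits / len(true_indices)

# Made-up vocabulary and model outputs (each row sums to 1).
vocabulary = ["w0", "w1", "w2", "w3"]
predictions = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]
true_indices = [0, 1, 2]

mean_loss = sum(
    categorical_cross_entropy(p, t) for p, t in zip(predictions, true_indices)
) / len(true_indices)
acc = accuracy(predictions, true_indices)
suggestions = top_candidates(predictions[2], vocabulary, k=2)
```

Accuracy only counts whether the single most probable word is correct, while cross-entropy also penalizes a model for assigning low probability to the correct word, which is why the two are usually reported together.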