Abstract:
Natural Language Processing (NLP) is concerned with the computational processing of natural
languages so that computers can interact linguistically with people in ways
that suit people rather than computers. Morphological segmentation is an NLP task in which
computer programs segment words into their morphemes. It is used as a component in many
applications, especially machine translation, spell checking, and part-of-speech (POS) tagging.
Several researchers have applied machine learning approaches to Afaan Oromo morphological
segmentation, but no research has used artificial neural networks for this task.
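As an illustration of how segmentation can be posed as a learning problem, the sketch below converts a segmented word into one tag per character and back again. The B/M/E/S tagging scheme (begin, middle, end, single-character morpheme), the function names, and the English example word are assumptions made for illustration; the abstract does not state which encoding was used.

```python
def segmentation_to_labels(morphemes):
    """Convert a list of morphemes into one B/M/E/S tag per character.

    The tagging scheme is an assumption for illustration only.
    """
    labels = []
    for morpheme in morphemes:
        if len(morpheme) == 1:
            labels.append("S")  # single-character morpheme
        else:
            # first character, interior characters, last character
            labels.extend(["B"] + ["M"] * (len(morpheme) - 2) + ["E"])
    return labels


def labels_to_segmentation(word, labels):
    """Rebuild the morpheme list from a word and its per-character tags."""
    morphemes, current = [], ""
    for char, tag in zip(word, labels):
        current += char
        if tag in ("E", "S"):  # these tags close a morpheme
            morphemes.append(current)
            current = ""
    if current:  # tolerate a tag sequence that never closes the last morpheme
        morphemes.append(current)
    return morphemes


# Hypothetical English example: "books" segmented as "book" + "s".
tags = segmentation_to_labels(["book", "s"])
restored = labels_to_segmentation("books", tags)
```

With this kind of encoding, a sequence model only has to predict one tag per character, and the predicted tags can be decoded back into morphemes.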
Artificial neural networks are a subset of machine learning inspired by the structure, processing
method, and learning ability of a biological brain. Multiple data inputs are processed
by different machine learning algorithms. Hence, neural networks can learn by
themselves and produce output that is not limited to the input provided to them.
Morphological segmentation using neural networks has been developed for languages such as
English. Thus, the main aim of this study is to develop a morphological segmenter using
neural networks for Afaan Oromo. To achieve this objective, a corpus
was collected from different sources, such as Afaan Oromo books and newspapers, and prepared in a
format suitable for use in the development process. We developed and used a corpus of size 50,200,
of which 40,160 were used for training and 10,040
for testing. In the experiments, the F-scores achieved were 97.48%, 98.33%, and 98% using
Bidirectional Long Short-Term Memory, Long Short-Term Memory, and Recurrent Neural
Networks, respectively.
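The reported split corresponds to 80% of the corpus for training and 20% for testing. The sketch below reproduces that arithmetic and shows one common way to compute an F-score for segmentation, based on morpheme boundary positions. The boundary-based definition, the function names, and the English example words are assumptions for illustration; the abstract does not specify how its F-score was calculated.

```python
def split_sizes(total, train_fraction):
    """Return (train, test) sizes for a given training fraction."""
    train = round(total * train_fraction)
    return train, total - train


def boundary_positions(morphemes):
    """Character offsets at which one morpheme ends and the next begins."""
    positions, offset = set(), 0
    for morpheme in morphemes[:-1]:
        offset += len(morpheme)
        positions.add(offset)
    return positions


def boundary_f_score(gold, predicted):
    """Boundary-level F-score over parallel lists of segmentations.

    This definition is an assumption, not the source's stated metric.
    """
    tp = fp = fn = 0
    for gold_seg, pred_seg in zip(gold, predicted):
        gold_b = boundary_positions(gold_seg)
        pred_b = boundary_positions(pred_seg)
        tp += len(gold_b & pred_b)   # boundaries found correctly
        fp += len(pred_b - gold_b)   # boundaries predicted but wrong
        fn += len(gold_b - pred_b)   # boundaries missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


train_size, test_size = split_sizes(50200, 0.8)

# Hypothetical gold and predicted segmentations (English, for illustration).
gold = [["book", "s"], ["un", "kind", "ly"]]
predicted = [["book", "s"], ["unkind", "ly"]]
score = boundary_f_score(gold, predicted)
```

In this toy case the prediction misses one of three gold boundaries and proposes no wrong ones, so precision is 1, recall is 2/3, and the F-score is 0.8.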
In conclusion, the accuracy of Afaan Oromo morphological segmentation using neural
networks was more promising than that of the baseline experiments. To improve the performance of the model,
increasing the amount of training data is recommended for future work.