Abstract:
Ge’ez was one of the well-known official languages of Ethiopia, used in ancient times for writing about religion, history, culture, and science. Many documents written in Ge’ez still need to be translated into Amharic. Currently, documents are translated from Ge’ez to Amharic by human experts, which can take days, months, or even years for a single document, so there is a need for an efficient computer-based translation system that can translate within minutes. There are different kinds of computer-based language translation, such as rule-based, statistical, and neural-network-based systems. Neural-network-based translation is the most emerging approach and the most efficient, dynamic, and fluent. However, although much research has been carried out on translating Ge’ez to Amharic, the accuracy obtained so far is very low. Therefore, further research on different systems must be done to improve them and bring them into practical use (deployment).
We proposed to use a neural network for translating Ge’ez to Amharic to obtain an efficient, fluent, and fast translation, and we used the Transformer architecture to build the translation system. Three experiments were conducted. The first built a pre-trained masked language model (MLM) from a monolingual dataset of 33,004 sentences each for Ge’ez and Amharic. The second was a supervised experiment using a parallel corpus without any pre-trained model. The third used the pre-trained masked language model to initialize the encoder and decoder of the Transformer, which was then fine-tuned with the supervised (bilingual) dataset. For automatic accuracy measurement we used BLEU (bilingual evaluation understudy), the most common and well-known evaluation metric today.
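The masked-language-model pre-training objective can be sketched as follows. This is a minimal illustration only: the 15% masking rate, the `[MASK]` token, and the placeholder tokens are assumptions borrowed from common BERT-style practice, not details stated above.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with a mask token.

    Returns the masked sequence plus a {position: original_token} map;
    the model is trained to recover the original tokens at those positions.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # token the model must predict back
        else:
            masked.append(tok)
    return masked, targets

# Toy example with placeholder tokens standing in for a Ge'ez sentence:
sentence = "w1 w2 w3 w4 w5 w6".split()
masked, targets = mask_tokens(sentence, mask_rate=0.3, seed=1)
```

Training the encoder and decoder on this objective over the monolingual data is what produces the pre-trained weights reused in the third experiment.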
In the first experiment we measured the masked language model with perplexity. There is no specific value that exactly shows whether the model is good, but a lower value indicates a better probability of predicting the masked words, so we trained until the perplexity stopped decreasing from its initial value. We obtained 31.65% BLEU and 33.02% BLEU for the second and third experiments, respectively; a higher BLEU score indicates higher translation accuracy.
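To make the BLEU scores concrete, a minimal single-reference sentence-level sketch of the metric is shown below: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is an unsmoothed illustration, not the exact tooling used for the reported scores; real evaluations typically use a library such as sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU in [0, 1]."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Scores are conventionally reported as percentages, so 33.02% BLEU corresponds to a value of 0.3302 on this scale.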
The third experiment gave us the best result: using the pre-trained model to initialize the encoder and decoder improved the accuracy of the machine translation.
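The perplexity used to monitor pre-training in the first experiment follows directly from the training loss. A minimal sketch, assuming a token-level negative log-likelihood measured in nats (the abstract does not state the loss details):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per predicted token).

    Lower is better: roughly the effective number of choices the model
    hesitates between when predicting each masked token.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns probability 0.25 to every correct token has
# perplexity close to 4: like guessing uniformly among 4 words.
nlls = [-math.log(0.25)] * 8
ppl = perplexity(nlls)  # approximately 4.0
```

This is why training can simply continue until the value stops decreasing: the quantity has no absolute threshold for "good", only relative improvement.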
In general, we trained a masked language model, used it to initialize the encoder and decoder of the Transformer, fine-tuned the model on a bilingual dataset, and obtained a good result. Increasing the dataset with material from different domains and applying new methodologies may further improve accuracy. The major weakness of the study is the unavailability of a sufficient dataset, due to the lack of Amharic OCR software for converting scanned documents into an editable format.