Jimma University Open access Institutional Repository

Afaan Oromo Continuous Speech Recognition Using Deep Learning

dc.contributor.author Degefa, Sifen Dadi
dc.date.accessioned 2022-02-03T06:29:51Z
dc.date.available 2022-02-03T06:29:51Z
dc.date.issued 2021-04-21
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/6163
dc.description.abstract Automatic Speech Recognition (ASR) takes speech audio as input and converts it to text as output. This study attempts to design an automatic Afaan Oromo speech-to-text recognizer using state-of-the-art deep learning algorithms, and accordingly explores the possibility of developing a continuous speech recognition system for Afaan Oromo. Previous related work on local languages, including Afaan Oromo, was reviewed, but no work on Afaan Oromo using deep learning algorithms was found; all previous Afaan Oromo ASR systems were based on traditional machine learning models. For this thesis, deep bidirectional RNN and CNN/RNN hybrid models are proposed to show that ASR for local languages and Afaan Oromo can be developed with deep learning, and to improve the performance of Afaan Oromo continuous speech recognition systems. TensorFlow, Keras, Jupyter Notebook, PyDub, Matplotlib, and Pydot were used to train, validate, and visualize the models. The speech corpus was prepared by collecting broadcast news audio from the Ethiopian Broadcasting Corporation (EBC), Oromia Broadcasting Network (OBN), Oromia Media Network (OMN), Voice of America (VOA), Fana Broadcasting Corporation (FBC), and the BBC Afaan Oromo program. In total, about 8000 utterances from 101 speakers (80 male and 21 female), amounting to 10:01:38 hours of data, were collected and transcribed. The dataset was used for training, validation, and testing. We trained and evaluated both the RNN and CNN/RNN models with connectionist temporal classification (CTC) to tackle sequence problems. We also tuned the learning rate, optimizer, number of neurons, and number of layers of the recognizer model, within the available resources, to increase the performance of the recognizer.
Accordingly, multiple experiments were conducted, and the CNN/RNN hybrid model was chosen as the best model for our case. Experimental results showed that the best performance achieved was 69% WER and a loss of 16.3, obtained by the CNN/RNN hybrid model. Although this result is promising, all experiments indicate that more data and high-performance GPUs for building larger models could improve the performance of deep-learning-based Afaan Oromo ASR. We therefore recommend that further study be conducted with a larger vocabulary and better GPUs to enhance the accuracy of ASR for the Afaan Oromo language. en_US
dc.language.iso en_US en_US
dc.subject Afaan Oromo Continuous Speech Recognition en_US
dc.subject Automatic Speech Recognition en_US
dc.subject Broadcasting News Speech en_US
dc.subject Deep Learning en_US
dc.subject CNN en_US
dc.subject RNN en_US
dc.title Afaan Oromo Continuous Speech Recognition Using Deep Learning en_US
dc.type Thesis en_US
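
The abstract reports its best result as a word error rate (WER). As a reading aid, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the placeholder transcripts are assumptions made for illustration; they are not taken from the thesis, and the thesis may have used a different scoring tool or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name, not from the thesis).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real corpus data: one substitution and one
# deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 69% means that the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference equals about 69% of the reference word count.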

