Jimma University Open access Institutional Repository

AFAAN OROMO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MODEL

dc.contributor.author ABDISA, GELANA
dc.contributor.author MAMO, GETACHEW
dc.contributor.author TAKELE, HABTAMU
dc.date.accessioned 2024-07-03T06:46:56Z
dc.date.available 2024-07-03T06:46:56Z
dc.date.issued 2024-06-17
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/9266
dc.description.abstract Automatic speech recognition (ASR) converts spoken language into written text. This study aimed to develop a large vocabulary continuous speech recognition model for the Afaan Oromo language using the Speech Transformer, a state-of-the-art deep learning architecture. Most previous studies of Afaan Oromo ASR used classical machine learning models, and only one combined Recurrent Neural Networks (RNNs) with Convolutional Neural Networks (CNNs) in a hybrid approach. These techniques had difficulty accurately transcribing the varied and complex speaking styles found in Afaan Oromo, and they trained slowly because they lack parallelization during training. We addressed these limitations by using the non-recurrent sequence-to-sequence learning capabilities of the Speech Transformer architecture. Unlike recurrent approaches, the Speech Transformer processes all input time steps in parallel, which enables faster training than sequential processing. This parallelization was especially valuable because training was conducted on Google Colab, where computational resources were constrained. The speech corpus was prepared by collecting broadcast news audio from various Afaan Oromo media sources, totaling 8729 utterances from 100 speakers (50 male and 50 female), or 18.04 hours of speech. We experimented with four models, varying the number of encoders, decoders, and feed-forward neural networks (FFNN). The best-performing model, with five encoders, three decoders, and 400 FFNN, achieved a word error rate (WER) of 40.2%.
Although this is a promising result, further improvements could be achieved by increasing the dataset size and by using high-performance GPUs, which would allow larger and more complex models to be built. For future work, we recommend further studies with larger vocabularies and better computational resources to continue advancing the state of the art in Afaan Oromo speech recognition. We also plan to explore language models to further improve the accuracy and robustness of the Afaan Oromo ASR system. en_US
dc.language.iso en_US en_US
dc.subject Afaan Oromo en_US
dc.subject Speech Recognition en_US
dc.subject Automatic Speech Recognition en_US
dc.subject SpeechTransformer en_US
dc.title AFAAN OROMO LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MODEL en_US
dc.type Thesis en_US
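
The abstract reports its result as a word error rate (WER). As a minimal sketch of how that metric is commonly computed, assuming the standard definition (word-level edit distance divided by the number of reference words), the Python below may help readers interpret the 40.2% figure. The function name `word_error_rate` and the placeholder tokens are illustrative assumptions; the record does not describe the authors' scoring tool or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real transcripts: one substitution (w2 -> x)
    # and one deletion (w5) against five reference words.
    print(word_error_rate("w1 w2 w3 w4 w5", "w1 x w3 w4"))
```

Under this definition, a WER of 40.2% corresponds to roughly four word-level errors for every ten reference words. WER can exceed 100% when the hypothesis contains many insertions.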

